
[Performance] Very slow load of ONNX model in Windows #22219

Open
dhatraknilam opened this issue Sep 25, 2024 · 1 comment
Labels
performance issues related to performance regressions platform:windows issues related to the Windows platform

Comments

@dhatraknilam

dhatraknilam commented Sep 25, 2024

Describe the issue

I am trying to load XGBoost ONNX models with onnxruntime on a Windows machine.
The model size is 52 MB, yet it consumes 1378.9 MB of RAM on loading, and loading takes 15 minutes!
This behavior occurs only on Windows; on Linux the models load in a few seconds, although memory consumption is high on Linux as well.

I tried the solution suggested in https://github.com//issues/3802#issuecomment-624464802 but I get this error:
AttributeError: 'onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions' object attribute 'graph_optimization_level' is read-only

This is the simple code I used to load the model:
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])

To reproduce

Train an XGBoost classification model with the following params:
```python
# Classifier
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputClassifier
from xgboost import XGBClassifier
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputClassifier(XGBClassifier(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```

Train an XGBoost regressor model with the following params:
```python
# Regressor
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_regressor_output_shapes
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost

update_registered_converter(
    XGBRegressor,
    "XGBoostXGBRegressor",
    calculate_linear_regressor_output_shapes,
    convert_xgboost,
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputRegressor(XGBRegressor(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
    options={type(pipe): {'zipmap': False}},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```

Load the model with the following code:
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])
Then observe the load time and RAM usage.
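To make the Windows/Linux comparison concrete, the load time can be measured with a small standard-library helper like the following (the commented session call is illustrative, reusing the issue's own variable names):

```python
import time

def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# e.g.:
# sess, elapsed = timed(rt.InferenceSession, modelSav_path,
#                       providers=["CPUExecutionProvider"])
# print(f"session load took {elapsed:.1f} s")
```
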

Urgency

This is a release-critical issue: we cannot ship these models with such slow loading. Although the models perform well, we are blocked by the load-time issue. We also considered packaging the ML models with other libraries, but we do not have the necessary compliance, and we trust Microsoft.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

@dhatraknilam dhatraknilam added the performance issues related to performance regressions label Sep 25, 2024
@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Sep 25, 2024
@dhatraknilam dhatraknilam changed the title [Performance] Very slow load of ONNX model in memory in Windows [Performance] Very slow load of ONNX model in Windows Sep 25, 2024
@xadupre
Member

xadupre commented Sep 27, 2024

This PR should solve this: #22043.
