
Segmentation fault when trying to get feature importance of multilabel binary classifier #10686

Open
shreyaspuducheri23 opened this issue Aug 9, 2024 · 11 comments


@shreyaspuducheri23

  • Operating System: linux
  • Python Version: 3.10.14
  • XGBoost Version: 2.1.0

I am experiencing a segmentation fault with XGBoost 2.1.0 when accessing feature importances in a multi-label binary classification model. The model trains and predicts as expected; however, when I attempt to retrieve feature importances using either xgb_model.feature_importances_ or xgb_model.get_score(importance_type='weight'), the process crashes. In a Jupyter kernel this kills the kernel, and when run from the terminal it prints "Segmentation fault". Other operations, such as fitting and predicting, work without any problems.

@trivialfis
Member

Thank you for sharing! Will try to reproduce it.

@trivialfis
Member

trivialfis commented Aug 12, 2024

Hi @shreyaspuducheri23 , could you please share a reproducible example? I tried the following toy example and did not observe a segfault:

from sklearn.datasets import make_multilabel_classification
import xgboost as xgb


X, y = make_multilabel_classification()
clf = xgb.XGBClassifier()
clf.fit(X, y)
clf.feature_importances_
clf.get_booster().get_score(importance_type='weight')

@shreyaspuducheri23
Author

Hi @trivialfis, the issue arises when using the vector-leaf option:

from sklearn.datasets import make_multilabel_classification
import xgboost as xgb

X, y = make_multilabel_classification(n_classes=2, n_labels=2,
                                      allow_unlabeled=False,
                                      random_state=1)

# Vector-leaf trees: a single tree produces outputs for all labels.
clf = xgb.XGBClassifier(multi_strategy='multi_output_tree')
clf.fit(X, y)
clf.feature_importances_  # segfaults here
clf.get_booster().get_score(importance_type='weight')  # likewise segfaults
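
For comparison, the default strategy handles the same data without crashing (a minimal sketch, reusing X and y from the snippet above):

# Sketch: the default one-output-per-tree strategy returns importances
# normally on the same data.
clf_default = xgb.XGBClassifier(multi_strategy='one_output_per_tree')
clf_default.fit(X, y)
print(clf_default.feature_importances_)
print(clf_default.get_booster().get_score(importance_type='weight'))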

@trivialfis
Member

Ah, that parameter is still a work in progress. I will implement feature importance for it after sorting out some current work.

@shreyaspuducheri23
Copy link
Author

I see, thank you! Do you have an estimated time frame, i.e. weeks, months, etc.? Just wondering whether it would be in my best interest to wait for the feature or to switch to one_output_per_tree for my current project.

@trivialfis
Member

Opened a PR to add support for weight: #10700 . Other importance types may take some time; I don't have an ETA yet.

If the PR is approved, you can use the nightly build for testing.
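
For reference, a quick sanity check before re-testing on a nightly wheel (a sketch; the exact version string of dev builds may differ):

import xgboost as xgb

# Release wheels report a plain version such as "2.1.0"; nightly/dev
# builds carry a pre-release suffix, so this confirms which is installed.
print(xgb.__version__)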

@abseejp

abseejp commented Sep 5, 2024

@trivialfis, I'm here because of the same issue @shreyaspuducheri23 has. I can see that your change (#10700) has been approved and merged, but I still can't get feature importances properly when I set multi_strategy to multi_output_tree: when I tried, it returned 0.0 as the importance for every feature.

On a separate note, when I set multi_strategy to one_output_per_tree, I get a single 1D array of feature importances even though I have 3 labels. What's going on under the hood? I was expecting a feature importance for each label, since three independent models are built.

@AnthonyYao7

I would like to work on this

@trivialfis
Member

I was expecting to get feature importance for each label since three different independent models are built.

They are combined to represent the whole model rather than the individual per-label models.

I would like to work on this

Thank you for volunteering! Maybe #10700 can be a good start for looking into where it's calculated?
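
If you need per-label numbers with one_output_per_tree in the meantime, one possible workaround is to split the per-tree statistics yourself. A sketch, not an official API, assuming trees are laid out round-robin across targets so that tree i belongs to label i % n_labels:

import xgboost as xgb
from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(n_classes=3, random_state=1)
clf = xgb.XGBClassifier(multi_strategy='one_output_per_tree', n_estimators=10)
clf.fit(X, y)

df = clf.get_booster().trees_to_dataframe()
n_labels = y.shape[1]
splits = df[df['Feature'] != 'Leaf']  # keep only split nodes

# 'weight'-style importance per label: count splits on each feature,
# attributing tree i to label i % n_labels (assumed round-robin layout).
per_label = (
    splits.assign(label=splits['Tree'] % n_labels)
          .groupby(['label', 'Feature'])
          .size()
          .unstack(fill_value=0)
)
print(per_label)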

@abseejp

abseejp commented Sep 19, 2024

Thanks @trivialfis for your response. When you say they are combined, what combination method is used? Is it the average of the feature importances across all the models for each feature?

@trivialfis
Member

Either the total or the average, depending on the importance type you specified (e.g. total_gain vs. gain).
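
For illustration, the average/total relationship is easy to check on a single-label model (a sketch): 'gain' is the average gain per split, 'total_gain' the sum, so gain * weight recovers total_gain:

from sklearn.datasets import make_classification
import xgboost as xgb

X, y = make_classification(random_state=0)
booster = xgb.XGBClassifier(n_estimators=10).fit(X, y).get_booster()

weight = booster.get_score(importance_type='weight')          # number of splits
gain = booster.get_score(importance_type='gain')              # average gain per split
total_gain = booster.get_score(importance_type='total_gain')  # summed gain

# average * count == total, up to float rounding
for f in gain:
    assert abs(gain[f] * weight[f] - total_gain[f]) <= 1e-6 * max(1.0, total_gain[f])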
