
SVM and/or Preprocess with sparse data (BoW): no way to get rid of warning "Input data is sparse, default preprocessing is to scale it" #6870

Open
wvdvegte opened this issue Aug 13, 2024 · 2 comments
Labels
bug report Bug is reported by user, not yet confirmed by the core team

Comments

@wvdvegte

What's wrong?
When I try to classify text processed with Bag-of-Words using SVM, the SVM widget shows the warning "Input data is sparse, default preprocessing is to scale it" and does not perform classification. I expected that adding Preprocess > Normalize Features > scale to σ^2 = 1 before SVM would scale the sparse BoW data, but the SVM widget then shows the same warning.

How can we reproduce the problem?
Apply SVM to text processed with BoW, together with a categorical target variable by which the text can be classified. Then insert Preprocess with Normalize Features > scale to σ^2 = 1 before SVM.

What's your environment?

  • Operating system: macOS 14.6.1
  • Orange version: 3.37.0
  • How you installed Orange: from DMG; updates through Add-ons menu
@wvdvegte wvdvegte added the bug report Bug is reported by user, not yet confirmed by the core team label Aug 13, 2024
@processo

You are right. Preprocess scaling leaves the data sparse. As an alternative, you can use the same method in Continuize (it is the only one there that works on sparse data).
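Why scaling to σ^2 = 1 is the option that stays compatible with sparse data can be shown with a minimal pure-Python sketch. It assumes a hypothetical column format (a dict mapping row index to nonzero value, plus the total row count) and is not Orange's implementation: dividing by the standard deviation keeps every zero at zero, whereas also subtracting the mean (full standardization) would turn every implicit zero into a nonzero value.

```python
import math


def scale_sparse_column(nonzeros, n_rows):
    """Scale one sparse column to unit variance without centering it.

    nonzeros: hypothetical sparse format, dict of row index -> nonzero value.
    n_rows:   total number of rows, including the implicit zeros.
    """
    # Mean and variance must count the implicit zeros, so divide by n_rows.
    mean = sum(nonzeros.values()) / n_rows
    mean_sq = sum(v * v for v in nonzeros.values()) / n_rows
    variance = mean_sq - mean * mean
    if variance <= 0:
        # Constant column: nothing sensible to scale by, return a copy.
        return dict(nonzeros)
    std = math.sqrt(variance)
    # Only divide. Subtracting the mean would map every implicit zero
    # to -mean / std and destroy sparsity.
    return {row: value / std for row, value in nonzeros.items()}


# A word-count column with 5 documents, nonzero only in rows 0 and 3.
scaled = scale_sparse_column({0: 2.0, 3: 4.0}, n_rows=5)
print(sorted(scaled))  # the same rows are nonzero as before scaling
```

The scaled column has variance 1 but the same nonzero pattern, which is why this transformation can be applied without converting BoW data to a dense matrix.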

@processo

processo commented Sep 6, 2024

I forgot to mention that my workflow (Windows 10, Orange 3.37.0) does produce predictions despite the warning, so I could not reproduce that part of the problem.

Projects
None yet
Development

No branches or pull requests

2 participants