
Test and Score: Warn about transformation, raise error if all is nan #3323

Merged
merged 3 commits into from
Nov 19, 2018

Conversation

janezd
Contributor

@janezd janezd commented Oct 19, 2018

Issue
  • Domain transformations in Test and Score (and Predictions?) are implicit. Users should be informed about them.
Description of changes
  • Test and Score shows this exception as a user-readable error (without mentioning "domains")
  • If train and test data have different domains but the transformation does not result in all-NaNs, information about the transformation appears.
Includes
  • Code changes
  • Tests
  • Documentation
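The all-NaN condition described above can be sketched with plain NumPy (a simplified, hypothetical helper; the actual check in the widget also compares the two domains):

```python
import numpy as np

def all_nan_after_transform(transformed_X, original_X):
    """Sketch of the check this PR adds: flag a domain transformation
    that wiped out every value, but only when the original data was
    not all-NaN to begin with."""
    return bool(transformed_X.size
                and np.isnan(transformed_X).all()
                and not np.isnan(original_X).all())

# A transformation to an unrelated domain leaves only NaNs:
orig = np.array([[1.0, 2.0], [3.0, 4.0]])
transformed = np.full((2, 3), np.nan)
print(all_nan_after_transform(transformed, orig))  # → True
```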

@janezd
Contributor Author

janezd commented Oct 19, 2018

@markotoplak, something like this?

Member

@markotoplak markotoplak left a comment


Seems fine, except for the first elif.

Did you test for self.resampling == self.TestOnTest so that you do not get these warnings if you use the Preprocess input?

Review thread on Orange/base.py (outdated, resolved)
Review thread on Orange/widgets/evaluate/owtestlearners.py (resolved)
@janezd
Contributor Author

janezd commented Oct 20, 2018

Some tests fail because they trigger the new exception. I can obviously fix the tests. But.

Do we consider getting all-NaNs after a domain transformation an error? For the purposes of the canvas, it is. The user has probably done something wrong and must be warned, and the result is useless anyway, so no harm done. For the purposes of Orange as a data-mining library, it is not necessarily an error; numpy is generous with NaNs and infs and lets the programmer figure it out.

One solution is to degrade this from an exception to a warning. However, widgets would then have to set the warning filter to escalate this warning to an exception, so they could catch and report it. I would think that the above distinction is theoretical and in practice we would always escalate to an exception. Orange is never used outside of canvas.
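For illustration, the escalation mechanics being discussed would look roughly like this (the `DomainTransformationWarning` class and `compute` function are hypothetical, not the actual Orange API):

```python
import warnings

class DomainTransformationWarning(UserWarning):
    """Hypothetical warning class, if the error were demoted to a warning."""

def compute(transformed_all_nan):
    # Library code would merely warn, leaving the decision to the caller.
    if transformed_all_nan:
        warnings.warn("domain transformation produced no defined values",
                      DomainTransformationWarning)

# A widget could escalate the warning back to an exception and report it:
with warnings.catch_warnings():
    warnings.simplefilter("error", DomainTransformationWarning)
    try:
        compute(True)
    except DomainTransformationWarning as ex:
        print("error shown to user:", ex)
```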

I'd like to hear some opinions. My preference is to keep it as an exception, but perhaps add another check: I would not trigger an exception if the original data already consisted of all-nans.

@markotoplak
Member

@janezd, hmm, I really do not know. I'd go either for an exception (without the check) or a warning. Exception + check seems like a strange compromise...

@lanzagar lanzagar added this to the 3.18 milestone Oct 26, 2018
@janezd janezd force-pushed the test-score-transformation branch 3 times, most recently from 05d2f54 to b21e756 Compare October 27, 2018 21:03
@janezd
Contributor Author

janezd commented Oct 28, 2018

@markotoplak, I had to add some checks to prevent raising exceptions in some semi-legitimate cases.

@codecov

codecov bot commented Nov 6, 2018

Codecov Report

Merging #3323 into master will increase coverage by 0.02%.
The diff coverage is 96.42%.

@@            Coverage Diff             @@
##           master    #3323      +/-   ##
==========================================
+ Coverage    82.3%   82.32%   +0.02%     
==========================================
  Files         360      360              
  Lines       64097    64121      +24     
==========================================
+ Hits        52753    52788      +35     
+ Misses      11344    11333      -11

@markotoplak markotoplak self-assigned this Nov 7, 2018
@markotoplak
Member

I tried writing tests and I noticed that for scikit-learn based learners this PR does not really work. For scikit-learn methods Orange adds some very aggressive preprocessors, which work hard to remove all nans. For example, with logistic regression, I can still build a learner on iris and apply its output to titanic.

I think the preprocessing is so aggressive in order to handle potentially missing columns in the test data. Ouch.

@janezd
Contributor Author

janezd commented Nov 8, 2018

We could still merge it because it works for non-skl methods, like Naive Bayes. And it will work better as we gradually get rid of skl. :)

I mean, it doesn't hurt to have this check, although it doesn't always work.

To make it better, we could add a flag to preprocessors that would tell them to raise an exception if all data is nan. When using them for preprocessing training data, we'd set this flag. Since a model cannot be fit to a table of nans, raising such an exception would make sense, right?
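A rough sketch of that idea (the `ImputeSketch` class and `fail_on_all_nan` flag are hypothetical, not the real Orange preprocessor API):

```python
import numpy as np

class ImputeSketch:
    """Hypothetical preprocessor with a flag that raises on all-NaN
    input (used when preprocessing training data) instead of silently
    imputing everything (used on test data)."""
    def __init__(self, fail_on_all_nan=False):
        self.fail_on_all_nan = fail_on_all_nan

    def __call__(self, X):
        if self.fail_on_all_nan and X.size and np.isnan(X).all():
            raise ValueError("data contains no defined values")
        return np.where(np.isnan(X), 0.0, X)  # trivial zero-imputation

train = np.full((3, 2), np.nan)
try:
    ImputeSketch(fail_on_all_nan=True)(train)
except ValueError as ex:
    print("cannot fit:", ex)
```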

@markotoplak
Member

How about first transforming the data to the original domain, then checking for NaNs, and only afterwards continuing with whatever preprocessing the classifier wants? I just tried this:

            # transform to the original domain and check if the data is compatible
            data = data.transform(self.original_domain)
            if data.X.size \
                    and np.isnan(data.X).all() \
                    and not np.isnan(orig.X).all() \
                    and data.domain.attributes != orig.domain.attributes:
                raise DomainTransformationError(
                    "domain transformation produced no defined values")
            # apply transformations added by the learner
            data = data.transform(self.domain)

Surprisingly, all tests pass. Perhaps we could then even remove some conditions, but I did not look into it closely.

Do you see any potential problems with this approach?

@lanzagar lanzagar modified the milestones: 3.18, 3.19 Nov 13, 2018
@janezd
Contributor Author

janezd commented Nov 18, 2018

Thanks. Fixed as suggested, except that I skip the transformation to the original domain when the check could not raise an exception anyway.

@markotoplak
Member

I added some tests. I think it is ready to merge. Please check the tests, Janez.

@janezd
Contributor Author

janezd commented Nov 19, 2018

Thanks for the tests. I think they're OK. If you agree with the code, you can merge.
