Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Impute: Allow setting a default value for all numeric and time variables #5102

Merged
merged 5 commits into from
Jan 8, 2021

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Nov 27, 2020

Issue

Fixes #5065.

Includes
  • Code changes
  • Tests
  • Documentation

@codecov
Copy link

codecov bot commented Nov 27, 2020

Codecov Report

Merging #5102 (8286474) into master (d04e3cd) will increase coverage by 0.00%.
The diff coverage is 97.29%.

@@           Coverage Diff           @@
##           master    #5102   +/-   ##
=======================================
  Coverage   84.77%   84.77%           
=======================================
  Files         286      286           
  Lines       60074    60130   +56     
=======================================
+ Hits        50927    50977   +50     
- Misses       9147     9153    +6     

@ajdapretnar
Copy link
Contributor

Try entering a value with a decimal comma. It breaks the widget.

@@ -124,7 +128,7 @@ def __call__(self):


class Default(BaseImputeMethod):
name = "Value"
name = "Fixed value"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new class was introduced below with name = "Fixed value". Was this one intentionally changed to the same name too (short names remain different)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's more informative, looks better in the widget.

hlayout.addWidget(button)

le = gui.lineEdit(
None, self, "default_numeric", valueType=float,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.default_numeric is a string by default, but this changes it to a float. This is inconsistent and probably helped with the below bug on line 308 (and others?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially planned to have a float, but then decided for str and forgot to remove this argument.

@@ -267,6 +303,11 @@ def create_imputer(self, method, *args):
m = AsDefault()
m.method = default
return m
elif method == Method.Default and not args: # global default values
return impute.FixedValueByType(
default_continuous=float(self.default_numeric or np.nan),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.default_numeric can be 0 which evaluates to False so np.nan is used instead!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed. Thanks.

@janezd
Copy link
Contributor Author

janezd commented Dec 4, 2020

Not imputing anything: it works for me. Which data did you try and how? Could it be that you tried to impute 0 (the bug discovered by @lanzagar) or you used a wrong delimiter (see next point)?

Decimal comma: I now changed the widget to use the current locale. Select Rows does the same. But this is inconsistent: I now need to enter 42,0 and the Table widget shows 42.0. Have we been discussing this before? We use . as decimal separator when displaying floats, so we should probably also fix US locale?

Discrete values: they should come from a combo, I guess. Which values would it contain? For, say, heart disease data? Also, this should then be a context setting that would depend upon whether the selected value is present or not. Using a new value for all discrete variables is also problematic. What if this particular value name already exist for some variables? I found this too messy.

@ajdapretnar
Copy link
Contributor

Could it be that you tried to impute 0 (the bug discovered by @lanzagar).

You mean a bug discovered by me and reported by Mr @lanzagar? :D Yes, it was the case of 0.

Discrete values: this is why I deleted a comment, because I realized it is too difficult to implement. So, my use case is usually having N discrete variables with values 0, 1. Some values are missing (sometimes there are just 1, no 0). And I want an easy way to impute in Orange, just fill all the missing values with 0. This is why I asked about discrete and this is why I tested with 0. I was told it is difficult to provide a reasonable solution for discrete.

@ajdapretnar
Copy link
Contributor

This data set still breaks impute.
missing1.xlsx

Well, I should probably say it breaks Data Table badly and other widgets less badly. Seems like TimeVariable is not formatted properly?

@janezd janezd force-pushed the impute-constant-all branch 2 times, most recently from 73e3578 to 52792c2 Compare December 11, 2020 11:15
@janezd
Copy link
Contributor Author

janezd commented Dec 11, 2020

I believe this to be fixed now. (It worked at some point, but now it was broken for any file. Shame on me - I apparently changed something and haven't tested it again.)

It can be merged now, but has to wait for #5117 (fix for xlsx), because it' s based on it.

@ajdapretnar
Copy link
Contributor

I managed to crash something while testing.

---------------------------- IndexError Exception -----------------------------
Traceback (most recent call last):
  File "/Users/ajda/orange/orange3/Orange/widgets/data/owtable.py", line 83, in data
    return super().data(index, role)
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/itemmodels.py", line 978, in data
    return coldesc.format(instance)
  File "/Users/ajda/orange/orange3/Orange/widgets/utils/itemmodels.py", line 828, in format_dense
    return str(instance[var])
  File "/Users/ajda/orange/orange3/Orange/data/variable.py", line 176, in __str__
    return self.variable.str_val(self)
  File "/Users/ajda/orange/orange3/Orange/data/variable.py", line 813, in repr_val
    return '{}'.format(self.values[int(val)])
IndexError: tuple index out of range

But for the life of me I can't reproduce now, so...

@lanzagar lanzagar merged commit 19bff36 into biolab:master Jan 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Impute: impute with fixed value for all features
3 participants