Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Select Rows and Table: Filtering string values #2176

Merged
merged 6 commits into from
Apr 28, 2017
Merged

[FIX] Select Rows and Table: Filtering string values #2176

merged 6 commits into from
Apr 28, 2017

Conversation

jerneju
Copy link
Contributor

@jerneju jerneju commented Mar 31, 2017

Issue

Table throws error when trying to compare string values. A string value written in Select Rows widgets and a numerical value which is set as string.
https://sentry.io/biolab/orange3/issues/244859655/

Description of changes
Steps to reproduce the behavior

Put File and Select Rows widgets and connect them. Then open iris. Set one of four features to string and meta. Then choose that feature in Select Rows and choose filter "is before" or some other one.

Includes
  • Code changes
  • Tests
  • Documentation

@jerneju
Copy link
Contributor Author

jerneju commented Mar 31, 2017

@codecov-io
Copy link

codecov-io commented Mar 31, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@8e2f09b). Click here to learn what that means.
The diff coverage is 97.5%.

@@            Coverage Diff            @@
##             master    #2176   +/-   ##
=========================================
  Coverage          ?   72.46%           
=========================================
  Files             ?      319           
  Lines             ?    55142           
  Branches          ?        0           
=========================================
  Hits              ?    39957           
  Misses            ?    15185           
  Partials          ?        0

@@ -1202,6 +1205,7 @@ def _filter_values_indicators(self, filter):
sel = np.vectorize(f)(col)
elif isinstance(f, (data_filter.FilterContinuous,
data_filter.FilterString)):
col = get_str_column(col, isinstance(f, data_filter.FilterString))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why calling a (trivial) function instead of doing it right here? It further obscures the logic of this code, which is already too contrived.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Radon and cyclomatic complexity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We measure cyclomatic complexity to make the code more, not less readable. :)

Find another way to decrease the complexity. Cheating with dictionaries typically makes the code slower or more complex, but in this case dictionaries are natural and fast. You can prepare a dictionary (as a class attribute, not a local variable)

    operators = {
        f.Equal: operator.eq,
        f.NotEqual: operator.neq,
        f.Less: operator.le,
        ...
    }

Then you can replace most if-s with a single col = ops[f.oper](col, fmin).

Perhaps this does not work for some reason. In this case, find another solution to simplify the function. Extracting individual ifs to local functions is not a proper solution to high cyclomatic complexity (except when it makes the code actually better).

You can also extract the code the is within the for loop into a separate function. This would make sense since this is the code that applies the filter to a selection. If this is still not enough, make a separate function for each filter type (elifs in this block of code), and extract the function that prepares the initial sel column.

@astaric astaric added this to the 3.4.2 milestone Apr 7, 2017
@astaric astaric self-assigned this Apr 7, 2017
sel += col
else:
raise TypeError("Invalid filter")
sel = self._filter_by_condition(f, conjunction, sel, data_filter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than before, but not by much. If there's a simple and obvious a way to make _filter_by_condition more legible, do so. (I confess: that's my old code.) If there's not, forget it.

@astaric astaric modified the milestones: 3.4.2, future Apr 19, 2017
Copy link
Member

@astaric astaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what steps do I need to take to reproduce the problem, nor what is the actual change in this PR.

Please update the description with steps to reproduce, add a tests that performs those steps (if possible) and split the PR into two commits, one with the fix and another with refactoring (which was probably done to satisfy radon).

@@ -1133,6 +1133,99 @@ def _filter_same_value(self, column, value, negate=False):
sel = np.logical_not(sel)
return self.from_table_rows(self, sel)

def _filter_by_condition(self, f, conjunction, sel, data_filter): # pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please stop abusing pragma: no cover.

This method should be tested, along with every other line of code in the table.py module. If codecov complains that the code you have moved around (but not changed) is not covered, make sure that the refactoring is done in a separate commit (so that the change can be inspected separately) and state that in the comment. Or write some tests. Basically any other way to approach it is better than marking actual code with "no cover".

@jerneju
Copy link
Contributor Author

jerneju commented Apr 26, 2017

@jerneju
Copy link
Contributor Author

jerneju commented Apr 26, 2017

@astaric: Added a description how to reproduce the problem.

jerneju and others added 3 commits April 26, 2017 14:04
Split the method into two, _filter_values_to_indicator handles multiple
conditions, conjunctions/disjunctions and negations, while
_filter_to_indicator handles application of a single ValueFilter.
Extract handling of FilterDiscrete, FilterContinuous and FitlerString to
separate functions, each handling its own specifics.
@astaric
Copy link
Member

astaric commented Apr 26, 2017

Thank you.

I have refactored the test a bit and split the _filter_values_indicator into multiple smaller functions (undoing the 383bab2 refactoring commit in the process).

The old _filter_values_indicators tried to handle both conjunctions/disjunctions and specific filters at the same time. I have split that code into two functions. Furthermore, evaluation of the most complicated filters (Discrete, Continuous, String) has been extracted to separate methods.

@astaric
Copy link
Member

astaric commented Apr 26, 2017

@janezd can you take a look at the new refactoring? I am almost sad that this is going to be thrown away once we move to pandas :)

@kernc
Copy link
Contributor

kernc commented Apr 26, 2017

I am almost sad that this is going to be thrown away once we move to pandas

FWIW I totally don't mind if you do the throwing. 😄

I believe a rebase should mitigate the failing test.

@astaric
Copy link
Member

astaric commented Apr 27, 2017

A restart worked as well :)

@astaric astaric merged commit b35b050 into biolab:master Apr 28, 2017
@jerneju jerneju deleted the string-filter-table branch May 3, 2017 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants