Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Box plot: Add 'order by importance' checkbox to groups #4055

Merged
merged 1 commit into from
Nov 15, 2019

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Sep 25, 2019

Box plot shows distribution of some attribute and, if grouping is enabled, how the distribution of this attribute varies accross groups. It thus conveys information about conditional probability of the target variable given the value of the grouping variable.

If "Order by importance" is checked, the widget computes the chi square or ANOVA between all target variables and the currently selected group. This may help the user answer the question "If I divide the data into such and such groups, which is the attribute by which these groups differ most".

However, since the widget shows the conditional probability of the target given the grouping variable, it might make more sense to sort grouping variables. This will allow the user to set the target (typically the outcome, the class) and see which (grouping) attribute is the most informative about this class. This is currently possible by setting the outcome as the grouping variable and sorting the variables whose distributions we're observing, but in this case the widget shows the wrong conditional probabilities.

Both ways make some sense, but I believe that sorting group variables makes more sense. A circumstantial evidence for the latter is also that if we sort by variables, we compute a mixture of chi-square and ANOVA p-values and sort them. This is not wrong, they should be commensurable. If we sort by groups, we compute either chi-square (if variable is discrete) or ANOVA (if it's numeric) for all groups (because all groups are always discrete).

I changed the widget so that it can be tried out, but haven't thoroughly checked the code yet. I would appreciate some comments before jumping into a change that we might decide to revert the day after tomorrow.

@BlazZupan?

@codecov
Copy link

codecov bot commented Sep 25, 2019

Codecov Report

Merging #4055 into master will increase coverage by <.01%.
The diff coverage is 95.5%.

@@            Coverage Diff             @@
##           master    #4055      +/-   ##
==========================================
+ Coverage   85.94%   85.94%   +<.01%     
==========================================
  Files         393      393              
  Lines       70033    70090      +57     
==========================================
+ Hits        60187    60240      +53     
- Misses       9846     9850       +4

@janezd janezd added the needs discussion Core developers need to discuss the issue label Oct 17, 2019
@janezd
Copy link
Contributor Author

janezd commented Oct 19, 2019

Gosh, I forgot what we decided in the end, so please forget me if I'm wrong.

  • We would have a checkbox under each list. This won't create inconsistencies, but may be confusing. In this case, we can make them exclusive.
  • When sorting changes, the selected item scrolls into view.
  • Open question: where is the class? If defaults are stretched bars, then the appropriate initial selection is to plot class and have no groups. Also in line with @lanzagar's description of what this widget essentially is (grouping is just a fancy addition). Alternative is showing the first variable and grouping by class.

@janezd janezd removed the needs discussion Core developers need to discuss the issue label Oct 24, 2019
@janezd janezd changed the title [RFC] [WIP] [ENH] Box plot: Move 'order by importance' checkbox to groups [RFC] [WIP] [ENH] Box plot: Add 'order by importance' checkbox to groups Nov 1, 2019
@janezd
Copy link
Contributor Author

janezd commented Nov 1, 2019

@BlazZupan and @lanzagar, please confirm functionality. I'll write/fix tests afterwards.

@janezd janezd added the needs discussion Core developers need to discuss the issue label Nov 8, 2019
@janezd janezd changed the title [RFC] [WIP] [ENH] Box plot: Add 'order by importance' checkbox to groups [WIP] [ENH] Box plot: Add 'order by importance' checkbox to groups Nov 8, 2019
@janezd
Copy link
Contributor Author

janezd commented Nov 8, 2019

Comment by @BlazZupan: when stretching bars makes no sense (when there are no groups or when the grouping variable is the same as the variable shown), bars should not be strectched. This should also disable the checkbox.

This does notbelong to this PR and is implemented in #4176.

@janezd janezd removed the needs discussion Core developers need to discuss the issue label Nov 8, 2019
@janezd janezd changed the title [WIP] [ENH] Box plot: Add 'order by importance' checkbox to groups [ENH] Box plot: Add 'order by importance' checkbox to groups Nov 8, 2019
@janezd janezd force-pushed the boxplot-order branch 2 times, most recently from 5330135 to 537e5f4 Compare November 14, 2019 18:36
@janezd janezd force-pushed the boxplot-order branch 4 times, most recently from 0895222 to 5c985b2 Compare November 15, 2019 12:30
@VesnaT VesnaT merged commit fba5245 into biolab:master Nov 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants