Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Sort values naturally when reading files #4793

Merged
merged 1 commit into from
May 22, 2020

Conversation

PrimozGodec
Copy link
Contributor

Issue

Fixes #4778

Description of changes

When reading files sort values naturally (before alphabetical sort was used).
Natural sort sorts numbers inside string as numbers: "sth2" is before "sth12"

Discussion

#4778 also said that Ordered is not used it is partially true. It sets the ordered flag in the DiscreteVariable. This flag tells whether values are ordered in the variable so that other widgets would respect that. I think values are in a current situation always ordered variable and all widgets respect this order (at least none expect FeatureStatistics care about this flag). So I think this flag can be removed. It is used only in the future statistics widget where if this flag is set the min and the max values are shown for the discrete variable.

In case we decide to leave this variable as it is we need to write better documentation for this checkbox.

Includes
  • Code changes
  • Tests
  • Documentation

@PrimozGodec PrimozGodec added the needs discussion Core developers need to discuss the issue label May 19, 2020
@codecov
Copy link

codecov bot commented May 19, 2020

Codecov Report

Merging #4793 into master will increase coverage by 0.12%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4793      +/-   ##
==========================================
+ Coverage   83.86%   83.99%   +0.12%     
==========================================
  Files         281      276       -5     
  Lines       56804    56083     -721     
==========================================
- Hits        47638    47106     -532     
+ Misses       9166     8977     -189     

@janezd
Copy link
Contributor

janezd commented May 19, 2020

This flag tells whether values are ordered in the variable so that other widgets would respect that. I think values are in a current situation always ordered variable and all widgets respect this order (at least none expect FeatureStatistics care about this flag). So I think this flag can be removed.

Not all widgets. Heat map doesn't. #4529! If the flag is set, it would have to be observed!

The flag was meant, in ancient times, to change behaviour of some methods like subsetting in binary trees. If variable's values are ordered, tree induction wouldn't try all possible subsets (as it does now), but only all possible cuts.

As it is, I don't think it is observed anywhere, and the order of values is "not ignored" by chance. It seems we can remove it, but let's discuss.

@ajdapretnar
Copy link
Contributor

I would just state that Heat Map does consider the order. Split by uses the order set by the user. #4529 is about labels, which have nothing to do with DiscreteVariable as such. The behaviour is the same as in heat map plots in other libraries.

Orange/data/io_util.py Outdated Show resolved Hide resolved
Copy link
Contributor

@janezd janezd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought - does natural_sorted indeed belong to io_util or is it more general? (Our "more general functions" are not well-organized and packed, though, so this may not really matter.)

@janezd janezd self-assigned this May 21, 2020
@PrimozGodec
Copy link
Contributor Author

Just a thought - does natural_sorted indeed belong to io_util or is it more general? (Our "more general functions" are not well-organized and packed, though, so this may not really matter.)

It is a more general function and can be placed somewhere else. Maybe in Orange.utils?

@janezd
Copy link
Contributor

janezd commented May 21, 2020

You mean Orange.util? I'd prefer to avoid it, it could become a kitchen sink of functions.

Orange.misc has similar stuff, but organized into modules. What about putting it there? Perhaps into Orange.misc.collections? As I said, it was just a though. These "handy functions" were never organized well in Orange.

@PrimozGodec
Copy link
Contributor Author

@janezd I moved the function to misc

@janezd janezd merged commit 69e95f2 into biolab:master May 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion Core developers need to discuss the issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Edit Domain: replace Ordered with Sort numerically
3 participants