Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] OWRandomize: Add a new widget #1863

Merged
merged 2 commits into from
Jan 20, 2017
Merged

Conversation

VesnaT
Copy link
Contributor

@VesnaT VesnaT commented Jan 4, 2017

Issue

Create a new widget for shuffling columns in a data able.

Description of changes

Enable multiple types of randomization at the same time.
New widget Randomize.

Includes
  • Code changes
  • Tests
  • Documentation

@codecov-io
Copy link

codecov-io commented Jan 4, 2017

Current coverage is 89.28% (diff: 100%)

Merging #1863 into master will increase coverage by <.01%

@@             master      #1863   diff @@
==========================================
  Files            86         86          
  Lines          9116       9115     -1   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           8138       8138          
+ Misses          978        977     -1   
  Partials          0          0          

Sunburst

Powered by Codecov. Last update 463242d...3052123

@VesnaT VesnaT changed the title OWRandomize: Add a new widget [ENH] OWRandomize: Add a new widget Jan 6, 2017
data = None
if self.data:
rand_seed = self.random_seed or None
_type = Randomize.RandomizeClasses * self.shuffle_class | \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_ instead of _type (like exec_ in main. Where, by the way, the underscore is no longer necessary in Python 3, AFAIK).


def send_report(self):
text = "No shuffling"
labels = ["Classes", "Features", "Metas"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use lowercase here (Shuffle: Classes, Metas doesn't look nice)

labels = ["Classes", "Features", "Metas"]
include = [self.shuffle_class, self.shuffle_attrs, self.shuffle_metas]
if sum(include):
text = ", ".join([l for i, l in zip(include, labels) if i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer printing "a, b, c, d and e", where possible. Try something like

def send_report(self):
    labels = ["classes", "features", "metas"]
    include = [self.shuffle_class, self.shuffle_attrs, self.shuffle_metas]
    include = [label for labels, i in zip(labels, include)
    text = " and ".join(filter(None, (", ".join(include[:-1]), include[-1:]))) or "none"
    self.report_items ...

This doesn't require if. If you find the last line too cryptic, you can get rid of filter by handling len(include) == 1 separately.

self.send("Data", data)

def send_report(self):
text = "No shuffling"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer text = "none".

self.data = None

# GUI
box = gui.vBox(self.controlArea, "Shuffle")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vertical arrangement of check boxes with such short labels doesn't look nice. I think this is nicer:

    box = gui.hBox(self.controlArea, "Shuffled columns")
    box.layout().setSpacing(20)

That is, hBox, longer (more informative) title, and spacing.

if sum(include):
text = ", ".join([l for i, l in zip(include, labels) if i])
self.report_items("Settings",
[("Shuffle", text),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the box title is changed, perhaps also change "Shuffle" to "Shuffled columns".

text = ", ".join([l for i, l in zip(include, labels) if i])
self.report_items("Settings",
[("Shuffle", text),
("Scope", "{}% of data".format(self.scope_prop)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"of rows" would be more accurate than "of data". However, you can change "Scope" is changed to "Proportion of shuffled rows", and then put just "{} %".format(self.scope_prop).

@@ -326,8 +326,7 @@ class Randomize(Preprocess):
>>> randomizer = Randomize(Randomize.RandomizeClasses)
>>> randomized_data = randomizer(data)
"""
Type = Enum("Randomize",
"RandomizeClasses, RandomizeAttributes, RandomizeMetas")
Type = 1, 2, 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now no longer an enum. Why not something like this?

Type = Enum("Randomize", dict(RandomizeClasses=1, RandomizeAttributes=2, RandomizeMetas=4))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should it be an enum? It is no longer enum, because they don't support bitwise operators.
If you insist with enums, I should probably add two new parameters to the method (for each type of randomization). Or am I missing something here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enums accept a type parameter, e.g. Enum(..., type=int).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, no, it doesn't have to be Enum. We're mostly not using it. I just liked it and asked why it was removed.

resizing_enabled = False
want_main_area = False

shuffle_class = Setting(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer True and False, so everybody knows that these are boolean flags, not numbers. The trick in apply will still work.

rand_seed = self.random_seed or None
_type = Randomize.RandomizeClasses * self.shuffle_class | \
Randomize.RandomizeAttributes * self.shuffle_attrs | \
Randomize.RandomizeMetas * self.shuffle_metas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly how I'd do it and I like it. But perhaps

type_ = sum(part for part, flag in zip(Randomize.Type, [self.shuffle_class, shuffle_attrs, shuffle_metas]) if flag)

is better. (If you use [self.shuffle_class, shuffle_attrs, shuffle_metas] in a lot of places, you can create a property

@property
 def parts(self):
     return [self.shuffle_class, shuffle_attrs, shuffle_metas]

and simplify this expression and also send_report.

@janezd
Copy link
Contributor

janezd commented Jan 14, 2017

I'd prefer a gui like this.

screen shot 2017-01-14 at 17 32 16

@VesnaT
Copy link
Contributor Author

VesnaT commented Jan 16, 2017

Done

@janezd janezd self-assigned this Jan 16, 2017
@janezd janezd merged commit 70fd197 into biolab:master Jan 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants