[ENH] PCA: Output variance of components #5513

Merged 1 commit on Aug 13, 2021

janezd
Contributor

@janezd janezd commented Jul 4, 2021

Issue

Fixes two thirds of #5468.

To test before biolab/orange-widget-base#162 is merged, change

        self.varmodel[:] = zip(self._variance_ratio[:self.maxp],
                               numpy.cumsum(self._variance_ratio))

to

        self.varmodel.wrap(list(zip(self._variance_ratio[:self.maxp],
                                    numpy.cumsum(self._variance_ratio))))

Description of changes
  1. Variances are added as attributes of variables, and as extra meta columns in the components output.

  2. The table with variances is shown in the control area and can be used as a third means of selecting components. Clicking a row selects all components up to that row. (Selecting arbitrary rows is not possible: the widget's code does not support it; interactions with the other two controls would be weird; this is not typical for PCA; for atypical uses, there's Select Rows; no A/B testing of this feature will be done, thanks for not asking :).

    Screenshot 2021-07-04 at 13 03 54
  3. The third point in the issue was reporting variance on new data. This can be added to the table, but I'd suggest making it a separate issue/PR.
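
The variance table from point 2 can be illustrated outside the widget. This is a minimal NumPy sketch under stated assumptions, not the widget's code: PCA is computed here via SVD of the centered data, and the helpers `variance_table` and `components_up_to_row` are hypothetical names introduced only for illustration.

```python
import numpy as np


def variance_table(data, max_components):
    """Return (ratio, cumulative ratio) rows for the leading components.

    Hypothetical helper: PCA is done here via SVD of the centered data.
    """
    centered = data - data.mean(axis=0)
    # Singular values relate to component variances: var_i = s_i**2 / (n - 1)
    singular = np.linalg.svd(centered, compute_uv=False)
    variances = singular ** 2 / (len(data) - 1)
    ratio = variances / variances.sum()
    cumulative = np.cumsum(ratio)
    # zip stops at the shorter sequence, so only max_components rows result
    return list(zip(ratio[:max_components], cumulative))


def components_up_to_row(row_index):
    """Clicking row `row_index` (0-based) selects components 0..row_index."""
    return row_index + 1


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
rows = variance_table(data, max_components=3)
for i, (ratio, cumulative) in enumerate(rows, start=1):
    print(f"PC{i}: {ratio:.3f}  cumulative {cumulative:.3f}")
print("components selected by clicking row 1:", components_up_to_row(1))
```

Each row pairs a component's share of the variance with the running total, which is why selecting a row naturally means "keep everything up to here" rather than an arbitrary subset.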

Includes
  • Code changes
  • Tests
  • Documentation

@codecov

codecov bot commented Jul 4, 2021

Codecov Report

Merging #5513 (3cd179c) into master (f9d78d6) will decrease coverage by 0.01%.
The diff coverage is 79.31%.

❗ Current head 3cd179c differs from pull request most recent head a2afed5. Consider uploading reports for the commit a2afed5 to get more accurate results

@@            Coverage Diff             @@
##           master    #5513      +/-   ##
==========================================
- Coverage   86.38%   86.36%   -0.02%     
==========================================
  Files         304      304              
  Lines       61753    61848      +95     
==========================================
+ Hits        53344    53414      +70     
- Misses       8409     8434      +25     

@janezd janezd changed the title from "Pca show variance" to "[ENH] PCA: Show variance of components" Jul 4, 2021
@janezd
Contributor Author

janezd commented Jul 5, 2021

I added variance on test data. (Not thoroughly checked, yet.) The table is a bit wide, but OK. The plot may look nice

Screenshot 2021-07-05 at 10 27 31

... except when it doesn't:

Screenshot 2021-07-05 at 10 25 52

I hate to think about how to compute the positions of labels in pyqtgraph to prevent overlapping. Any good suggestions?

@ajdapretnar
Contributor

I love how this widget got from basic to jacked 💪 in no time. 😆

As for the labels, if two labeled points are closer together than the height of the font, then plot the higher one's label above the point it labels, not below? (stupid suggestion probably, just throwing ideas out there)

@janezd
Contributor Author

janezd commented Jul 5, 2021

The problem is that in pyqtgraph you set the position in plot coordinates, not in pixels. There are several functions that transform coordinates, but it's all black magic, at least to @VesnaT and me. I'm thus not even sure I can detect that labels are overlapping.

Ha! Labels are QGraphicsTextItems, so Qt tells you if they collide. In case they do, I just set the anchors as you suggested and it works like magic. Sometimes a label now overlaps with the curve, which didn't happen before because the curves were monotone. But it was good for my grandmother, Lord it's good enough for me!
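
The anchor flip described above can be sketched without Qt. This is a minimal illustration, not the code from this PR: labels are modeled as plain pixel rectangles, the names `Label`, `rects_overlap` and `place_labels` are hypothetical, and the rectangle test stands in for a Qt collision check such as `QGraphicsItem.collidesWithItem`. The anchors assume pyqtgraph's `TextItem` convention, where (0.5, 0) pins the top center of the text to the point (label below) and (0.5, 1) pins the bottom center (label above).

```python
from dataclasses import dataclass


@dataclass
class Label:
    x: float          # point position in pixels
    y: float          # pixel y grows downward, as on screen
    width: float
    height: float
    anchor: tuple = (0.5, 0)   # (0.5, 0): label below point; (0.5, 1): above

    def rect(self):
        """Return (left, top, right, bottom) of the label box in pixels."""
        left = self.x - self.anchor[0] * self.width
        top = self.y - self.anchor[1] * self.height
        return left, top, left + self.width, top + self.height


def rects_overlap(a, b):
    """Stand-in for a Qt collision test on two label boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_labels(first, second):
    """If the two labels collide, move the higher point's label above it."""
    if rects_overlap(first.rect(), second.rect()):
        higher = first if first.y <= second.y else second
        higher.anchor = (0.5, 1)
    return first, second


a = Label(x=100, y=50, width=40, height=12)
b = Label(x=100, y=56, width=40, height=12)
place_labels(a, b)
print(a.anchor, b.anchor)
```

Because the check works on rendered boxes rather than plot coordinates, it sidesteps the coordinate transforms entirely; it does nothing about a label overlapping the curve itself.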

@markotoplak markotoplak changed the title from "[ENH] PCA: Show variance of components" to "[ENH] PCA: Output variance of components" Aug 13, 2021
@markotoplak
Member

I rebased this and am (for now) only merging the first commit. The others are backed up at https://github.com/markotoplak/orange3/tree/pca-show-variance-backup

In the meantime, a group from Mannheim, particularly @Les-Simon, proposed some changes to PCA (and other widgets): https://github.com/Charisma-Mannheim/OrangeExtension. They tend to work with PCA a lot, much more than we do. We should consider their suggestions before adding more complexity to the widget.

@markotoplak markotoplak merged commit 39d3cb6 into biolab:master Aug 13, 2021