Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] FDR: Calculate FDR using numpy #3625

Merged
merged 1 commit into from
Feb 25, 2019
Merged

[ENH] FDR: Calculate FDR using numpy #3625

merged 1 commit into from
Feb 25, 2019

Conversation

VesnaT
Copy link
Contributor

@VesnaT VesnaT commented Feb 25, 2019

Issue

Current implementation of False discovery rate is using python list and for loops and is therefore inefficient.

Description of changes

Implement FDR using numpy.

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar
Copy link
Contributor

ajdapretnar commented Feb 25, 2019

At first I thought this will be a great fix for Text add-on, but then I realized Text implements its own FDR! 😮 I don't think this is optimal. Would you care to check that implementation as well and perhaps replace it with this one? https://github.com/biolab/orange3-text/blob/master/orangecontrib/text/stats.py

@codecov
Copy link

codecov bot commented Feb 25, 2019

Codecov Report

Merging #3625 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3625      +/-   ##
==========================================
+ Coverage   84.22%   84.23%   +0.01%     
==========================================
  Files         370      370              
  Lines       67482    67468      -14     
==========================================
- Hits        56834    56832       -2     
+ Misses      10648    10636      -12

@VesnaT
Copy link
Contributor Author

VesnaT commented Feb 25, 2019

I think those implementations were the same. So Text add-on should probably use the new one in Orange core, instead of implementing its own.

Orange/statistics/util.py Outdated Show resolved Hide resolved
Orange/statistics/util.py Show resolved Hide resolved
Orange/statistics/util.py Outdated Show resolved Hide resolved
Orange/statistics/util.py Outdated Show resolved Hide resolved

fdrs = (p_values * m / np.arange(1, len(p_values) + 1))[::-1]
fdrs = np.array(np.minimum.accumulate(fdrs)[::-1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kudos.

def test_FDR_dependent(self):
p_values = np.array([0.0002, 0.0004, 0.00001, 0.0003, 0.0001])
np.testing.assert_almost_equal(
np.array([0.00076, 0.00091, 0.00011, 0.00086, 0.00057]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you gotten these number independently, that is, from another source or calculated manually, not from the code itself? (I'm asking because I used to be a big sinner myself.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I calculated those using the old function.

Orange/statistics/util.py Outdated Show resolved Hide resolved
@ajdapretnar
Copy link
Contributor

Agree.

ordered = is_sorted(p_values)
if p_values is None or len(p_values) == 0 or \
(m is not None and m <= 0):
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cover this line in tests to get approval by codecov (and to make sure we don't forget these cases at any future refactoring).

@ajdapretnar
Copy link
Contributor

FDR in Text add-on replaced with this implementation: biolab/orange3-text#416

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants