[FIX] Fix Chi2 computation for variables with values with no instances #2031

Merged
merged 1 commit into from
Feb 21, 2017

Conversation

jerneju
Contributor

@jerneju jerneju commented Feb 20, 2017

The chi-squared statistic is NaN when an attribute declares values that do not occur in the data. The expected count for such a value is 0, so its residual is computed as 0/0, and the code does not handle that limit. The residual is supposed to be 0.

  • Code changes
  • Tests
  • Documentation
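
The failure described above can be reproduced with plain NumPy. This is a minimal sketch, not the Orange code itself: the contingency table is made up, and the variable names only mirror the ones visible in the diff (`observed`, `expected`, `residuals`).

```python
import numpy as np

# Hypothetical contingency table: rows are values of one attribute,
# columns are values of the other. The third column is a declared value
# that never occurs in the data.
observed = np.array([[1.0, 2.0, 0.0],
                     [1.0, 1.0, 0.0]])

n = observed.sum()
probs_x = observed.sum(axis=0) / n   # column marginals
probs_y = observed.sum(axis=1) / n   # row marginals
expected = np.outer(probs_y, probs_x) * n

# The empty column has expected == 0, so its residuals are 0 / 0 = NaN.
with np.errstate(divide="ignore", invalid="ignore"):
    residuals = (observed - expected) / np.sqrt(expected)
nan_before = bool(np.isnan(residuals).any())

# Same idea as the fix in this PR: treat the undefined residuals as 0.
residuals[np.isnan(residuals)] = 0
chisq = float(np.sum(residuals ** 2))
```

Before the replacement, a single NaN residual makes the summed statistic NaN; after it, the empty column contributes nothing to the sum.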

@codecov-io

codecov-io commented Feb 20, 2017

Codecov Report

Merging #2031 into master will decrease coverage by 1.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2031      +/-   ##
==========================================
- Coverage    70.7%   69.69%   -1.01%     
==========================================
  Files         343      343              
  Lines       54469    54478       +9     
==========================================
- Hits        38510    37967     -543     
- Misses      15959    16511     +552


@@ -42,6 +42,8 @@ def __init__(self, data, attr1, attr2):
         self.expected = np.outer(self.probs_y, self.probs_x) * self.n
         self.residuals = \
             (self.observed - self.expected) / np.sqrt(self.expected)
+        where_are_NaNs = np.isnan(self.residuals)
+        self.residuals[where_are_NaNs] = 0

gh-2031
Check that chi-square can be computed when some declared attribute values do not occur in the data.
"""
tempdir = tempfile.mkdtemp()
Member

Instead of creating files in code, you could either commit the file along with the test or, even better, create the table directly with something along the lines of:

import Orange
a = Orange.data.DiscreteVariable("a", values=["y", "n"])
b = Orange.data.DiscreteVariable("b", values=["y", "n", "other"])
t = Orange.data.Table(Orange.data.Domain([a, b]), list(zip("yynny", "ynyyn")))
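
To see why that table exercises the bug, the sketch below builds the same contingency counts with plain NumPy instead of Orange. It is an illustration only, under the assumption that the test boils down to a 2x3 table whose "other" column is empty; the helper name `chi_squared` is hypothetical and not part of Orange.

```python
import numpy as np

a_values = ["y", "n"]
b_values = ["y", "n", "other"]
rows = list(zip("yynny", "ynyyn"))  # same pairs as in the suggestion above

# Count co-occurrences of (a, b); "other" is declared but never appears.
observed = np.zeros((len(a_values), len(b_values)))
for a_val, b_val in rows:
    observed[a_values.index(a_val), b_values.index(b_val)] += 1

other_count = observed[:, b_values.index("other")].sum()


def chi_squared(observed):
    """Hypothetical helper: chi-squared with undefined residuals set to 0."""
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    with np.errstate(divide="ignore", invalid="ignore"):
        residuals = (observed - expected) / np.sqrt(expected)
    residuals[np.isnan(residuals)] = 0
    return float(np.sum(residuals ** 2))


chisq = chi_squared(observed)
```

A regression test along these lines only needs to assert that the statistic is finite for a table with an empty column.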

…which suppose to be

The chi-squared statistic is NaN when an attribute declares values that do not occur in the data. The expected count for such a value is 0, so its residual is computed as 0/0, and the code does not handle that limit. The residual is supposed to be 0.

Check if there is NaN in the array and then change that value to 0.

- [X] Code changes
- [X] Tests
- [ ] Documentation
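
For comparison, the division can also be guarded up front instead of replacing NaNs afterwards. This is an alternative sketch, not what this PR does; `pearson_residuals` is a hypothetical name, and the input table is made up.

```python
import numpy as np


def pearson_residuals(observed):
    """Hypothetical helper: residuals that are 0 wherever expected == 0."""
    observed = np.asarray(observed, dtype=float)
    n = observed.sum()
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    residuals = np.zeros_like(expected)
    # Divide only where the expected count is positive; other cells keep
    # the 0 they were initialised with, so no NaN is ever produced.
    np.divide(observed - expected, np.sqrt(expected),
              out=residuals, where=expected > 0)
    return residuals


residuals = pearson_residuals([[3, 1, 0], [2, 4, 0]])
```

The trade-off is that masking the division only covers zero expected counts, whereas the `np.isnan` replacement also zeroes NaNs that come from any other source.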
@astaric astaric changed the title [FIX] ghissue-2020 Chisq not calculated when there are no attributes … [FIX] Fix Chi2 computation for variables with values with no instances Feb 21, 2017
@astaric astaric merged commit 4a22791 into biolab:master Feb 21, 2017
@jerneju jerneju deleted the ghissue-2020 branch April 20, 2017 13:45
3 participants