-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Neighbors: improve exclusion of references, checkbox to (un)limit output data #4997
[ENH] Neighbors: improve exclusion of references, checkbox to (un)limit output data #4997
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4997 +/- ##
==========================================
- Coverage 84.62% 84.62% -0.01%
==========================================
Files 285 285
Lines 59427 59430 +3
==========================================
+ Hits 50288 50290 +2
- Misses 9139 9140 +1 |
If the user wants to have all instances, (s)he now has to limit it to a high-enough number. Wouldn't it be more elegant to just put a checkbox in front of the spin? As for 1e-5, what about this: if there are some instances that are (almost) equal to the reference instance, we would expect the distribution of distances to have a small group of values below 1e-5 (I think this is still reasonably above numeric errors) and then nothing for a long time (but perhaps not all the way to 1e-5). On the other hand, we may not need to go through all this: values are normalized, hence 1e-5 is already dynamic in a sense. Have you encountered cases when it removed instances that you'd prefer kept? I wouldn't expose 1e-5 in the user interface. Beside it looking to technical and taking too much space, the number is meaningless to the user because variables are normalized. I guess. |
I did put a checkbox in front of the spin using the What if the values of the instances are very small? I don't think they should be automatically normalized, this should be an option similar to The Jaccard distance always (? needs more testing) gives me the "all instances are the same" message for spectroscopy data. I haven't looked into the actual calculation though. I put 1E-5 to the GUI hoping that the hardcoded comparison will be removed in the new version of the calculation. |
I must be blind. Sorry. .) The ridiculously high number could be replaced (there's no reason not to...), though it won't change anything. Jaccard is a different kind of distance, not applicable to spectroscopy. It computes differences between sets. Discussion results:
Would this be OK? Would you continue working on this or should somebody else take over? |
@janezd, thanks for the update! I'd be happy with any solution as long as we can output the full incoming dataset. For the normalization, it would be great to have it as an option. If not too slow, I think you should take over because I just saw @BlazZupan just reported something in #5000 that I'm not sure I can fix. I started working on it because a user requested the feature, so there is a time factor. :) |
3c8b04a
to
4c0cbcc
Compare
- Adjust the maximal number of neighbours to the data size - Enable OWNeighbors to output all distances relative to reference - Exclude references based on ids, not distances - Remove the checkbox for excluding references; exclude always - Removed unused setting
4c0cbcc
to
ee6591e
Compare
@janezd, this looks good. The only question is whether we need to keep backward compatibility for this one. As this is more like a bugfix (and the changes would be slight), I'd say there is no need to make a "compatibility mode". What do you think? |
I think this is too trivial for complications with "compatibility mode". It would be just dead, garbage code. |
Issue
OWNeighbors was limited to output distances only a subset of the input data. This change enables outputting distances to the full dataset.
Description of changes
For Orange Spectroscopy we would like to be able to compare the full dataset to a selected reference instance and visualize the distances to it.
TODO:
The widget needs some work in my opinion, right now, every instance that is in an arbitrarily defined 1E-5 distance from the reference is excluded from the output. This should be handled dynamically based on the values of the data instances.
Includes