Hamming distance #3185

Closed
thocevar opened this issue Aug 3, 2018 · 6 comments

Comments

@thocevar
Contributor

thocevar commented Aug 3, 2018

There is no distance in Orange that can be used for distance calculations between columns in data with discrete attributes.

Implement Hamming distance in distance.py and add it to the OWDistances widget.

@ajdapretnar
Contributor

Likely also useful for text.

@clone95

clone95 commented Nov 7, 2018

Good morning, I'm a Master's student and our professor taught us how to use Orange. I find it a very useful tool and I would like to help out with small things. If I understand correctly, distance([1,3,6,4], [2,3,6,9]) should, for example, return 2?

@kernc
Contributor

kernc commented Nov 7, 2018

> How would this distance between two columns of discrete values be calculated?

Similarly to the simple matching coefficient: increase the distance by 1 for each index at which the two vectors differ.
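The rule above can be sketched in a few lines of plain Python. This is only an illustration of the counting rule, not Orange's implementation; the function name `hamming_distance` is hypothetical, and the sketch assumes equal-length vectors with no missing values.

```python
from typing import Hashable, Sequence


def hamming_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Count the positions at which two equal-length vectors differ.

    Hypothetical helper for illustration only; it does not handle
    missing values, which a real implementation would need to decide on.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Each mismatching index adds 1 to the distance.
    return sum(x != y for x, y in zip(a, b))


# The example from the question above: positions 0 and 3 differ.
print(hamming_distance([1, 3, 6, 4], [2, 3, 6, 9]))  # 2
```

Dividing the result by the vector length would give the fraction of mismatching positions instead of the raw count; which of the two is preferable for Orange is not settled in this thread.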

@clone95

clone95 commented Nov 7, 2018

OK, thanks. Now I'm going to explore the project a bit before doing this. Thanks a lot.

@atyamsriharsha

atyamsriharsha commented Dec 13, 2018

Hey @thocevar, I can see that no development has started on this issue, and I would like to contribute. I was looking through the code and found that the Euclidean distance uses means as the offset for normalization and two standard deviations for scaling, whereas the Manhattan distance uses medians as the offset and two MADs for scaling. Could you please provide some insight into how these parameters are selected, and which ones we should use when implementing Hamming distance? Thanks :)
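For readers unfamiliar with the two schemes described in this comment, here is a minimal sketch of what "offset by the mean, scale by two standard deviations" and "offset by the median, scale by two MADs" could look like for a single numeric column. It is not taken from Orange's code: the function names are hypothetical, MAD is taken to mean the median absolute deviation, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import List, Sequence


def normalize_mean_std(values: Sequence[float]) -> List[float]:
    """Offset by the mean, scale by two standard deviations.

    Assumption: population standard deviation; the real code may differ.
    """
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - center) / (2 * spread) for v in values]


def normalize_median_mad(values: Sequence[float]) -> List[float]:
    """Offset by the median, scale by two median absolute deviations."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [(v - center) / (2 * mad) for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
print(normalize_mean_std(column))
print(normalize_median_mad(column))
```

Both variants only make sense for numeric columns, since they rely on means, medians, and differences between values.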

@janezd
Contributor

janezd commented Dec 13, 2018

See https://github.com/biolab/orange3/files/1128190/distances.pdf, referenced in #2454. As I recall, I later changed some parts, but the reasoning for using two standard deviations is still as explained there.
