allowing to use a different distance for the nearest neighbors in fuzzy join #869

Open
jeromedockes opened this issue Dec 18, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@jeromedockes
Member

Problem Description

At the moment we use NearestNeighbors with the L2 (Euclidean) distance.
If we could choose the distance, then using MinHash as the text encoder and "hamming" as the distance would give an approximation of 1 - Jaccard similarity, which I believe is a common choice for fuzzy joining.
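
To illustrate why the hamming distance on MinHash signatures approximates 1 - Jaccard similarity, here is a minimal standard-library sketch. It is a toy MinHash written for this example, not the encoder used by the library; the helper names (`ngrams`, `minhash_signature`, `hamming`), the choice of character 3-grams, and the 512 components are all assumptions.

```python
import hashlib


def ngrams(text, n=3):
    """Return the set of character n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity between two sets."""
    return len(a & b) / len(a | b)


def minhash_signature(items, n_components=512):
    """Toy MinHash: for each salted hash function, keep the minimum hash over the set."""
    signature = []
    for seed in range(n_components):
        salt = seed.to_bytes(8, "little")  # one salt per hash function
        signature.append(
            min(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
                for item in items
            )
        )
    return signature


def hamming(u, v):
    """Fraction of positions where two signatures differ."""
    return sum(x != y for x, y in zip(u, v)) / len(u)


a = ngrams("university of california")
b = ngrams("univ. of california")

exact = 1 - jaccard(a, b)
approx = hamming(minhash_signature(a), minhash_signature(b))
print(f"1 - Jaccard = {exact:.3f}, hamming on MinHash = {approx:.3f}")
```

Each signature component matches with probability equal to the Jaccard similarity of the two sets, so the fraction of mismatching components estimates 1 - Jaccard, and the estimate tightens as the number of components grows.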

Feature Description

The Joiner would have a "metric" or "distance" parameter that would be forwarded to the `metric` parameter of NearestNeighbors.
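
A rough sketch of what forwarding the parameter could look like, using scikit-learn's `NearestNeighbors` directly. The `nearest_match` helper is hypothetical and only stands in for the matching step; it is not the Joiner's actual code, and the toy vectors are made up to show the two metrics disagreeing.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def nearest_match(main, aux, metric="euclidean"):
    """Hypothetical helper: for each row of `main`, find the closest row of `aux`.

    `metric` is passed through unchanged to NearestNeighbors.
    """
    neighbors = NearestNeighbors(n_neighbors=1, metric=metric).fit(aux)
    distances, indices = neighbors.kneighbors(main)
    return indices.ravel(), distances.ravel()


# Toy signature-like vectors (made up for illustration).
aux = np.array([
    [100, 5, 5, 5],  # differs from `main` in one position, but by a lot
    [1, 6, 6, 6],    # differs in every position, but only slightly
])
main = np.array([[0, 5, 5, 5]])

idx_l2, dist_l2 = nearest_match(main, aux)  # default: euclidean
idx_ham, dist_ham = nearest_match(main, aux, metric="hamming")
print("euclidean picks row", idx_l2[0], "/ hamming picks row", idx_ham[0])
```

With hash-based encodings such as MinHash, the magnitude of the difference between two components carries no meaning, only whether they are equal, which is why the choice of metric can change the match.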

Alternative Solutions

No response

Additional Context

No response

@jeromedockes jeromedockes added the enhancement New feature or request label Dec 18, 2023
@GaelVaroquaux
Member

GaelVaroquaux commented Dec 18, 2023 via email
