Creating new features can produce inconsistent behaviour #2500

Closed
markotoplak opened this issue Jul 26, 2017 · 2 comments
Labels
bug A bug confirmed by the core team

Comments

@markotoplak
Member

I was debugging a problem where a user wanted to apply PCA to separate test data. The problem occurs because creating features with the same names as those in the original data replaces the corresponding entries in the Variable._all_vars dictionary, which makes newly loaded data incompatible with previously built transformations. See the following:

    import Orange
    data = Orange.data.Table("iris")
    # fit PCA on the data, then apply the fitted model to get the projected table
    pca = Orange.projection.PCA()(data)(data)
    print("Original feature id", id(data.domain.attributes[0]))
    print("PCA transform", data.transform(pca.domain)[0])

    # we create variables with the same names
    nf = [Orange.data.ContinuousVariable(a.name) for a in data.domain.attributes]
    print("New feature id", id(nf[0]))

    # the new data uses variables from nf, hence the transformation won't work
    data = Orange.data.Table("iris")
    print("Loaded feature id", id(data.domain.attributes[0]))
    print("PCA transform", data.transform(pca.domain)[0])  # all zeros

There may be many hidden bugs stemming from this. Is there any case when we would not want to use Variable.make functionality by default?
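The mechanism can be illustrated without Orange at all. The following is only a toy model, not Orange's actual code: `ToyVariable`, its `_all_vars` registry and its `make` method are hypothetical stand-ins, and the assumption that plain construction overwrites the registry entry is taken from the description above.

```python
class ToyVariable:
    """Toy stand-in for a variable class with a name-keyed registry."""

    # Hypothetical registry, loosely modelled on the Variable._all_vars
    # dictionary mentioned above: maps a name to the last variable created.
    _all_vars = {}

    def __init__(self, name):
        self.name = name
        # Assumption: plain construction overwrites any existing entry.
        ToyVariable._all_vars[name] = self

    @classmethod
    def make(cls, name):
        # Reuse the registered variable if there is one, otherwise create it.
        existing = cls._all_vars.get(name)
        return existing if existing is not None else cls(name)


# First load: the variable is created and registered.
original = ToyVariable.make("sepal length")

# Direct construction with the same name silently replaces the registry entry.
shadow = ToyVariable("sepal length")

# A later load looks the name up and gets the replacement, not the original,
# so anything built on `original` no longer matches the reloaded data.
reloaded = ToyVariable.make("sepal length")

print(reloaded is original)  # False
print(reloaded is shadow)    # True
```

In this toy model, routing every construction through `make` would keep `reloaded is original` true, which is the default behaviour the question above is asking about.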

@markotoplak markotoplak added the bug A bug confirmed by the core team label Jul 26, 2017
@janezd
Contributor

janezd commented Jul 26, 2017

Maybe. Let's say we have a variable gender with values F and M. Next we have a new data set with a variable gender and values 0 and 1. You want a new variable, not a single variable with values F, M, 0 and 1. (Which is a very bad example of what we'd like to avoid, because the file widget currently does exactly this. :()
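A rough sketch of the gender example, assuming a purely name-based reuse policy. The `registry` dictionary and `reuse_by_name` function are hypothetical and only illustrate the merging effect; they are not how Orange or the file widget is implemented.

```python
# Hypothetical registry: variable name -> list of values seen so far.
registry = {}


def reuse_by_name(name, values):
    """Always reuse the variable registered under `name`, extending its values."""
    known = registry.setdefault(name, [])
    for value in values:
        if value not in known:
            known.append(value)
    return known


first = reuse_by_name("gender", ["F", "M"])   # first data set
second = reuse_by_name("gender", ["0", "1"])  # second data set, same name

print(first is second)  # True
print(second)           # ['F', 'M', '0', '1']
```

Both data sets end up sharing one variable whose values are the union of the two encodings, which is the outcome the comment above wants to avoid.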

@astaric had a good idea regarding make and reuse, but I forgot it; I hope he has a better memory than I do.

@janezd
Contributor

janezd commented Sep 20, 2019

Made irrelevant by #3925.

@janezd janezd closed this as completed Sep 20, 2019