[ENH] PCA transformation speedup #1539

Merged: 3 commits merged into biolab:master on Sep 9, 2016

Conversation

@markotoplak (Member) commented Sep 2, 2016

Speedup of transformations into the PCA space on my laptop:

  • adult:
    6.07s -> 0.21s
  • an infrared dataset with shape (16384, 1608):
    (more than 40 minutes - cancelled) -> 6.69s
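
A timing along these lines reproduces the measurement (a sketch assuming the current Orange 3 API; n_components and the print format are illustrative):

```python
import time

from Orange.data import Table
from Orange.projection import PCA

# "adult" is the dataset named above; any Orange Table works here
data = Table("adult")
pca = PCA(n_components=10)(data)   # fit the PCA projection model

start = time.time()
transformed = pca(data)            # transform the data into the PCA space
print("transformation took %.2fs" % (time.time() - start))
```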

@codecov-io commented Sep 2, 2016

Current coverage is 88.30% (diff: 97.05%)

Merging #1539 into master will increase coverage by 0.02%

```
@@             master      #1539   diff @@
==========================================
  Files            77         77
  Lines          7629       7654    +25
  Methods           0          0
  Messages          0          0
  Branches          0          0
==========================================
+ Hits           6735       6759    +24
- Misses          894        895     +1
  Partials          0          0
```


Powered by Codecov. Last update ea0173f...935303a

```diff
@@ -28,3 +28,27 @@ def scale(values, min=0, max=1):
     if ptp == 0:
         return np.clip(values, min, max)
     return (-minval + values) / ptp * (max - min) + min
+
+
+class ComputeValueWCommon:
```
Contributor:
Does not belong here. Put it in pca.py, where it's used.

@markotoplak (Member Author):
This will be used for other transformations. It belongs in some base Orange classes, as it is referenced in table.py.

Contributor:
I don't know. Will it be used for other transformations as part of this PR? Utils are tools, like screwdrivers and wrenches. This doesn't look like a tool, rather like a very specific piece of plug which I, unrelatedly, happen to not appreciate at all. 😃

@markotoplak (Member Author) commented Sep 5, 2016:
No, not as part of this PR, but I'll definitely use it a lot in an add-on. This class tries to at least partially solve the problem where many features are based on the same computation. The same thing frequently occurs in text mining; @nikicc was questioning it once, too.

Or do you perhaps suggest putting this in table.py or variable.py?
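
For readers following along, the idea behind the new class is roughly the following (a condensed sketch; only the name ComputeValueWCommon comes from the diff above, the method names and signatures are illustrative):

```python
class ComputeValueWCommon:
    """Sketch of a compute_value with a shared (common) part.

    compute_shared does the expensive, feature-independent work once
    per table; each feature then derives its own column from that
    shared result instead of recomputing it.
    """
    def __init__(self, compute_shared):
        self.compute_shared = compute_shared

    def __call__(self, data, shared_data=None):
        # the caller may pass in a precomputed shared result; otherwise
        # it is computed here (once per feature, i.e. the slow path)
        if shared_data is None:
            shared_data = self.compute_shared(data)
        return self.compute(data, shared_data)

    def compute(self, data, shared_data):
        # subclasses extract their feature's values from shared_data
        raise NotImplementedError
```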

@kernc (Contributor) commented Sep 3, 2016

While I enjoy the performance increase, I don't like how the change introduces considerable branching on the consumer end.

> The key to performance is elegance, not battalions of special cases.
>
> — Jon Bentley and Doug McIlroy

```python
def __call__(self, data):
    if data.domain != self.projection.pre_domain:
        data = data.from_table(self.projection.pre_domain, data)
    return self.projection.transform(data.X)[:, self.feature]
```
Contributor:
The result would be pretty much the same if you just somehow intelligently memoized this `self.projection.transform()` call?

@markotoplak (Member Author) commented Sep 4, 2016:

This could of course be memoized and shared between different compute_value functions for the same domain, but to keep memory use down, that cache would have to be invalidated after the transformation. That is why I exposed the common part to the transformer, which can destroy the intermediate results once the transformation is done.
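
The pattern being described might look roughly like this on the consumer ("transformer") side (illustrative only; transform_columns and the isinstance dispatch are not code from this PR):

```python
def transform_columns(variables, data):
    """Sketch: compute each shared part once, reuse it for every
    column that needs it, then drop the intermediate results."""
    shared_cache = {}
    columns = []
    for var in variables:
        cv = var.compute_value
        if isinstance(cv, ComputeValueWCommon):   # class sketched earlier
            key = id(cv.compute_shared)
            if key not in shared_cache:
                shared_cache[key] = cv.compute_shared(data)
            columns.append(cv(data, shared_cache[key]))
        else:
            columns.append(cv(data))
    shared_cache.clear()   # intermediate results die with the transform
    return columns
```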

@kernc (Contributor) commented Sep 5, 2016:

I don't currently have a better idea. 😕 I just strongly ~~feel~~ know this isn't something a user of the projector should worry about.

@markotoplak (Member Author):

My line of reasoning: I would like to clear the cache after the transformation. The only thing that knows when the transformation ends is its user, so the user has to be aware of it.

An alternative solution would be to call something like clear_cache() on all the variables after the transformation, but this has the same awareness drawback while being less direct. Also, theoretically, such caches would have problems with concurrent transforms...
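
The rejected alternative would be something like this hypothetical sketch (MemoizedComputeValue and clear_cache are illustrative names, not code from this PR):

```python
class MemoizedComputeValue:
    """Sketch of the memoizing alternative: the cache lives inside the
    compute_value, so someone must remember to invalidate it."""
    def __init__(self, compute_shared):
        self.compute_shared = compute_shared
        self._cache = {}                     # id(table) -> shared result

    def __call__(self, data):
        key = id(data)
        if key not in self._cache:
            self._cache[key] = self.compute_shared(data)
        return self._cache[key]

    def clear_cache(self):
        # the caller still has to know when the transformation ends
        # (the same awareness drawback), and concurrent transforms
        # would share and clear a single cache
        self._cache.clear()
```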


```
Parameters
----------
compute_shared: a callable that takes an Orange.data.Table
```
Member:

`Callable[[Orange.data.Table], object]`, per PEP 484.
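
Under that suggestion the annotated signature might read (a sketch; the parameter name comes from the docstring above, the rest is illustrative):

```python
from typing import Callable

from Orange.data import Table


class ComputeValueWCommon:
    # annotation per the PEP 484 suggestion above
    def __init__(self, compute_shared: Callable[[Table], object]) -> None:
        self.compute_shared = compute_shared
```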

@astaric astaric merged commit d464b58 into biolab:master Sep 9, 2016