Single Cell Datasets: reduce the size of the data sets and speed-up loading #208

Open
BlazZupan opened this issue Jun 3, 2018 · 4 comments

Comments

@BlazZupan
Contributor

This issue is related to PR biolab/orange3#3047, which enables saving and loading of compressed pickle files. Once this is merged into Orange and released, I propose to:

  • move all the raw files to file.biolab.si/datasets/sc and provide compressed pickled versions there
  • update info files for Single Cell Datasets accordingly

This should substantially reduce the transfer and loading time of data sets. For instance, the largest data set currently included (bone marrow with AML) is 64 MB, while its pickled xz variant is only 2.4 MB.

Note that this update will break backward compatibility.
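
For illustration, here is a minimal sketch of what storing a data set as an xz-compressed pickle could look like, using only Python's standard `lzma` and `pickle` modules. This is not the code from biolab/orange3#3047; the helper names `save_pickle_xz` and `load_pickle_xz` and the `.pickle.xz` file name are assumptions made for the example.

```python
import lzma
import pickle


def save_pickle_xz(obj, path):
    """Pickle obj and write it to path through an xz (LZMA) compressor.

    Hypothetical helper for illustration, not part of Orange.
    """
    with lzma.open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle_xz(path):
    """Read an xz-compressed pickle written by save_pickle_xz.

    Only unpickle files from a source you trust.
    """
    with lzma.open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    import os
    import tempfile

    # Stand-in for a sparse expression matrix: mostly zeros, so it compresses well.
    table = {"genes": ["g1", "g2"], "counts": [[0] * 100 for _ in range(50)]}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.pickle.xz")
        save_pickle_xz(table, path)
        print("uncompressed pickle bytes:", len(pickle.dumps(table)))
        print("xz-compressed file bytes: ", os.path.getsize(path))
        print("round trip ok:", load_pickle_xz(path) == table)
```

The actual savings depend on the data; single-cell count matrices tend to be sparse, which is presumably why the reduction quoted above is so large.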

@BlazZupan BlazZupan changed the title Single Cell Datasets: reduce the size of the data sets Single Cell Datasets: reduce the size of the data sets and speed-up loading Jun 4, 2018
@astaric
Member

astaric commented Jun 4, 2018

If we migrate to .pickle.xz, I suggest we create a new "repository on serverfiles" and migrate to that. Otherwise, Datasets will start crashing on old(er) versions of the software.

@anupparikh
Collaborator

We're creating another file format to support? Why not standardize on tab and loom?

tab, mtx, loom, pickle...

@BlazZupan
Contributor Author

@anupparikh, it's not a new format, it is just a way of storing tab (csv) files. It is a quick trick to substantially reduce the size of our demo datasets and speed up loading.

@anupparikh
Collaborator

anupparikh commented Jun 4, 2018 via email
