Polars 1xx #149
base: main
Commits on Jun 10, 2024
ef010bc Format all Python files with "ruff format".
Commits on Jun 11, 2024
9bb9fda Clean up code of get_barcodes_passing_qc_for_sample.
Commits on Jul 10, 2024
0f73b45 Update Polars syntax to 1.0.0+ version and fix some type checking.
2915437 Skip empty lines when reading barcode file in "read_barcodes_file_to_polars_series".
7d9a9cc Add workaround for very rare cases in QC where KDE for duplication ratio cannot be calculated.
    This happens when all duplication ratio values are 0.0% because the fragment count for each fragment was 1.
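To illustrate why the KDE step in 7d9a9cc can fail: a Gaussian KDE usually derives its bandwidth from the spread of the data, so if every duplication ratio is 0.0 the spread is zero and the estimate is undefined. The sketch below is not the code from this commit. It uses a plain-Python KDE with a Scott's-rule style bandwidth, and `density_or_fallback` and its fallback value are hypothetical names chosen only to show the kind of guard such a workaround needs.

```python
import math
import statistics


def gaussian_kde_at(values, x):
    """Evaluate a 1D Gaussian KDE at x, with a Scott's-rule style bandwidth."""
    n = len(values)
    bandwidth = statistics.pstdev(values) * n ** (-1 / 5)
    if bandwidth == 0.0:
        raise ZeroDivisionError("all values are identical: bandwidth is 0")
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm


def density_or_fallback(values, x, fallback=0.0):
    """Hypothetical guard: skip the KDE when the values have no spread."""
    if len(set(values)) < 2:
        return fallback
    return gaussian_kde_at(values, x)


duplication_ratios = [0.0, 0.0, 0.0, 0.0]  # every fragment was seen exactly once
print(density_or_fallback(duplication_ratios, 0.0))
print(density_or_fallback([0.0, 0.1, 0.2, 0.4], 0.1) > 0.0)
```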
35e13ea Add mypy stub generation and mypy numpy typing plugin.
    Stubs can later be generated with:
        stubgen -o stubs -p pycisTopic
Commits on Jul 15, 2024
14fce9a Only keep one copy of each barcode when reading barcode file in "read_barcodes_file_to_polars_series".
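Taken together, 2915437 and 14fce9a mean that blank lines and repeated barcodes in a barcode file are dropped. The behaviour can be sketched in plain Python as below. This is not the Polars implementation in `read_barcodes_file_to_polars_series`; the helper name `read_barcodes` is hypothetical, and keeping first-seen order is an assumption.

```python
import io


def read_barcodes(lines):
    """Return barcodes in first-seen order, skipping blank lines and repeats."""
    seen = set()
    barcodes = []
    for line in lines:
        barcode = line.strip()
        if not barcode or barcode in seen:
            continue
        seen.add(barcode)
        barcodes.append(barcode)
    return barcodes


# A barcode file with an empty line in the middle and a duplicated barcode.
barcode_file = io.StringIO("AAACGAA-1\n\nTTTGGTT-1\nAAACGAA-1\n\n")
barcodes = read_barcodes(barcode_file)
print(barcodes)
```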
39ace8a Replace df.columns with df.collect_schema().names() so it works on lazy Polars dataframes.
9892252 Update "write_csv" usage to Polars syntax 1.0.0+ by using "include_header" instead of "has_header".
ca9b230 Do not use "by=" keyword in group_by as Polars 1.0.0+ treats that as renaming the grouped column to "by".
    Also stop using the "by" keyword in "sort" and "partition_by", although those cases still work as before.
Commits on Jul 16, 2024
861f9b7 Reformat TSS profile chained code by adding no-op "clone" calls.
1e2e0bf Restrict version numbers for some dependencies.
bfa6e7b Expose "engine" option in "pycistopic qc".
deef721 Support adding sample ID to cell barcodes when reading fragments or cell barcode files.
    This adds extra arguments to:
    - `read_fragments_to_polars_df`
    - `read_barcodes_file_to_polars_series`
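The point of deef721 is that barcodes from different samples stay distinguishable once they are combined. A rough sketch of the idea follows. The helper `add_sample_id` and the `___` separator are assumptions made for illustration; the commit message does not name the new arguments or the separator.

```python
def add_sample_id(barcodes, sample_id, separator="___"):
    """Append a sample ID to each cell barcode (the separator is an assumption)."""
    return [f"{barcode}{separator}{sample_id}" for barcode in barcodes]


# The same raw barcode from two samples no longer collides after tagging.
sample1 = add_sample_id(["AAAC-1", "TTTG-1"], "Sample1")
sample2 = add_sample_id(["AAAC-1"], "Sample2")
print(sample1 + sample2)
```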
49200f5 Remove unused argument from `get_insert_size_distribution` docstring.
15899de Use greater than or equal for threshold filters in "get_barcodes_passing_qc_for_sample".
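The practical effect of 15899de is on barcodes that sit exactly on a threshold: with "greater than or equal" they are kept. A small sketch, with hypothetical metric names and a hypothetical `passes_qc` helper rather than the actual code of `get_barcodes_passing_qc_for_sample`:

```python
def passes_qc(stats, thresholds):
    """Keep a barcode when every metric is greater than or equal to its threshold."""
    return all(stats[name] >= minimum for name, minimum in thresholds.items())


thresholds = {"unique_fragments": 1000, "tss_enrichment": 5.0}
on_boundary = {"unique_fragments": 1000, "tss_enrichment": 5.0}
below = {"unique_fragments": 999, "tss_enrichment": 7.2}

print(passes_qc(on_boundary, thresholds))  # boundary values pass with >=
print(passes_qc(below, thresholds))
```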
419a536 Change `pycistopic qc` to `pycistopic qc run`.
f9d78a1 Add `pycistopic qc filter` to be able to filter cell barcodes based on QC stats.
Commits on Jul 17, 2024
4661c7d Fix "get_tss_profile" so it works with both "polars" and "polars-u64-idx".
    Previously, None values were not set to 0.0 when run with "polars-u64-idx".
7b285a4 Fix some columns to pl.UInt32 so dataframes have the same schema with both "polars" and "polars-u64-idx".
Commits on Jul 18, 2024
a40e47c Add "create_fragment_matrix_from_fragments" to create a sparse fragment matrix directly from a fragments file for consensus peaks of interest.
    The new code uses a lot less memory as it never builds a full dense matrix.

        import pycisTopic.cistopic_class
        import pycisTopic.fragments

        # Create fragments matrix for fragment file for consensus regions and
        # for cell barcodes selected after pycistopic QC.
        counts_fragments_matrix, cbs, region_ids = pycisTopic.fragments.create_fragment_matrix_from_fragments(
            fragments_bed_filename="fragments_GSM7822226_MM_566.tsv.gz",
            regions_bed_filename="consensus_regions.bed",
            barcodes_tsv_filename="cbs_after_pycistopic_qc.tsv",
            blacklist_bed_filename="hg38-blacklist.v2.bed",  # Or None
        )

        # Define sample ID (project name).
        sample_id = "Sample1"

        # Create cisTopic object from sparse fragment matrix.
        cistopic_obj = pycisTopic.cistopic_class.create_cistopic_object(
            fragment_matrix=counts_fragments_matrix,
            cell_names=cbs,
            region_names=region_ids,
            path_to_fragments={sample_id: "fragments.tsv.gz"},
            project=sample_id
        )
Commits on Aug 2, 2024
b074950 Add "create_regions_topics_frequency_matrix" function to replace "load_word_topics" in the future.
    Add "create_regions_topics_frequency_matrix" function, which uses the Mallet region topics counts file as input, to replace "load_word_topics" and "get_topics" in the future.
    "load_word_topics" uses the Mallet state file. When Mallet is run with a lot of regions and cells, state files are quite big (even several hundreds of GB) and take quite a bit of time to be written to disk. Getting count values for each region-topic pair is also quite memory intensive in the current code.
    Mallet region topics counts files are much smaller, take less time to be written to disk and don't require much post processing.
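To show why the counts file needs so little post processing, here is a sketch of a parser for it. It assumes each line has the form `<index> <region> <topic>:<count> ...` with only non-zero topics listed, which is how Mallet's word-topic counts output is commonly laid out. The function name and the example regions are hypothetical, and this is not the code added in b074950.

```python
def parse_word_topic_counts(lines, n_topics):
    """Build {region: [count per topic]} from word-topic-counts lines."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        region = fields[1]
        row = [0] * n_topics
        # Remaining fields are "topic:count" pairs for non-zero entries only.
        for pair in fields[2:]:
            topic, count = pair.split(":")
            row[int(topic)] = int(count)
        counts[region] = row
    return counts


example_lines = [
    "0 chr1:100-600 2:7 0:1",
    "1 chr1:900-1400 1:4",
]
region_topic_counts = parse_word_topic_counts(example_lines, n_topics=3)
print(region_topic_counts)
```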
Commits on Aug 5, 2024
fab1300 Rename "create_regions_topics_frequency_matrix" to "create_regions_topics_count_matrix" and keep the code for creating the frequency matrix in "get_topics" as both matrices are needed.
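The relation between the two matrices mentioned in fab1300 can be sketched as follows, assuming that "frequency" means smoothed counts normalised per topic. The function name, the `eta` smoothing and the exact normalisation are assumptions for illustration; the commit message does not spell out what "get_topics" computes.

```python
def counts_to_topic_region_probabilities(region_topic_counts, eta):
    """Normalise smoothed counts so each topic's region probabilities sum to 1."""
    n_topics = len(region_topic_counts[0])
    # Per-topic totals over all regions, after adding eta to every count.
    totals = [
        sum(row[topic] + eta for row in region_topic_counts)
        for topic in range(n_topics)
    ]
    return [
        [(row[topic] + eta) / totals[topic] for topic in range(n_topics)]
        for row in region_topic_counts
    ]


# Rows are regions, columns are topics.
count_matrix = [[1, 0, 7], [0, 4, 0]]
probabilities = counts_to_topic_region_probabilities(count_matrix, eta=0.1)
print(probabilities)
```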
ad9c756 Allow creating a Mallet serialized corpus file directly from a sparse matrix.
    This roughly replaces LDAMallet.convert_corpus_to_mallet_corpus_file and allows the new method to be used independently of the Mallet topic modeling itself. As the Mallet serialized corpus only needs to be generated once before all topic modeling runs, being able to run it independently is a huge plus.
2d54473 Expose creation of Mallet corpus file from pycistopic CLI: pycistopic topic_modeling create_mallet_corpus

    Usage:

    Create binary accessibility matrix in Matrix Market format:

        import pycisTopic.fragments
        import scipy.io

        counts_fragments_matrix, cbs, region_ids = pycisTopic.fragments.create_fragment_matrix_from_fragments(
            "fragments.tsv.gz",
            "consensus_regions.bed",
            "cbs.tsv"
        )

        # Create binary matrix:
        binary_matrix = counts_fragments_matrix.copy()
        binary_matrix.data.fill(1)

        # Write binary matrix in Matrix Market format.
        scipy.io.mmwrite("binary_accessibility.mtx", binary_matrix)

    Create Mallet corpus file from binary accessibility matrix in Matrix Market format:

        $ pycistopic topic_modeling create_mallet_corpus -i "binary_accessibility.mtx" -o "corpus.mallet"
Commits on Aug 22, 2024
b680420 Add some basic sanity checking to "convert_binary_matrix_to_mallet_corpus_file".
Commits on Aug 26, 2024
646db07 Pass correct alpha and eta to loglikelihood function so they are the same as the ones actually used during topic modeling.
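Why the priors matter in 646db07: the joint log-likelihood of an LDA model depends directly on alpha and eta, so evaluating it with values other than those used during training gives a number that does not describe the fitted model. The sketch below implements the textbook collapsed formula for symmetric priors. It is not necessarily the loglikelihood function used in pycisTopic, and the function name and toy counts are made up.

```python
from math import lgamma


def lda_log_likelihood(topic_region_counts, cell_topic_counts, alpha, eta):
    """Joint log-likelihood log p(w, z) for LDA with symmetric priors."""
    n_topics = len(topic_region_counts)
    n_regions = len(topic_region_counts[0])

    # log p(w | z): one Dirichlet-multinomial term per topic.
    ll = n_topics * (lgamma(n_regions * eta) - n_regions * lgamma(eta))
    for row in topic_region_counts:
        ll += sum(lgamma(count + eta) for count in row)
        ll -= lgamma(sum(row) + n_regions * eta)

    # log p(z): one Dirichlet-multinomial term per cell.
    n_cells = len(cell_topic_counts)
    ll += n_cells * (lgamma(n_topics * alpha) - n_topics * lgamma(alpha))
    for row in cell_topic_counts:
        ll += sum(lgamma(count + alpha) for count in row)
        ll -= lgamma(sum(row) + n_topics * alpha)
    return ll


# Toy counts: 2 topics x 2 regions, and 2 cells x 2 topics.
topic_region_counts = [[3, 0], [1, 2]]
cell_topic_counts = [[2, 1], [1, 2]]

ll_used = lda_log_likelihood(topic_region_counts, cell_topic_counts, alpha=0.5, eta=0.1)
ll_other = lda_log_likelihood(topic_region_counts, cell_topic_counts, alpha=25.0, eta=0.1)
print(ll_used, ll_other)
```

The two printed values differ although the counts are identical, which is the mismatch the commit removes.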
a59ae9e Remove Mallet text corpus after conversion to serialised corpus file.
b6d0973 Add LDAMalletFilenames class to automatically generate filenames, which will be used later in the new LDAMallet class.
5d8cd37 Use --word-topic-counts-file instead of --output-state file when running Mallet.
    The first directly generates the counts we need, with minimal post processing. The state file could also take ages to write and read for datasets with a lot of regions, and could increase the runtime of topic modeling by 24 hours.
    Changed `LDAMallet.train` to `LDAMallet.run_mallet_topic_modeling` and made it a static method. Reading of the output produced by Mallet will be added in further commits.
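The switch in 5d8cd37 comes down to which output flags are passed to `mallet train-topics`. A sketch of assembling such a call is shown below. The helper name and file names are hypothetical, only a handful of options are included, and the real `LDAMallet.run_mallet_topic_modeling` very likely passes more (priors, iterations, threads, seed).

```python
import shlex


def build_mallet_command(corpus_file, n_topics, doc_topics_file, counts_file):
    """Assemble a Mallet train-topics call that skips --output-state."""
    return [
        "mallet",
        "train-topics",
        "--input", corpus_file,
        "--num-topics", str(n_topics),
        "--output-doc-topics", doc_topics_file,
        # Written instead of the much larger --output-state file.
        "--word-topic-counts-file", counts_file,
    ]


command = build_mallet_command(
    "corpus.mallet", 40, "doc_topics.txt", "region_topic_counts.txt"
)
print(shlex.join(command))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting issues.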
12c7c7b Add static methods to convert cell-topic probabilities from Mallet "--output-doc-topics" txt output to a parquet file.
cbfe3bb Add static methods to convert region-topic counts and probabilities from Mallet "--word-topic-counts-file" txt output to a parquet file.
    - `LDAMallet.convert_region_topic_counts_txt_to_parquet()` replaces `LDAMallet.create_regions_topics_frequency_matrix()`, which recently replaced `LDAMallet.load_word_topics()` (read Mallet state file). See commit b074950: Add "create_regions_topics_frequency_matrix" function to replace "load_word_topics" in the future.
    - `LDAMallet.read_region_topic_counts_parquet_file_to_region_topic_probabilities()` replaces the functionality of `LDAMallet.get_topics()`, but reads the original data directly from disk.
8dbe474 Add static method to read JSON parameter file written by `LDAMallet.run_mallet_topic_modeling`.
Commits on Aug 27, 2024
3bbc709 Use new functions for reading Mallet "--word-topic-counts-file" and "--output-doc-topics" output to parquet files and saving parameters JSON file.
52de7e1 Remove deprecated code from LDAMallet class.
223981c Add/update some logging statements in LDAMallet class.
14b6330 Add "--verbose" parameter to `pycistopic topic_modeling` subcommands.
bb8aa47 Update `pycistopic topic_modeling mallet` CLI code to use the new LDAMallet class.
3daefd2 Move "create_mallet_corpus" argument parsing code above "mallet" argument parsing code, so when both get moved behind a new "mallet" subcommand, the diff will be less noisy.
64ea33f Create `pycistopic topic_modeling mallet` subparser and move Mallet related subcommands under it:
    - `pycistopic topic_modeling create_mallet_corpus` => `pycistopic topic_modeling mallet create_corpus`
    - `pycistopic topic_modeling mallet` => `pycistopic topic_modeling mallet run`
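The command layout after 64ea33f can be sketched with nested `argparse` subparsers. This is an assumed structure, not the actual pycistopic CLI code: the `dest` names are invented, and only the `-i`/`-o` options shown in the usage example of 2d54473 are reproduced.

```python
import argparse

parser = argparse.ArgumentParser(prog="pycistopic")
subparsers = parser.add_subparsers(dest="command", required=True)

topic_modeling = subparsers.add_parser("topic_modeling")
tm_subparsers = topic_modeling.add_subparsers(dest="tm_command", required=True)

# All Mallet related subcommands now live under "topic_modeling mallet".
mallet = tm_subparsers.add_parser("mallet")
mallet_subparsers = mallet.add_subparsers(dest="mallet_command", required=True)

create_corpus = mallet_subparsers.add_parser("create_corpus")
create_corpus.add_argument("-i", dest="input_mtx", required=True)
create_corpus.add_argument("-o", dest="output_corpus", required=True)
mallet_subparsers.add_parser("run")

args = parser.parse_args(
    ["topic_modeling", "mallet", "create_corpus",
     "-i", "binary_accessibility.mtx", "-o", "corpus.mallet"]
)
print(args.tm_command, args.mallet_command, args.input_mtx, args.output_corpus)
```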
Commits on Aug 28, 2024
b6e7006 Rework `run_cgs_model_mallet` to `calculate_model_evaluation_stats` but start from precalculated Mallet output.
    Removing the Mallet run from this function allows easier parallelization of Mallet topic modeling.
801f334 Add `pycistopic topic_modeling mallet stats` subcommand, using `calculate_model_evaluation_stats`.
8f2faef Rename `binary_matrix` to `binary_accessibility_matrix`.