This repository contains Common Lisp code for exporting sample data from a MariaDB database and storing it in an LMDB database. The data to be exported is identified using RDF and will later be fetched directly from GN3.

BonfaceKilz/gn-dataset-dump

Introduction

This repository contains Common Lisp code for exporting sample data and storing it in an LMDB database. The data to be exported can be identified using RDF and will later be used directly from GN3. We provide a basic API for reading this data into a multi-dimensional array and storing it in LMDB as JSON.

Features:

  • Inbuilt Versioning
  • Metadata storage
  • Garbage collection for any unused data

HOWTO

Sample data represents data from an experiment. Here is how that would look in the form of a CSV file:

Strain Name,Value,SE,Count,Sex
BXD1,18,x,0
BXD12,16,x,x
BXD14,15,x,x
BXD15,14,x,x

In essence, the data above can be represented as a matrix, with the header row kept as extra metadata about that matrix. The matrix itself can be written as a list of rows, which would look like this:

(("#BXD1"  18 "x" 0)
 ("#BXD12" 16 "x" "x")
 ("#BXD14" 15 "x" "x")
 ("#BXD15" 14 "x" "x"))

And the metadata associated with it would be:

(("header" . #("Strain Name" "Value" "SE" "Count" "Sex")))
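
The mapping from CSV text to rows plus header metadata can be sketched in standard Common Lisp. This is only an illustration: split-on-comma, parse-field and csv-lines->rows-and-metadata are hypothetical names that do not come from this repository, and the "#" prefix on strain names shown above is not reproduced because its origin is not described here.

```lisp
;; Hypothetical helpers, standard Common Lisp only.
;; None of these names are part of this repository's API.

(defun split-on-comma (line)
  "Split LINE on commas and return the fields as a list of strings."
  (loop with start = 0
        for pos = (position #\, line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun parse-field (field)
  "Return FIELD as an integer when it is all digits, else the string itself."
  (if (and (plusp (length field))
           (every #'digit-char-p field))
      (parse-integer field)
      field))

(defun csv-lines->rows-and-metadata (lines)
  "Treat the first element of LINES as the header line.
Return two values: the data rows as a list of lists, and a metadata
alist with a single \"header\" entry holding the column names."
  (values
   (mapcar (lambda (line)
             (mapcar #'parse-field (split-on-comma line)))
           (rest lines))
   (list (cons "header"
               (coerce (split-on-comma (first lines)) 'vector)))))
```

Calling (csv-lines->rows-and-metadata '("Strain Name,Value,SE,Count,Sex" "BXD1,18,x,0")) returns the single row ("BXD1" 18 "x" 0) as the first value and the header alist as the second.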

Importing data into a database

To import data into a database, use import-into-sampledata-db. Here is an example:

(let ((data (make-sampledata
             :matrix
             (make-array
              '(4 4)
              :initial-contents
              '(("#BXD1" 18 "x" 0)
                ("#BXD12" 16 "x" "x")
                ("#BXD14" 15 "x" "x")
                ("#BXD15" 14 "x" "x")))
             :metadata
             '(("header" . #("Strain Name" "Value" "SE" "Count" "Sex"))))))
  (import-into-sampledata-db data "/tmp/BXD/10007/"))

To read the data back into a matrix object:

;; Retrieving the current matrix
(with-sampledata-db (db "/tmp/BXD/10007/" :write t)
  (sampledata-db-current-matrix db))

which outputs the following struct:

#S(SAMPLEDATA-DB-MATRIX
   :DB #<DB NIL {1004DA5B63}>
   :HASH NIL
   :NROWS 4
   :NCOLS 4
   :ROW-POINTERS NIL
   :COLUMN-POINTERS NIL
   :ARRAY #2A(("#BXD1" 18 "x" 0)
              ("#BXD12" 16 "x" "x")
              ("#BXD14" 15 "x" "x")
              ("#BXD15" 14 "x" "x"))
   :TRANSPOSE #2A(("#BXD1" "#BXD12" "#BXD14" "#BXD15")
                  (18 16 15 14)
                  ("x" "x" "x" "x")
                  (0 "x" "x" "x")))
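
The :TRANSPOSE slot in the struct above holds the same values as :ARRAY with rows and columns swapped. How the library computes it is not shown here; a minimal sketch in standard Common Lisp, using the hypothetical name transpose-2d, could look like this:

```lisp
;; Hypothetical helper, not part of this repository's API.
(defun transpose-2d (array)
  "Return a fresh 2D array whose element (j, i) is ARRAY's element (i, j)."
  (destructuring-bind (nrows ncols) (array-dimensions array)
    (let ((result (make-array (list ncols nrows))))
      ;; Walk every cell of the input and write it to the mirrored position.
      (dotimes (i nrows result)
        (dotimes (j ncols)
          (setf (aref result j i) (aref array i j)))))))
```

Applied to the 4 x 4 :ARRAY value above, this yields a 4 x 4 array whose first row is the strain names and whose second row is the values 18, 16, 15 and 14.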

To obtain the data:

(with-sampledata-db (db "/tmp/BXD/10007/" :write t)
  (sampledata-db-matrix-array
   (sampledata-db-current-matrix db)))

To print information about the database, use print-sampledata-db-info:

(print-sampledata-db-info "/tmp/BXD/10007/")

which would output something like:

Path: /tmp/BXD/10007/
Versions: 4
Keys: 26

Version 1
Dimensions: 4 x 4
Version 2
Dimensions: 2 x 4
Version 3
Dimensions: 2 x 4
Version 4
Dimensions: 2 x 4

Hacking

Drop into a development environment with:

guix shell -m manifest.scm

To build the dump binary, run:

sbcl --load build.lisp

Usage

To dump sample data from /tmp/dataset-dump/BXDPublish into the base directory $HOME/sampledata-lmdb:

./dump import /tmp/dataset-dump/BXDPublish $HOME/sampledata-lmdb/

If $HOME/sampledata-lmdb/ does not exist, it will be created on the fly.

To print info for, say, the matrix database $HOME/sampledata-lmdb/10007/:

./dump info $HOME/sampledata-lmdb/10007/

  • [X] Dump actual data
  • [X] Make this accessible from the CLI
  • [X] Read access from Python
  • [ ] Previous version access
