Search + Extract, Transform and Load of PDF, DOCX and PPTX files

gsss124/cruxr

Generic Searchable Data Warehouse - Processing PDFs, PPTX files, etc.

Language - Python

Database - SQLite, PostgreSQL, etc

Warehouse File Types - PDF, PPTX (for now)

Updating to Django [Backend] and Bootstrap [Frontend] - stay tuned!

-- NOTES --

Using a Pipenv virtual environment is recommended

Install PDFMiner in your Python environment by following this guide -> http://www.unixuser.org/~euske/python/pdfminer/#install

Install python-pptx in your Python environment by following this guide -> https://python-pptx.readthedocs.io/en/latest/user/install.html#install
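
Once python-pptx is installed, pulling the text out of a PPTX file could look roughly like the sketch below. This is an illustration only, not code from this repository: the function names `shapes_text` and `extract_pptx_text` are hypothetical, and the sketch assumes that plain per-slide text is what should go into the warehouse.

```python
def shapes_text(shapes):
    """Collect the non-empty text of every shape that has a text frame."""
    parts = []
    for shape in shapes:
        # Pictures, charts, etc. have no text frame and are skipped.
        if getattr(shape, "has_text_frame", False):
            text = shape.text_frame.text.strip()
            if text:
                parts.append(text)
    return parts


def extract_pptx_text(path):
    """Return one string per slide, in slide order (hypothetical helper)."""
    # Imported here so the module loads even when python-pptx is missing.
    from pptx import Presentation

    presentation = Presentation(path)
    return ["\n".join(shapes_text(slide.shapes)) for slide in presentation.slides]
```

Calling `extract_pptx_text("deck.pptx")` (the file name is a placeholder) would give a list of strings that can then be stored and searched.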

Django uses an SQLite database by default; edit settings.py to configure your database of choice!
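
As a rough sketch, switching from SQLite to PostgreSQL in settings.py might look like this. The database name, user, host, port and the `CRUXR_*` environment variable names are placeholders assumed for illustration, not values taken from this project; a PostgreSQL driver such as psycopg2 also needs to be installed.

```python
import os

# Placeholder values: adjust the name, user, host and port to your own setup.
# The CRUXR_* environment variable names are assumptions for this example.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("CRUXR_DB_NAME", "cruxr"),
        "USER": os.environ.get("CRUXR_DB_USER", "cruxr"),
        "PASSWORD": os.environ.get("CRUXR_DB_PASSWORD", ""),
        "HOST": os.environ.get("CRUXR_DB_HOST", "localhost"),
        "PORT": os.environ.get("CRUXR_DB_PORT", "5432"),
    }
}
```

Reading credentials from the environment keeps passwords out of version control; hard-coded strings work too for a local test setup.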
