The National Data Platform (NDP) is a federated and extensible data ecosystem to promote collaboration, innovation and equitable use of data on top of existing cyberinfrastructure capabilities.
The National Data Platform was funded by NSF 2333609 under CI, CISE Research Resources programs. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funders.
NDP uses git submodules, so clone the repository with the following command:
git clone --recurse-submodules git@github.com:national-data-platform/ndp.git
If the submodules are missing, you can initialize them with the following command:
git submodule update --init
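To check whether an existing clone is missing submodules, you can rely on git submodule status, which prefixes each uninitialized submodule with a '-'. The helper below is a hypothetical sketch, not part of the NDP repo: it reads that output on stdin and succeeds when at least one submodule still needs initializing.

```shell
# Hypothetical helper (not part of the NDP repo).
# Reads `git submodule status` output on stdin and succeeds (exit 0)
# if any line starts with '-', which git uses to mark an uninitialized submodule.
has_uninitialized_submodules() {
  grep -q '^-'
}
```

For example: git submodule status | has_uninitialized_submodules && git submodule update --init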
NDP consists of the following core services:
- CKAN and CKAN2: two identical, but separate and isolated data catalog services
- NDP JupyterHub: notebook service running on NRP k8s cluster (Nautilus)
- MLflow: AI/ML experimentation tracker service
- NDP Frontend: frontend service
- NDP API: backend service containing endpoints for managing workspaces and tokens, plus endpoints for future functionality
- Keycloak: authentication and authorization
The following git submodules are used:
Other dependencies:
- CKAN NDP Plugin, via ndp plugin (note: this might be deprecated or no longer needed)
- CKAN / CKAN2 Keycloak Plugin, via ndp ckan keycloak plugin (note: this might be deprecated or no longer needed)
- CKAN2 Catalog Additions Plugin, via ndp catalog additions plugin
- MLflow Keycloak Plugin, via ndp auth plugin
- ckan-data-loader: Python scripts to pre-load CKAN and set up the CKAN harvester
- The DockerSpawner version of JupyterHub provided in this repo (local) is used as a starting point/inspiration.
- The NDP JupyterHub Helm chart deployment and image files are located in this repo.
Because the services are included as git submodules, we can use Docker Compose's extension functionality to reuse the docker-compose manifest file from each repo and make only the necessary changes in this repo's docker-compose file.
First, make a copy of each of .env.example, .env_ckan2.example, ckan.ini.example, and ckan2.ini.example:
cp .env.example .env
cp .env_ckan2.example .env_ckan2
cp ckan.ini.example ckan.ini
cp ckan2.ini.example ckan2.ini
Note that these files will need their environment variables modified to match your environment setup; please contact the NDP Admins for details.
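The four copy commands above can also be wrapped in a small shell function so that re-running it never overwrites files you have already edited. This is a hypothetical sketch (copy_examples is not a Makefile target), assuming it is run from the repository root.

```shell
# Hypothetical helper (not part of the NDP Makefile): create the working config files
# from their *.example templates, without overwriting files you have already edited.
copy_examples() {
  for name in .env .env_ckan2 ckan.ini ckan2.ini; do
    if [ -e "$name" ]; then
      echo "skip: $name already exists"
    elif [ -f "$name.example" ]; then
      cp "$name.example" "$name"
      echo "created: $name"
    else
      echo "missing template: $name.example" >&2
    fi
  done
}
```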
- The development docker-compose file is docker-compose.dev.yaml.
- The production docker-compose file is docker-compose.prod.yaml.
The following command builds and runs the containers:
make
Cleanup:
make clean
Cleanup and delete volumes:
make dist-clean
Useful command examples using the Docker CLI directly:
# Start all services
docker compose -f docker-compose.dev.yaml up --build -d
# Start one service
docker compose -f docker-compose.dev.yaml up frontend --build -d
# List containers
docker ps
# Watch logs
docker logs ndp-frontend-1 -f
# Remove one service
docker compose -f docker-compose.dev.yaml down frontend
# Remove all services
docker compose -f docker-compose.dev.yaml down
# Remove all services and volumes
docker compose -f docker-compose.dev.yaml down -v
# Remove all services, plus orphan containers for services no longer defined in the compose file
docker compose -f docker-compose.dev.yaml down --remove-orphans
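Docker Compose also reads the COMPOSE_FILE environment variable, so exporting it once lets you drop the repeated -f flag from the commands above. A minimal sketch, assuming you work from the repository root:

```shell
# Point Docker Compose at the development manifest for this shell session.
export COMPOSE_FILE=docker-compose.dev.yaml

# The commands above can then be shortened, for example:
#   docker compose up frontend --build -d
#   docker compose logs -f frontend   # follow logs by service name instead of container name
#   docker compose down frontend
```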
In development mode, CKAN extensions can be downloaded into src_extensions using commands provided in the Makefile.
The remaining instructions set up three CKAN extensions: ckanext-ndp, ckanext-keycloak, and ckanext-ndpcatalogadditions.
Install the ndp CKAN extension:
make download-ckanext-ndp
Next, append ndp to the CKAN plugin environment variable CKAN__PLUGINS in the .env file.
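Appending to CKAN__PLUGINS can be scripted as well. The hypothetical add_ckan_plugin function below (not part of the NDP Makefile) assumes the variable is defined on a single line such as CKAN__PLUGINS="envvars image_view", with or without quotes; the plugin names in that example are placeholders. It leaves the line unchanged if the plugin is already listed.

```shell
# Hypothetical helper (not part of the NDP Makefile).
# Appends a plugin name to the CKAN__PLUGINS line of an env file, unless it is already listed.
# Assumes a single line of the form CKAN__PLUGINS="plugin1 plugin2" (the quotes are optional).
add_ckan_plugin() {
  plugin="$1"
  env_file="${2:-.env}"
  tmp_file="$(mktemp)" || return 1
  awk -v p="$plugin" '
    /^CKAN__PLUGINS=/ {
      line = $0
      q = ""
      # Temporarily strip a trailing double quote so the new name goes inside it
      if (line ~ /"$/) { q = "\""; sub(/"$/, "", line) }
      val = line
      sub(/^CKAN__PLUGINS="?/, "", val)
      n = split(val, words, " ")
      # Already listed: keep the original line as-is
      for (i = 1; i <= n; i++) if (words[i] == p) { print $0; next }
      sep = (n > 0) ? " " : ""
      print line sep p q
      next
    }
    { print }
  ' "$env_file" > "$tmp_file" && mv "$tmp_file" "$env_file"
}
```

For example, add_ckan_plugin ndp .env for this step, add_ckan_plugin keycloak .env for ckanext-keycloak, and add_ckan_plugin ndpcatalogadditions .env_ckan2 for ckanext-ndpcatalogadditions.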
Update the CKAN config to add the JupyterHub endpoint:
make update-ckan-config
Note:
To avoid the environment variable issue described in the remarks, you can mount the ckan.ini file directly into the container. In that case, the extension needs to be added under the plugins section, in the ckan.plugins setting.
After this change, restart CKAN so that it picks up the new configuration.
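When ckan.ini is mounted, you can confirm that the plugin made it into ckan.plugins before restarting. The hypothetical ini_has_plugin function below is a sketch, assuming the setting is written on a single line (for example ckan.plugins = envvars ndp keycloak); commented-out lines are ignored.

```shell
# Hypothetical helper (not part of the NDP repo).
# Succeeds if the ckan.plugins setting in an ini file lists the given plugin.
# Assumes the setting fits on one line, e.g. "ckan.plugins = envvars ndp keycloak".
ini_has_plugin() {
  awk -v p="$1" '
    /^[ \t]*ckan\.plugins[ \t]*=/ {
      val = $0
      sub(/^[^=]*=/, "", val)          # keep only the value after "="
      n = split(val, words, " ")
      for (i = 1; i <= n; i++) if (words[i] == p) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "${2:-ckan.ini}"
}
```

For example: ini_has_plugin ndp ckan.ini, or ini_has_plugin ndpcatalogadditions ckan2.ini.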
Install the ndp CKAN Keycloak extension:
make download-ckanext-keycloak
Next, append keycloak to the CKAN plugin environment variable CKAN__PLUGINS in the .env file.
Note:
To avoid the environment variable issue described in the remarks, you can mount the ckan.ini file directly into the container. In that case, the extension needs to be added under the plugins section, in the ckan.plugins setting.
After this change, restart CKAN so that it picks up the new configuration.
Install ckanext-ndpcatalogadditions:
make download-ckanext-ndpcatalogadditions
Next, append ndpcatalogadditions to the CKAN plugin environment variable CKAN__PLUGINS in the .env_ckan2 file.
Note:
To avoid the environment variable issue described in the remarks, you can mount the ckan2.ini file directly into the container. In that case, the extension needs to be added under the plugins section, in the ckan.plugins setting.
After this change, restart CKAN2 so that it picks up the new configuration.
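The restart steps above can be done through Docker Compose. Note that docker compose restart does not re-read environment variables, so after editing .env or .env_ckan2 the container needs to be recreated. The hypothetical reload_ckan function below (not part of the NDP Makefile) does that with up -d --force-recreate; the service names ckan and ckan2 are assumptions, so check docker-compose.dev.yaml for the actual names.

```shell
# Hypothetical helper (not part of the NDP Makefile).
# Recreates a CKAN service so it starts with the current env file contents.
# The default service name "ckan" is an assumption; check docker-compose.dev.yaml.
reload_ckan() {
  docker compose -f docker-compose.dev.yaml up -d --force-recreate "${1:-ckan}"
}
```

For example, run reload_ckan after editing .env and reload_ckan ckan2 after editing .env_ckan2. If only a mounted ckan.ini or ckan2.ini changed, a plain docker compose -f docker-compose.dev.yaml restart followed by the service name should be sufficient.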
- There are cases when CKAN doesn't pick up some environment variables from the .env file, so it is sometimes better to mount a copy of the ckan.ini file; there is a comment in the docker-compose file showing where to do this. I have noticed that it sometimes doesn't pick up CKAN___ROOT__PATH, which is needed for the nginx reverse proxy to work.
- The same issue as in the first remark applies to CKAN2: there are cases when it doesn't pick up some environment variables from the .env_ckan2 file, so it is sometimes better to mount a copy of the ckan2.ini file.
The production docker-compose file is docker-compose.prod.yaml.
docker compose -f docker-compose.prod.yaml up --build -d
Look for additional helpful Docker CLI commands in the Development section.
- Both ckanext-ndp and ckanext-keycloak are built in the CKAN production Dockerfile.
- The ckanext-ndpcatalogadditions plugin is built in the CKAN2 production Dockerfile2.
Some remarks in the development section also apply to production.
https://github.com/national-data-platform/ndp/tree/ndp-test-environment
For testing and development.
https://github.com/national-data-platform/ndp/tree/ndp-staging-environment
For testing and demos. Pre-production environment.
Production, user-facing environment.