Commit

Update readme
folinimarc committed Nov 15, 2023
1 parent 3668303 commit 6eb5633
Showing 1 changed file with 11 additions and 26 deletions.
37 changes: 11 additions & 26 deletions README.md
@@ -1,31 +1,16 @@
# Project "Pretty Panda"

## Goal
This repository is a collection of independent, loosely coupled components that enable developers to efficiently prototype
geospatial processing locally. Once the processing steps to create a new dataset work
in the local environment, it should be straightforward to move them into a
production environment for periodic batch execution.
This repository is a playground to develop a collection of loosely coupled components
which...
1. enable developers to efficiently perform geospatial processing locally through a containerized environment.
2. demonstrate how to interact with large amounts of static geospatial data in low-cost blob storage.
3. provide a framework to enable periodic batch processing of geospatial workflows.

## Design requirements
- Allow flexible and rapid local prototyping involving spatial processing.
- Store third party datasets and processing results in an organized manner.
- Allow for periodic batch re-processing of the data.
- Leverage Google Cloud infrastructure if possible.
- Low cost as a priority requirement.
## Case study
Open data from Switzerland will be used for demonstration purposes.

## Components
### Data store
We leverage Cloud Storage as cost-efficient storage for geospatial data. We leverage cloud-native data formats where possible.
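
The README does not specify how objects are laid out in the bucket. As a minimal sketch of one way to keep third-party datasets and processing results organized, the snippet below builds date-partitioned `gs://` URIs. The bucket name, the `third_party`/`results` prefixes, the dataset name, and the `object_uri` helper are all assumptions made for illustration, not part of this repository.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical bucket name and top-level prefixes; the README does not define a layout.
BUCKET = "example-geodata-bucket"
KINDS = {"third_party", "results"}


def object_uri(kind: str, dataset: str, snapshot: date, filename: str) -> str:
    """Build a URI of the form gs://<bucket>/<kind>/<dataset>/<YYYY-MM-DD>/<filename>."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {sorted(KINDS)}, got {kind!r}")
    # PurePosixPath always joins with forward slashes, regardless of the host OS.
    key = PurePosixPath(kind) / dataset / snapshot.isoformat() / filename
    return f"gs://{BUCKET}/{key}"


# Hypothetical dataset and file name, e.g. a GeoParquet file as a cloud-native format.
print(object_uri("third_party", "swiss_boundaries", date(2023, 11, 15), "boundaries.parquet"))
```

Partitioning by snapshot date is one possible choice: each periodic batch run writes to a new prefix, so earlier outputs stay read-only.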

### Processing environment
Various processing capabilities are bundled using container technology. This enables consistency across development and production workflows.
Priority is given to the Python Geo-Ecosystem, followed by pyQGIS and R capabilities.

### Local development
We leverage JupyterLab as a UI for development due to its integrated visualization capabilities and extensive plugin system.

### Batch processing
We leverage Google Batch to provide the infrastructure for batch runs.
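
To illustrate how the containerized processing environment could be handed to Google Batch, here is a rough sketch that assembles a job description as JSON. The field names follow the Batch job resource (`taskGroups`, `taskSpec`, `runnables`, `container.imageUri`); the image URI, the command, and the `build_batch_job` helper are hypothetical and not taken from this repository.

```python
import json


def build_batch_job(image_uri: str, command: list[str], task_count: int = 1) -> dict:
    """Return a dict shaped like a Google Batch job config running one container."""
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                "taskSpec": {
                    # A single runnable: the processing container and its command.
                    "runnables": [
                        {"container": {"imageUri": image_uri, "commands": command}}
                    ],
                    "maxRetryCount": 1,
                },
            }
        ],
        # Send task output to Cloud Logging.
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


job = build_batch_job(
    # Hypothetical image and entry script, for illustration only.
    image_uri="europe-west6-docker.pkg.dev/example-project/geo/processing:latest",
    command=["python", "run_workflow.py"],
)
print(json.dumps(job, indent=2))
```

Written to a file, a config like this could be submitted with `gcloud batch jobs submit` and its `--config` flag; the exact resources and regions would depend on the project.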

### Orchestration
Use Cloud Scheduler to trigger the batch jobs.
## Design guidelines
- Optimize for cost efficiency in use cases where small to medium-sized static, read-only datasets are produced by geospatial workflows in a low-frequency batch process.
- Leverage Google Cloud infrastructure where possible.
- Allow multiple developers to efficiently perform exploratory analysis on stored datasets.
