From 6eb56333a10062192891a7f3d812c86d305963dc Mon Sep 17 00:00:00 2001
From: laiskasiili
Date: Wed, 15 Nov 2023 15:02:32 +0000
Subject: [PATCH] Update readme

---
 README.md | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index 34d5ccd..978179f 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,16 @@
 # Project "Pretty Panda"
 
 ## Goal
-This repository is a collection of independent loosely coupled components to enable developers to efficiently prototype
-geospatial processing locally. Once the processing steps to create a new dataset work
-in the local environment, it should be straight-forward to bring them in a
-production environment for periodic batch execution.
+This repository is a playground to develop a collection of loosely coupled components
+which:
+1. enable developers to efficiently perform geospatial processing locally through a containerized environment.
+2. demonstrate how to interact with large amounts of static geospatial data in low-cost blob storage.
+3. provide a framework to enable periodic batch processing of geospatial workflows.
 
-## Design requirements
-- Allow flexible and rapid local prototyping involving spatial processing.
-- Store third party datasets and processing results in an organized manner.
-- Allow for periodic batch re-processing of the data.
-- Leverage Google Cloud infrastructure if possible.
-- Low cost as a priority requirement.
+## Case study
+Open data from Switzerland will be used for demonstration purposes.
 
-## Components
-### Data store
-We leverage Cloud Storage as cost-efficient storage for geospatial data. We leverage cloud-native data formats where possible.
-
-### Processing environment
-Various processing capabilities are bundled using container technology. This enables consistency across development and production workflows.
-Priotity is given to the Python Geo-Ecosystem, followed by pyQGIS and R capabilities.
-
-### Local development
-We leverage Jupyterlab as a UI for development due to its integrated visualization capabilities and extensive plugin system.
-
-### Batch processing
-We leverage Google Batch to provide the infrastructure for batch runs.
-
-### Orchestration
-Use Cloud Scheduler to trigger the batch jobs.
+## Design guidelines
+- Optimize for cost efficiency in use cases where small to medium-sized static, read-only datasets are produced by geospatial workflows in a low-frequency batch process.
+- Leverage Google Cloud infrastructure where possible.
+- Allow multiple developers to efficiently perform exploratory analysis on stored datasets.
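
As a rough illustration of the second goal in the updated README (interacting with static geospatial data kept in low-cost blob storage), the following is a minimal sketch, not part of the patch. It assumes cloud-native formats (GeoParquet and Cloud Optimized GeoTIFF) and a hypothetical public GCS bucket; the bucket and object names are placeholders, not repository assets.

```python
# Minimal sketch: reading cloud-native geospatial data directly from blob storage.
# Bucket and object names are hypothetical placeholders.
# Assumed dependencies: geopandas, pyarrow, gcsfs (for gs:// paths), rasterio.
import geopandas as gpd
import rasterio

# Vector data stored as GeoParquet in a (hypothetical) GCS bucket.
buildings = gpd.read_parquet("gs://example-bucket/switzerland/buildings.parquet")
print(buildings.crs, len(buildings))

# Raster data stored as a Cloud Optimized GeoTIFF; rasterio can read a small
# window over HTTP without downloading the whole file.
with rasterio.open(
    "https://storage.googleapis.com/example-bucket/switzerland/dem.tif"
) as src:
    window = src.read(1, window=((0, 256), (0, 256)))  # top-left 256x256 block
    print(src.crs, window.shape)
```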