update readme
parent
cd5d8ed624
commit
0cd10c2c57
44
README.md
@ -1,3 +1,45 @@
# test-pphe-data-pro

Testing pphe-data-pro monorepo

## Proposed monorepo structure

The proposed structure is built with individual data pipelines in mind.

* `pphe-data-pro` (repo root)
    * `data-extract-load/`
        * `pipeline-{source-name}/`
        * `common/`
    * `data-transform/`
        * `dataform-project-{project-name}/`
    * `infra/`
        * `terraform/`
        * `ci-cd/`
    * `README.md`
    * `.gitignore`
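
The layout above could be scaffolded with a small script. The following is a minimal sketch, not part of the repository: the function names `monorepo_dirs` and `scaffold_monorepo` are hypothetical, and it assumes that `common/` sits under `data-extract-load/` and that `terraform/` and `ci-cd/` sit under `infra/`.

```python
from pathlib import Path


def monorepo_dirs(source_names, project_names):
    """Return the relative directory paths of the proposed structure.

    source_names and project_names fill the {source-name} and
    {project-name} placeholders; the values passed in are up to the caller.
    """
    # Fixed directories (assumed nesting, see the lead-in).
    dirs = ["data-extract-load/common", "infra/terraform", "infra/ci-cd"]
    # One EL pipeline directory per source system.
    dirs += [f"data-extract-load/pipeline-{s}" for s in source_names]
    # One Dataform project directory per transform project.
    dirs += [f"data-transform/dataform-project-{p}" for p in project_names]
    return sorted(dirs)


def scaffold_monorepo(root, source_names, project_names):
    """Create the directory skeleton and empty root-level files under root."""
    root = Path(root)
    for rel in monorepo_dirs(source_names, project_names):
        (root / rel).mkdir(parents=True, exist_ok=True)
    for name in ("README.md", ".gitignore"):
        (root / name).touch(exist_ok=True)
    return root
```

Calling `scaffold_monorepo("pphe-data-pro", ["crm"], ["core"])` would create `data-extract-load/pipeline-crm/` and `data-transform/dataform-project-core/`; `crm` and `core` are made-up example names.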

### EL Pipelines

Each data pipeline should be fully independent of the others, except where it uses shared (common) resources. Any GCP admin should be able to reconstruct one of our data pipelines end-to-end using Terraform alone. Consider the following EL pipeline directory structure for a single source system:

* `pipeline-{source-name}/` (pipeline dir root)
    * `terraform/` (all infra for the pipeline gets built here)
        * `main.tf`
        * `variables.tf`
        * `outputs.tf`
    * `cloud-run-service-{service-name}/` (if needed, a Cloud Run service goes here)
        * `src/`
        * `Dockerfile`
        * `requirements.txt`
    * `cloud-run-function-{function-name}/` (if needed, a Cloud Run function goes here)
        * `main.py`
        * `requirements.txt`
    * `README.md` (pipeline documentation goes here)
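
One way to keep pipelines consistent with this layout is a small check that could run locally or in CI. The sketch below is illustrative only: `missing_pipeline_files` is a hypothetical helper, and it assumes that the Terraform files and `README.md` are mandatory while the Cloud Run directories are optional but must be complete when present.

```python
from pathlib import Path

# Files assumed to be mandatory in every pipeline directory.
REQUIRED_FILES = (
    "terraform/main.tf",
    "terraform/variables.tf",
    "terraform/outputs.tf",
    "README.md",
)

# Expected files inside the optional Cloud Run directories, keyed by prefix.
OPTIONAL_DIR_FILES = {
    "cloud-run-service-": ("Dockerfile", "requirements.txt"),
    "cloud-run-function-": ("main.py", "requirements.txt"),
}


def missing_pipeline_files(pipeline_dir):
    """Return relative paths expected by the layout but not found on disk."""
    pipeline_dir = Path(pipeline_dir)
    missing = [
        rel for rel in REQUIRED_FILES if not (pipeline_dir / rel).is_file()
    ]
    for prefix, names in OPTIONAL_DIR_FILES.items():
        for sub in sorted(pipeline_dir.glob(prefix + "*")):
            if not sub.is_dir():
                continue
            # A Cloud Run service is also expected to have a src/ directory.
            if prefix == "cloud-run-service-" and not (sub / "src").is_dir():
                missing.append(f"{sub.name}/src/")
            for name in names:
                if not (sub / name).is_file():
                    missing.append(f"{sub.name}/{name}")
    return missing
```

An empty list means the directory matches the layout; any returned paths point at what still needs to be added.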

### Transform Pipelines

Transformation pipelines combine data from multiple data sources and should therefore be pooled within a single directory. This directory houses a Dataform project.

### Shared Infrastructure

The `infra` directory houses all shared infrastructure, including Terraform scripts as well as CI/CD pipelines.