test-pphe-data-pro
Testing pphe-data-pro monorepo
Proposed monorepo structure
The proposed structure is built with individual data pipelines in mind.
pphe-data-pro/ (repo root)
  data-extract-load/
    pipeline-{source-name}/
    common/
  data-transform/
    dataform-project-{project-name}/
  infra/
    terraform/
    ci-cd/
  README.md
  .gitignore
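To keep the layout consistent as pipelines are added, a small check can confirm that a checkout follows the structure above. This is a hypothetical helper, not part of the repo. The names `missing_paths`, `naming_violations`, `EXPECTED_PATHS` and `NAMING_RULES` are illustrative. The sketch also assumes that `pipeline-`, `dataform-project-` and `common` are the only allowed directory names at their level.

```python
from pathlib import Path

# Hypothetical list of paths every checkout should contain; a trailing "/" marks a directory.
EXPECTED_PATHS = [
    "data-extract-load/",
    "data-transform/",
    "infra/terraform/",
    "infra/ci-cd/",
    "README.md",
    ".gitignore",
]

# Assumed naming scheme: child directories of each parent must start with one of these prefixes.
NAMING_RULES = {
    "data-extract-load": ("pipeline-", "common"),
    "data-transform": ("dataform-project-",),
}


def missing_paths(repo_root: Path) -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    missing = []
    for rel in EXPECTED_PATHS:
        target = repo_root / rel.rstrip("/")
        is_dir = rel.endswith("/")
        # Directories and files are checked with the matching predicate.
        exists = target.is_dir() if is_dir else target.is_file()
        if not exists:
            missing.append(rel)
    return missing


def naming_violations(repo_root: Path) -> list[str]:
    """Return child directories whose names break the assumed naming scheme."""
    violations = []
    for parent, prefixes in NAMING_RULES.items():
        parent_dir = repo_root / parent
        if not parent_dir.is_dir():
            continue  # a missing parent is already reported by missing_paths
        for child in sorted(parent_dir.iterdir()):
            if child.is_dir() and not child.name.startswith(prefixes):
                violations.append(f"{parent}/{child.name}/")
    return violations
```

For example, running `missing_paths(Path("."))` from the repo root returns an empty list when every expected path exists. A CI job could fail the build whenever either function returns a non-empty list.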
EL Pipelines
The main principle is that each data pipeline is fully independent of the others, except where it uses shared resources (the common/ directory). Any GCP admin should be able to rebuild any one of our data pipelines end-to-end using Terraform alone. Consider the following EL pipeline directory structure for a single source system:
pipeline-{source-name}/ (pipeline dir root)
  terraform/ (all infra for the pipeline gets built here)
    main.tf
    variables.tf
    outputs.tf
  cloud-run-service-{service-name}/ (if needed, a Cloud Run service goes here)
    src/
    Dockerfile
    requirements.txt
  cloud-run-function-{function-name}/ (if needed, a Cloud Run function goes here)
    main.py
    requirements.txt
  README.md (pipeline documentation goes here)
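Because every EL pipeline has the same shape, a new one can be scaffolded mechanically. This is a minimal sketch with a few assumptions. `scaffold_el_pipeline` is a hypothetical helper name. `Dockerfile` and `requirements.txt` are assumed to sit beside `src/`, not inside it. All files start empty.

```python
from pathlib import Path
from typing import Optional


def scaffold_el_pipeline(
    base: Path,
    source_name: str,
    service_name: Optional[str] = None,
    function_name: Optional[str] = None,
) -> Path:
    """Hypothetical helper: create an empty EL pipeline skeleton under base."""
    root = base / f"pipeline-{source_name}"
    # Every pipeline gets its own Terraform configuration and README.
    files = [
        "terraform/main.tf",
        "terraform/variables.tf",
        "terraform/outputs.tf",
        "README.md",
    ]
    # Optional Cloud Run components are only created when requested.
    if service_name:
        svc = f"cloud-run-service-{service_name}"
        files += [f"{svc}/Dockerfile", f"{svc}/requirements.txt"]
        (root / svc / "src").mkdir(parents=True, exist_ok=True)
    if function_name:
        fn = f"cloud-run-function-{function_name}"
        files += [f"{fn}/main.py", f"{fn}/requirements.txt"]
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        # touch() creates missing files and never overwrites existing content.
        path.touch(exist_ok=True)
    return root
```

For instance, `scaffold_el_pipeline(Path("data-extract-load"), "crm", service_name="ingest")` would create `data-extract-load/pipeline-crm/` with a `terraform/` directory, a `README.md` and `cloud-run-service-ingest/`. Existing files are left intact, so re-running it is harmless.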
Transform Pipelines
Transformation pipelines combine data from multiple sources, so they are pooled in a single directory, data-transform/, which houses a Dataform project.
Shared Infrastructure
The infra/ directory houses all shared infrastructure, including Terraform scripts (infra/terraform/) and CI/CD pipelines (infra/ci-cd/).
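Since each EL pipeline carries its own Terraform configuration, a CI/CD job only needs to redeploy the pipelines that a change touches. This is one possible approach, not the repo's actual CI/CD logic. `affected_units` and `terraform_commands` are hypothetical names. The sketch assumes that a change under `data-extract-load/common/` may affect every pipeline. It builds, but does not run, `terraform init` and `terraform apply` commands using Terraform's `-chdir` global option.

```python
from pathlib import PurePosixPath

EL_ROOT = "data-extract-load"
TRANSFORM_ROOT = "data-transform"


def affected_units(changed_paths: list[str], all_pipelines: list[str]) -> set[str]:
    """Map changed file paths (relative to the repo root) to deployable units."""
    affected = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 2:
            continue  # top-level files such as README.md deploy nothing
        top, child = parts[0], parts[1]
        if top == EL_ROOT and child.startswith("pipeline-"):
            affected.add(f"{EL_ROOT}/{child}")
        elif top == EL_ROOT and child == "common":
            # Conservative assumption: a shared resource may be used by any pipeline.
            affected.update(f"{EL_ROOT}/{p}" for p in all_pipelines)
        elif top == TRANSFORM_ROOT and child.startswith("dataform-project-"):
            affected.add(f"{TRANSFORM_ROOT}/{child}")
    return affected


def terraform_commands(pipeline_dir: str) -> list[list[str]]:
    """Build (not run) the commands that rebuild one EL pipeline from its terraform/ dir."""
    tf_dir = f"{pipeline_dir}/terraform"
    return [
        ["terraform", f"-chdir={tf_dir}", "init", "-input=false"],
        ["terraform", f"-chdir={tf_dir}", "apply", "-input=false", "-auto-approve"],
    ]
```

A CI job could pass in the file list from a diff, then run each command from `terraform_commands` with `subprocess.run(cmd, check=True)` for the `data-extract-load/` entries. The `data-transform/` entries would go to whatever Dataform deployment the team adopts.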