
test-pphe-data-pro

Testing pphe-data-pro monorepo

Proposed monorepo structure

The proposed structure is built with individual data pipelines in mind.

  • pphe-data-pro (repo root)
    • data-extract-load/
      • pipeline-{source-name}/
      • common/
    • data-transform/
      • dataform-project-{project-name}/
    • infra/
      • terraform/
      • ci-cd/
    • README.md
    • .gitignore
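
As a minimal sketch, the skeleton above could be generated with a short Python script. The function name `scaffold_monorepo` and its arguments are hypothetical, and the source and project names in the usage note are placeholders rather than real systems.

```python
import tempfile
from pathlib import Path


def scaffold_monorepo(root, source_names, dataform_projects):
    """Create the proposed directory skeleton under `root`.

    Returns the created directories as sorted, root-relative POSIX paths.
    The function and argument names are assumptions made for illustration.
    """
    root = Path(root)
    dirs = [
        root / "data-extract-load" / "common",
        root / "infra" / "terraform",
        root / "infra" / "ci-cd",
    ]
    # One EL pipeline directory per source system.
    dirs += [root / "data-extract-load" / f"pipeline-{name}" for name in source_names]
    # One Dataform project directory per transform project.
    dirs += [root / "data-transform" / f"dataform-project-{name}" for name in dataform_projects]

    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)
    # Top-level files listed in the structure; created empty if missing.
    for filename in ("README.md", ".gitignore"):
        (root / filename).touch(exist_ok=True)

    return sorted(d.relative_to(root).as_posix() for d in dirs)


if __name__ == "__main__":
    # "crm" and "core" are placeholder names, not real sources or projects.
    for path in scaffold_monorepo(tempfile.mkdtemp(), ["crm"], ["core"]):
        print(path)
```

Running it against a scratch directory first is the safer choice, since it only creates empty directories and files and does not populate any Terraform or Dataform content.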

EL Pipelines

Each data pipeline should be fully independent of the others, except where it uses shared resources from common/. Any GCP admin should be able to reconstruct one of our data pipelines end-to-end using Terraform alone. Consider the following EL pipeline directory structure for a single source system:

  • pipeline-{source-name}/ (pipeline dir root)
    • terraform/ (all infra for the pipeline gets built here)
      • main.tf
      • variables.tf
      • outputs.tf
    • cloud-run-service-{service-name}/ (if needed, a Cloud Run service goes here)
      • src/
      • Dockerfile
      • requirements.txt
    • cloud-run-function-{function-name}/ (if needed, a Cloud Run function goes here)
      • main.py
      • requirements.txt
    • README.md (pipeline documentation goes here)
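
To keep pipeline directories consistent with the layout above, a small check along these lines could be run locally or in CI. This is a sketch only: the function name `missing_pipeline_files` is hypothetical, and treating every listed file as mandatory is an assumption, not a rule stated by this README.

```python
import tempfile
from pathlib import Path

# Paths every pipeline directory is assumed to contain (from the layout above).
REQUIRED_PATHS = (
    "terraform/main.tf",
    "terraform/variables.tf",
    "terraform/outputs.tf",
    "README.md",
)
# Paths expected inside optional Cloud Run subdirectories, if they exist.
SERVICE_PATHS = ("src", "Dockerfile", "requirements.txt")
FUNCTION_PATHS = ("main.py", "requirements.txt")


def missing_pipeline_files(pipeline_dir):
    """Return the expected paths (relative, POSIX style) that do not exist."""
    pipeline_dir = Path(pipeline_dir)
    expected = list(REQUIRED_PATHS)
    if pipeline_dir.is_dir():
        for child in sorted(pipeline_dir.iterdir()):
            if not child.is_dir():
                continue
            if child.name.startswith("cloud-run-service-"):
                expected += [f"{child.name}/{p}" for p in SERVICE_PATHS]
            elif child.name.startswith("cloud-run-function-"):
                expected += [f"{child.name}/{p}" for p in FUNCTION_PATHS]
    return [p for p in expected if not (pipeline_dir / p).exists()]


if __name__ == "__main__":
    # Build a deliberately incomplete pipeline dir; "crm" is a placeholder name.
    demo = Path(tempfile.mkdtemp()) / "pipeline-crm"
    (demo / "terraform").mkdir(parents=True)
    (demo / "terraform" / "main.tf").touch()
    print(missing_pipeline_files(demo))
```

An empty list means the directory matches the expected layout; anything else names the files still to be added.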

Transform Pipelines

Transformation pipelines combine data from multiple data sources and should therefore be pooled within a single directory, data-transform/. This directory houses the Dataform projects, one per dataform-project-{project-name}/ subdirectory.

Shared Infrastructure

The infra/ directory houses all shared infrastructure: Terraform configuration in terraform/ and CI/CD pipelines in ci-cd/.
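
Because EL pipelines are independent except for common/, a CI/CD job could limit its work to the pipelines a change actually touches. The sketch below shows one possible approach and is not an existing part of this repository: `affected_pipelines` is a hypothetical helper, and rebuilding every pipeline when common/ changes is an assumed policy.

```python
from pathlib import PurePosixPath


def affected_pipelines(changed_paths, all_pipelines):
    """Return the sorted EL pipeline directory names touched by a change set.

    `changed_paths` are repo-relative POSIX paths (for example from a git diff).
    `all_pipelines` lists every pipeline-{source-name} directory name.
    """
    affected = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 2 or parts[0] != "data-extract-load":
            continue  # not an EL change; transform and infra are handled elsewhere
        if parts[1] == "common":
            # Shared code changed: assume every pipeline needs to be rebuilt.
            return sorted(all_pipelines)
        if parts[1].startswith("pipeline-"):
            affected.add(parts[1])
    return sorted(affected)


if __name__ == "__main__":
    # "pipeline-crm" and "pipeline-erp" are placeholder names.
    pipelines = ["pipeline-crm", "pipeline-erp"]
    print(affected_pipelines(["data-extract-load/pipeline-crm/terraform/main.tf"], pipelines))
```

The list of changed paths would typically come from the CI system or a git diff against the target branch; how that list is obtained is left open here.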