Maico Timmerman - From Data Pipelines to a Data Platform: Embracing Monorepo Architecture
Learn how monorepo architecture can transform data pipelines into a scalable data platform, with insights on tooling, automation, team autonomy, and managing large-scale systems.
-
A monorepo keeps all code in one repository, which improves cross-team collaboration and dependency management and streamlines changes and testing
-
Invest early in monorepo tooling and automation so the repository can handle scale:
- Test automation
- Dependency management
- Build and deployment pipelines
- Access controls
- Code review workflows
-
Two key rules for monorepo organization:
- Applications should be independent leaf nodes
- No circular dependencies between packages/libraries
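Both rules can be checked mechanically in CI. The following is a minimal sketch, not the speaker's tooling: it assumes a hypothetical layout in which applications live under `apps/` and libraries under `libs/`, and a hand-written dependency map (`EXAMPLE_DEPS`) that a real setup would generate from build metadata.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dependency map: each key depends on the names listed for it.
EXAMPLE_DEPS: Dict[str, List[str]] = {
    "apps/ingest": ["libs/io", "libs/schemas"],
    "apps/reporting": ["libs/schemas"],
    "libs/io": ["libs/schemas"],
    "libs/schemas": [],
}


def is_app(name: str) -> bool:
    # Assumption: applications are identified by an "apps/" path prefix.
    return name.startswith("apps/")


def apps_depended_on(deps: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Rule 1: return (dependent, app) pairs where something depends on an app."""
    return sorted(
        (src, dst)
        for src, targets in deps.items()
        for dst in targets
        if is_app(dst)
    )


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Rule 2: return one dependency cycle as a list of names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {name: WHITE for name in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in deps.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for name in list(deps):
        if color[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print("apps used as dependencies:", apps_depended_on(EXAMPLE_DEPS))
    print("cycle:", find_cycle(EXAMPLE_DEPS))
```

A CI job could fail the build whenever `apps_depended_on` returns a non-empty list or `find_cycle` returns a cycle. The recursive walk is fine for a sketch; a very deep graph would need an iterative version.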
-
Scale of the monorepo described in the talk:
- 850 pipelines
- 2500 materialized tables
- 500 nodes with 25,000 CPUs
- 240 TB memory
- 23 PB storage
-
Benefits of monorepo for data platforms:
- Easier data validation across teams
- Simplified dependency management
- Better visibility into data lineage
- Faster rollout of changes
- Improved testing capabilities
-
Challenges to address:
- CI/CD complexity
- Test suite management at scale
- Access controls for sensitive data
- Preventing dependency hell
- Maintaining high speed of execution
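One common way to keep CI fast as the test suite grows is to test only the packages affected by a change. The sketch below illustrates that general technique and is not necessarily what the talk described; the package names and the `EXAMPLE_DEPS` map are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set

# Hypothetical dependency map: each key depends on the names listed for it.
EXAMPLE_DEPS: Dict[str, List[str]] = {
    "apps/ingest": ["libs/io", "libs/schemas"],
    "apps/reporting": ["libs/schemas"],
    "libs/io": ["libs/schemas"],
    "libs/schemas": [],
}


def affected_by(deps: Dict[str, List[str]], changed: Iterable[str]) -> Set[str]:
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the map: for each package, which packages depend on it directly?
    dependents: Dict[str, Set[str]] = defaultdict(set)
    for pkg, targets in deps.items():
        for target in targets:
            dependents[target].add(pkg)

    # Breadth-first walk over the reverse edges, starting from the changed set.
    seen: Set[str] = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # A change to libs/io only requires testing libs/io and apps/ingest.
    print(sorted(affected_by(EXAMPLE_DEPS, {"libs/io"})))
```

Because applications are leaf nodes, a change inside one application affects only that application, while a change to a widely shared library fans out to everything built on it.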
-
Teams maintain autonomy within their own applications while benefiting from shared libraries and tooling
-
Establish conventions and golden paths for teams to follow, while allowing experimentation within those bounds
-
Security and compliance become easier to manage with centralized access controls and visibility
-
Migration and large-scale changes become more manageable through automated tooling and testing