Continuous Delivery for Data • Dave Farley • YOW! 2020
Learn how to apply Continuous Delivery principles to data changes, with proven patterns for safe schema migrations, version control, testing, and managing ML models in production.
-
Continuous Delivery should be applied to data changes as well as code changes. The ability to deploy schema changes safely is crucial for true CD.
-
Three main categories of data to consider:
- Transactional data (generated during system operation)
- Reference/lookup data (static, read-only)
- Configuration data (defines system behavior)
-
Key data migration patterns:
- Deployment time migration (simple but requires downtime)
- Lazy reader (translates on read, good for hot deployments)
- Lazy migrator (background migration during idle time)
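The lazy reader pattern above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the record layout, the `version` field, and the names `UPGRADERS` and `read_record` are all assumptions made for the example.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]

CURRENT_VERSION = 3  # hypothetical latest record format


def _v1_to_v2(record: Record) -> Record:
    # Assumed change in v2: "name" was split into first_name / last_name.
    first, _, last = record["name"].partition(" ")
    out = {k: v for k, v in record.items() if k != "name"}
    out.update({"first_name": first, "last_name": last, "version": 2})
    return out


def _v2_to_v3(record: Record) -> Record:
    # Assumed change in v3: an optional "email" field was added.
    out = dict(record)
    out.setdefault("email", None)
    out["version"] = 3
    return out


# One upgrade step per stored version, applied in sequence.
UPGRADERS: Dict[int, Callable[[Record], Record]] = {1: _v1_to_v2, 2: _v2_to_v3}


def read_record(stored: Record) -> Record:
    """Translate a stored record to the current format at read time."""
    record = dict(stored)  # never mutate what is in storage
    version = record.get("version", 1)
    while version < CURRENT_VERSION:
        record = UPGRADERS[version](record)
        version = record["version"]
    return record
```

Old records stay untouched in storage, so a new application version can be deployed without a migration window. A lazy migrator could reuse the same upgrade steps and write the translated record back during idle time.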
-
Version control everything:
- Schema changes
- Migration scripts
- Data models
- Configuration
- Infrastructure code
-
Best practices for schema changes:
- Make additive changes when possible
- Version schemas with sequential numbers
- Keep schema version info with application code
- Write and test both upgrade and rollback scripts
- Test migrations thoroughly
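One way to combine sequential version numbers with paired upgrade and rollback scripts is sketched below. It is an assumption-laden example rather than the approach shown in the talk: it uses SQLite's `PRAGMA user_version` to store the schema version, and the `MIGRATIONS` list, the table names, and the `migrate` helper are hypothetical.

```python
import sqlite3

# (version, upgrade SQL, rollback SQL). Both changes are additive,
# so each rollback is a simple drop of what the upgrade created.
MIGRATIONS = [
    (1,
     "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
     "DROP TABLE customer"),
    (2,
     "CREATE TABLE customer_email ("
     "customer_id INTEGER REFERENCES customer(id), email TEXT)",
     "DROP TABLE customer_email"),
]


def schema_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database file itself."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, target: int) -> None:
    """Move the schema up or down to `target`, one numbered step at a time."""
    current = schema_version(conn)
    if target > current:
        steps = [(v, up) for v, up, _ in MIGRATIONS if current < v <= target]
    else:
        steps = [(v - 1, down) for v, _, down in reversed(MIGRATIONS)
                 if target < v <= current]
    for new_version, sql in steps:
        conn.execute(sql)
        # PRAGMA values cannot be bound as parameters, hence the int() guard.
        conn.execute(f"PRAGMA user_version = {int(new_version)}")
        conn.commit()
```

Keeping `MIGRATIONS` in the same repository as the application code means a given commit always knows which schema version it expects, and `migrate(conn, 0)` exercises every rollback script.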
-
Data testing approaches:
- Generate synthetic test data in test scope
- Avoid using production data for tests
- Focus on testing migration logic, not just final state
- Include migration tests in CI pipeline
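A migration test along these lines might look like the sketch below. The migration itself (`migrate_split_name`), the `customer` table, and the synthetic data generator are invented for illustration. The point is that the test creates its own data, runs the migration logic, and checks that nothing was lost.

```python
import random
import sqlite3
import string


def migrate_split_name(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: split customer.name into first/last name."""
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")
    rows = conn.execute("SELECT id, name FROM customer").fetchall()
    for row_id, name in rows:
        first, _, last = name.partition(" ")
        conn.execute(
            "UPDATE customer SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, row_id),
        )
    conn.commit()


def synthetic_names(count: int, seed: int = 0) -> list:
    """Generate repeatable fake names, plus hand-picked edge cases."""
    rng = random.Random(seed)  # fixed seed keeps the test deterministic

    def word() -> str:
        length = rng.randint(1, 8)
        letters = (rng.choice(string.ascii_lowercase) for _ in range(length))
        return "".join(letters).capitalize()

    names = [f"{word()} {word()}" for _ in range(count)]
    return names + ["Cher", "Mary Ann Smith"]  # single and multi-part names


def test_split_name_migration() -> int:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    names = synthetic_names(50)
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()

    migrate_split_name(conn)

    rows = conn.execute(
        "SELECT name, first_name, last_name FROM customer ORDER BY id"
    ).fetchall()
    assert len(rows) == len(names)  # no rows lost
    for name, first, last in rows:
        assert f"{first} {last}".strip() == name  # no information lost
    return len(rows)
```

Because the data is generated inside the test and the database lives in memory, the test needs no production data and is cheap enough to run on every commit in a CI pipeline.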
-
For machine learning systems:
- Version control training data and models
- Create deployment pipelines for ML models
- Monitor model performance in production
- Enable A/B testing of models
- Plan for model updates and retraining
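As a rough illustration of A/B testing models, the sketch below assigns each user to a model variant deterministically by hashing the user ID. The function name `assign_model`, the experiment label, and the variant names are assumptions made for this example, not something described in the talk.

```python
import hashlib


def assign_model(user_id: str,
                 experiment: str = "ranker-v2",
                 candidate_share: float = 0.1) -> str:
    """Pick a model variant for a user, stable across requests and hosts.

    Hashing (experiment, user_id) instead of calling a random generator
    makes the assignment repeatable, so results can be reproduced later.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes to a fraction in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "candidate" if bucket < candidate_share else "baseline"
```

Raising `candidate_share` gradually gives a staged rollout of a new model, and setting it to `0.0` sends all traffic back to the baseline.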
-
Make systems deterministic and repeatable:
- Use infrastructure as code
- Automate environment setup
- Version all dependencies together
- Enable rolling back to previous states
-
Design systems to handle evolution:
- Allow structure to change over time
- Plan for data migration needs upfront
- Keep old data versions readable
- Build migration capabilities into applications
-
Focus on fast feedback loops:
- Automate testing and deployment
- Make changes in small increments
- Validate changes early
- Monitor results in production