Continuous Delivery for Data • Dave Farley • YOW! 2020


Learn how to apply Continuous Delivery principles to data changes, with proven patterns for safe schema migrations, version control, testing, and managing ML models in production.

Key takeaways

Continuous Delivery should be applied to data changes as well as code changes: the ability to deploy schema changes safely is crucial for true CD.
Three main categories of data to consider:
- Transactional data (generated during system operation)
- Reference/lookup data (static, read-only)
- Configuration data (defines system behavior)
Key data migration patterns:
- Deployment time migration (simple but requires downtime)
- Lazy reader (translates on read, good for hot deployments)
- Lazy migrator (background migration during idle time)
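The lazy reader and lazy migrator patterns can be illustrated with a minimal sketch. It is not taken from the talk: the record layout, the `version` field, and the names `read_record` and `migrate_batch` are all assumptions made for illustration. Records are upgraded one version step at a time when read, and a background job can reuse the same upgrade chain to rewrite stale records in small batches.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]

CURRENT_VERSION = 3  # hypothetical latest record version


def _v1_to_v2(record: Record) -> Record:
    # Hypothetical change: v2 split "name" into first and last name.
    first, _, last = record["name"].partition(" ")
    out = {k: v for k, v in record.items() if k != "name"}
    out.update(first_name=first, last_name=last, version=2)
    return out


def _v2_to_v3(record: Record) -> Record:
    # Hypothetical change: v3 added an optional "email" field.
    out = dict(record)
    out.setdefault("email", None)
    out["version"] = 3
    return out


# One upgrade function per source version, applied in sequence.
UPGRADES: Dict[int, Callable[[Record], Record]] = {1: _v1_to_v2, 2: _v2_to_v3}


def read_record(stored: Record) -> Record:
    """Lazy reader: translate an old record on read; storage is untouched."""
    record = dict(stored)
    version = record.get("version", 1)  # unversioned records are treated as v1
    while version < CURRENT_VERSION:
        record = UPGRADES[version](record)
        version = record["version"]
    return record


def migrate_batch(store: Dict[str, Record], batch_size: int) -> int:
    """Lazy migrator: rewrite up to batch_size stale records in place."""
    migrated = 0
    for key, stored in list(store.items()):
        if migrated >= batch_size:
            break
        if stored.get("version", 1) < CURRENT_VERSION:
            store[key] = read_record(stored)
            migrated += 1
    return migrated
```

Because both paths share one upgrade chain, the application can serve old and new records during a hot deployment while the batch job gradually brings storage up to date.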
Version control everything:
- Schema changes
- Migration scripts
- Data models
- Configuration
- Infrastructure code
Best practices for schema changes:
- Make additive changes when possible
- Version schemas with sequential numbers
- Keep schema version info with application code
- Write and test both upgrade and rollback scripts
- Test migrations thoroughly
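As a rough sketch of sequentially numbered schema versions with paired upgrade and rollback scripts, the example below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA user_version` to record the current version. The table names, the `MIGRATIONS` list, and the `migrate` function are hypothetical, and a real project would more likely use a dedicated migration tool.

```python
import sqlite3

# Each entry: (version, upgrade SQL, rollback SQL).
# Kept in version control next to the application code that needs it.
MIGRATIONS = [
    (1,
     "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
     "DROP TABLE customer"),
    # Additive change: a new table instead of altering the existing one.
    (2,
     "CREATE TABLE customer_email ("
     "customer_id INTEGER REFERENCES customer(id), email TEXT)",
     "DROP TABLE customer_email"),
]


def schema_version(conn: sqlite3.Connection) -> int:
    # SQLite stores one integer per database file for exactly this purpose.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Move the schema up or down to `target`, one version at a time."""
    current = schema_version(conn)
    if target > current:
        steps = [(v, up) for v, up, _ in MIGRATIONS if current < v <= target]
    else:
        steps = [(v - 1, down) for v, _, down in reversed(MIGRATIONS)
                 if target < v <= current]
    for new_version, sql in steps:
        conn.execute(sql)
        # PRAGMA values cannot be bound as parameters; int() keeps this safe.
        conn.execute(f"PRAGMA user_version = {int(new_version)}")
    return schema_version(conn)
```

Calling `migrate(conn, 2)` on an empty database applies both upgrades in order, and `migrate(conn, 1)` afterwards runs only the rollback for version 2.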
Data testing approaches:
- Generate synthetic test data within the scope of each test
- Avoid using production data for tests
- Focus on testing migration logic, not just final state
- Include migration tests in CI pipeline
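One way these testing points could look in practice is sketched below, again assuming Python and an in-memory SQLite database. The schema, the `upgrade_v1_to_v2` migration, and the seeded data generator are invented for illustration. The test creates its own synthetic rows, runs the migration, and checks that every row survived with the right content, rather than only checking that the new columns exist.

```python
import random
import sqlite3


def synthetic_customers(n: int, seed: int = 0) -> list:
    """Generate reproducible fake rows; no production data involved."""
    rng = random.Random(seed)
    first = ["Ada", "Grace", "Alan", "Edsger"]
    last = ["Lovelace", "Hopper", "Turing", "Dijkstra"]
    return [(i, f"{rng.choice(first)} {rng.choice(last)}")
            for i in range(1, n + 1)]


def create_v1(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def upgrade_v1_to_v2(conn: sqlite3.Connection) -> None:
    # Additive migration: add new columns and backfill them from "name".
    conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE customer ADD COLUMN last_name TEXT")
    rows = conn.execute("SELECT id, name FROM customer").fetchall()
    for row_id, name in rows:
        first, _, last = name.partition(" ")
        conn.execute(
            "UPDATE customer SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, row_id))
    conn.commit()


def test_upgrade_preserves_every_row() -> None:
    conn = sqlite3.connect(":memory:")
    create_v1(conn)
    before = synthetic_customers(50, seed=42)
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)", before)
    conn.commit()

    upgrade_v1_to_v2(conn)

    after = conn.execute(
        "SELECT id, first_name, last_name FROM customer ORDER BY id"
    ).fetchall()
    assert len(after) == len(before)
    for (row_id, name), (new_id, first, last) in zip(before, after):
        assert new_id == row_id
        assert f"{first} {last}" == name
```

A test in this shape runs in milliseconds, so it can sit in the commit stage of a CI pipeline alongside ordinary unit tests.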
For machine learning systems:
- Version control training data and models
- Create deployment pipelines for ML models
- Monitor model performance in production
- Enable A/B testing of models
- Plan for model updates and retraining
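Two of the points above, versioning training data with models and enabling A/B testing, can be sketched with nothing but the standard library. This is an illustrative assumption rather than the approach described in the talk: `release_manifest` ties a model to the exact data and code that produced it through content hashes, and `assign_variant` deterministically routes a share of users to a candidate model.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Short content hash used as a version identifier."""
    return hashlib.sha256(data).hexdigest()[:12]


def release_manifest(training_data: bytes, model_bytes: bytes,
                     code_revision: str) -> dict:
    """Record which data and code produced a given model artifact."""
    return {
        "code_revision": code_revision,
        "data_version": fingerprint(training_data),
        "model_version": fingerprint(model_bytes),
    }


def assign_variant(user_id: str, candidate_share: float) -> str:
    """Stable A/B assignment: the same user always gets the same model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 4 bytes to a bucket in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "candidate" if bucket < candidate_share else "baseline"
```

Storing the manifest in version control alongside the deployment pipeline definition makes it possible to reproduce or roll back a model release, and hashing the user ID keeps the experiment assignment repeatable without storing any extra state.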
Make systems deterministic and repeatable:
- Use infrastructure as code
- Automate environment setup
- Version all dependencies together
- Enable rolling back to previous states
Design systems to handle evolution:
- Allow structure to change over time
- Plan for data migration needs upfront
- Keep old data versions readable
- Build migration capabilities into applications
Focus on fast feedback loops:
- Automate testing and deployment
- Make changes in small increments
- Validate changes early
- Monitor results in production
