GTAC 2016: How Flaky Tests in Continuous Integration

Learn how Google's continuous integration system handles flaky tests, a common problem that affects around 16% of all tests at Google, and discover strategies for identifying and fixing flaky tests to improve test reliability.

Key takeaways
  • Flaky tests are a common problem in continuous integration, with around 16% of all tests at Google being flaky.
  • Flaky tests can be caused by various factors such as code changes, resource contention, and concurrency issues.
  • A flaky test is one that fails intermittently rather than consistently: it can both pass and fail without any change to the code under test.
  • Flaky tests can be added to a test database, along with their history, to track and analyze their behavior.
  • By analyzing the history of flaky tests, it is possible to identify patterns and correlations that can inform the development of better tests.
  • Flakes (flaky tests) are more likely to occur in tests that are run frequently, and in tests that are complex or have many dependencies.
  • A UI test with Selenium WebDriver may be more likely to be flaky than a unit test.
  • Integration tests and web tests tend to be more flaky than smaller, more isolated tests.
  • Code modification frequency is a good signal for predicting which test targets are likely to be flaky.
  • Files modified by a single author are less likely to be associated with flakiness than files modified by multiple authors.
  • Code review is important for identifying and fixing flaky tests.
  • Every continuous integration system has to tolerate a certain level of flakiness in tests. Google’s CI system handles this by running tests in parallel and rescheduling them until a clear pass or fail result emerges.
  • Google has a large developer population organized into many small teams, which makes it easier to manage flaky tests.
  • Google’s CI system has a high volume of tests and a fast turnaround time, which makes it critical to prioritize and improve test reliability.
  • Google’s process is to monitor test results, identify flaky tests, and prioritize their improvement.
  • Google prioritizes fixing flaky tests over debugging code changes.
  • Google’s CI system has a high level of autonomy, which allows developers to decide how to fix flaky tests.
  • Google uses machine learning to predict which tests are likely to be flaky and to detect real failures.
  • Google publishes its data and methodology for detecting flaky tests.
  • Google is open to collaboration and willing to share its data and insights with other companies.
  • Google believes that testing is important for ensuring the quality of its software.
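
The takeaways on defining flaky tests and tracking their history can be illustrated with a minimal sketch. It assumes a simplified history in which each record holds a test name, the code revision it ran against, and a pass/fail outcome. The names RunRecord, find_flaky_tests, and the sample targets are hypothetical and are not taken from Google's tooling. A test is flagged as flaky when it has both a passing and a failing result at the same revision.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunRecord(NamedTuple):
    """One row of a hypothetical test-history table."""
    test: str       # test target name (illustrative only)
    revision: str   # identifier of the code version the test ran against
    passed: bool    # outcome of this single run


def find_flaky_tests(history: Iterable[RunRecord]) -> set[str]:
    """Return tests that both passed and failed at the same revision."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for record in history:
        outcomes[(record.test, record.revision)].add(record.passed)
    # Two distinct outcomes for one (test, revision) pair means the result
    # changed although the code did not.
    return {test for (test, _rev), seen in outcomes.items() if len(seen) == 2}


history = [
    RunRecord("//ui:login_test", "r100", True),
    RunRecord("//ui:login_test", "r100", False),   # same code, different outcome
    RunRecord("//core:math_test", "r100", True),
    RunRecord("//core:math_test", "r101", False),  # fails only after a code change
    RunRecord("//core:math_test", "r101", False),
]

flaky = find_flaky_tests(history)
print(sorted(flaky))
```

In this sample only the UI test is flagged. The second test fails consistently at the new revision, so it looks like a real breakage rather than a flake.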