ElixirConf 2023 - Andrew Bennett - Erlang Dist Filtering and the WhatsApp Runtime System

Andrew Bennett presents Erlang Dist Filtering and its role in the WhatsApp runtime system at scale, a highly distributed network of 30,000 nodes, and describes the resulting improvements in performance and resilience.

Key takeaways
  • The WhatsApp runtime system is a highly distributed system with 30,000 nodes, and it uses Erlang’s dist (distribution) protocol to send messages between nodes.
  • To improve performance and reduce single points of failure, Erlang Dist Filtering was developed to filter messages before they are sent to nodes.
  • The Erlang Dist Filtering project is a NIF that intercepts and rewrites inbound messages to enable dist filtering.
  • The project also includes loggers, which are stateful and do not preserve signal ordering.
  • The WhatsApp runtime system is not designed to be secure on its own; it assumes a trusted environment.
  • The system uses a distributed architecture, with a full mesh of connections between nodes.
  • The Erlang Dist Filtering project includes handlers, which are lossless and preserve signal ordering.
  • The project is still in development, but it has already improved performance by reducing the number of dist operations.
  • The WhatsApp runtime system uses a variety of strategies to reduce the impact of node failures, including automated restarts and distributed logging.
  • The system also uses a unique way of dealing with senders and receivers, which allows it to handle high volumes of traffic.
  • The loggers in the system are used to monitor and debug node failures, and to provide a paper trail for investigating issues.
  • The system is designed to be highly available, with multiple nodes and automated restarts.
  • The Erlang Dist Filtering project has improved performance and reduced the number of dist operations in the WhatsApp runtime system.
  • The project is still in development, but it has already had a significant impact on the performance and scalability of the system.
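
The takeaways on the full mesh, inbound filtering, handlers and loggers can be illustrated with a small sketch. This is not the project's NIF or its API. It is a toy model in Python under stated assumptions: the names `full_mesh_links`, `ALLOWED_OPS`, `BoundedLogger` and `filter_inbound` are hypothetical, the allowlist contents are invented for illustration, and the logger is modelled as a best-effort side channel that may drop events, which is only one possible reading of the summary. The handler path here delivers every allowed message in arrival order, matching the "lossless and preserve signal ordering" description.

```python
from collections import deque


def full_mesh_links(nodes: int) -> int:
    """Number of pairwise connections in a full mesh of `nodes` nodes."""
    return nodes * (nodes - 1) // 2


# Hypothetical allowlist of operations; the real rules are not given in the summary.
ALLOWED_OPS = {"send", "link", "monitor"}


class BoundedLogger:
    """Best-effort side channel (assumption): keeps only the newest events."""

    def __init__(self, capacity: int) -> None:
        self.events = deque(maxlen=capacity)
        self.dropped = 0  # count of older events pushed out by newer ones

    def log(self, event: dict) -> None:
        if len(self.events) == self.events.maxlen:
            self.dropped += 1
        self.events.append(event)


def filter_inbound(messages: list, logger: BoundedLogger) -> list:
    """Handler path: return every allowed message, in arrival order.

    Rejected messages are not delivered; they are only reported to the logger.
    """
    delivered = []
    for msg in messages:
        if msg["op"] in ALLOWED_OPS:
            delivered.append(msg)
        else:
            logger.log(msg)
    return delivered


if __name__ == "__main__":
    print(full_mesh_links(30_000))  # 449985000 connections for 30,000 nodes

    inbound = [
        {"op": "send", "seq": 1},
        {"op": "spawn_request", "seq": 2},
        {"op": "send", "seq": 3},
        {"op": "rpc", "seq": 4},
        {"op": "exit2", "seq": 5},
    ]
    logger = BoundedLogger(capacity=2)
    delivered = filter_inbound(inbound, logger)
    print([m["seq"] for m in delivered])      # [1, 3]
    print([m["seq"] for m in logger.events])  # [4, 5]
    print(logger.dropped)                     # 1
```

The connection count grows quadratically with the number of nodes, which is why a full mesh of 30,000 nodes makes per-connection filtering cost matter. Separating the ordered delivery path from the logging path keeps logging pressure from affecting which messages are delivered or their order.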