ElixirConf 2023 - Razvan Draghici - Managing a massive amount of distributed Elixir nodes
Discover the challenges of managing a massive number of distributed Elixir nodes and how to address them, covering supervision trees, node connections, and performance optimization.
- Razvan Draghici’s ElixirConf 2023 presentation covers managing a massive number of distributed Elixir nodes.
- The speaker’s example showed a supervision tree containing a net-sup supervisor, a Partisan peer service, and a Broadway consumer used to measure node connections.
- Starting distributed Erlang with all 100 nodes produced no disconnected nodes, but at larger node counts some nodes disconnected and messages were not delivered.
- The speaker analyzed 300 nodes and found 2% message loss, while the 500-node run stayed within the acceptable range, with net_kernel ticks increasing.
- Using PubSub gave better performance and less noise with a large cluster.
- The speaker built their own benchmark module using data from the AWS metadata API.
- They also created a simple adapter using Ethereum’s ECSV library.
- The speaker encountered port driver issues; port drivers are controlled by the Erlang runtime.
- The speaker used cookies to group nodes together, noting that cookies are not a security mechanism.
- Pub/Sub tests were run separately on each node, with a dedicated listener for each node.
- Pub/Sub latency was mostly unnoticeable.
- Error handling was in place, but the speaker found errors in their test code that caused nodes to disconnect as connections were made.
- Error rate increased with node count in one of the tests.
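The points above describe disconnections and increasing net_kernel tick activity as the node count grows. A rough model explains why. By default, distributed Erlang connects nodes in a full mesh, so the number of connections grows quadratically. The sketch below is written in Python for illustration only and is not code from the talk. It makes three assumptions: the default full mesh, the default net_ticktime of 60 seconds, and one tick per direction on each otherwise idle connection every net_ticktime / 4 seconds. The function names are hypothetical.

```python
# Rough model of full-mesh growth in distributed Erlang (illustrative only).
# Assumptions: every node connects to every other node (the default full
# mesh), net_ticktime is the default 60 s, and each idle connection carries
# one tick per direction every net_ticktime / 4 seconds.

DEFAULT_NET_TICKTIME_S = 60.0


def mesh_connections(nodes: int) -> int:
    """Connections in a full mesh of `nodes` nodes: N(N-1)/2."""
    return nodes * (nodes - 1) // 2


def idle_ticks_per_second(nodes: int,
                          net_ticktime_s: float = DEFAULT_NET_TICKTIME_S) -> float:
    """Cluster-wide tick rate when no other traffic flows.

    Each of the N nodes ticks each of its N-1 peers once per tick interval,
    giving N(N-1) ticks per interval.
    """
    tick_interval_s = net_ticktime_s / 4
    return nodes * (nodes - 1) / tick_interval_s


if __name__ == "__main__":
    for n in (100, 300, 500):
        print(f"{n} nodes: {mesh_connections(n)} connections, "
              f"~{idle_ticks_per_second(n):.0f} idle ticks/s")
```

Under these assumptions, 100 nodes need 4,950 connections and 500 nodes need 124,750. A fivefold increase in nodes therefore means roughly 25 times as many connections. Ticks are sent only on connections that are otherwise quiet, so the tick figure describes an idle cluster and is not a measurement.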
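A figure such as the 2% message loss mentioned above depends on how deliveries are counted. The Python sketch below shows one way to compute it. It assumes, hypothetically, that every broadcast message is expected at each node's dedicated listener, which gives loss rate = 1 - delivered / (messages sent * listeners). The `NodeListener` class and `loss_rate` function are hypothetical and are not the speaker's benchmark code.

```python
from dataclasses import dataclass, field


@dataclass
class NodeListener:
    """Hypothetical dedicated listener on one node; records message IDs it receives."""
    node: str
    received: set[int] = field(default_factory=set)

    def handle(self, message_id: int) -> None:
        self.received.add(message_id)


def loss_rate(sent_ids: set[int], listeners: list[NodeListener]) -> float:
    """Fraction of expected deliveries (every message at every listener) that never arrived."""
    expected = len(sent_ids) * len(listeners)
    if expected == 0:
        return 0.0
    # Only count IDs that were actually sent, ignoring stray messages.
    delivered = sum(len(listener.received & sent_ids) for listener in listeners)
    return 1 - delivered / expected


if __name__ == "__main__":
    sent = set(range(50))
    listeners = [NodeListener("a@host"), NodeListener("b@host")]
    for listener in listeners:
        for message_id in sent:
            if listener.node == "b@host" and message_id == 7:
                continue  # simulate one dropped delivery on b@host
            listener.handle(message_id)
    print(f"{loss_rate(sent, listeners):.1%}")  # prints 1.0%
```

Counting per listener rather than per message means a single message missed by one node registers as a partial loss. This matches a setup where every node has its own listener.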
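By default, two distributed Erlang nodes connect only if they share a cookie. Assigning cookies is therefore a simple way to split nodes into groups, which is how the speaker used them rather than as a security measure. The Python sketch below assumes that each cookie group forms its own full mesh. It groups node names by cookie and counts the resulting connections. The helper names, node names, and cookie values are hypothetical and do not describe the talk's actual configuration.

```python
from collections import defaultdict


def group_by_cookie(node_cookies: dict[str, str]) -> dict[str, list[str]]:
    """Map each cookie to the nodes that use it (nodes connect only within a group)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node, cookie in node_cookies.items():
        groups[cookie].append(node)
    return dict(groups)


def total_connections(groups: dict[str, list[str]]) -> int:
    """Sum of full-mesh connections, N(N-1)/2, within each cookie group."""
    return sum(len(nodes) * (len(nodes) - 1) // 2 for nodes in groups.values())


if __name__ == "__main__":
    # Hypothetical layout: 500 nodes spread evenly over 5 cookies.
    node_cookies = {f"node{i}@host": f"group_{i % 5}" for i in range(500)}
    groups = group_by_cookie(node_cookies)
    print(len(groups), total_connections(groups))  # 5 24750

    # The same 500 nodes sharing one cookie form a single mesh.
    single = group_by_cookie({node: "shared" for node in node_cookies})
    print(total_connections(single))  # 124750
```

Under this assumption, five groups of 100 nodes need 24,750 connections in total, compared with 124,750 for one shared cookie. Whether that reduction mattered in the talk's benchmarks is not stated in the summary.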