Using Go to Scale Audit Logging at Cloudflare - Arti Phugat, Cloudflare
Learn how Cloudflare scaled their audit logging from 30 to 3,000 messages/second using Go, Kafka optimizations, and smart architectural choices for high performance.
-
Audit logs track changes to system configuration, recording who made changes, what was changed, when it occurred, and through which interface (API/UI)
-
Cloudflare scaled their audit logging system from 30-35 messages/second to 2,500-3,000 messages/second by:
- Using goroutines for concurrent processing
- Implementing batch processing of messages
- Horizontally scaling Kafka consumers
- Caching internal service responses
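As a rough illustration of the concurrent-processing point, the sketch below transforms a batch of messages with a fixed pool of goroutines. The `RawEvent` and `AuditLog` types and the `transform` function are hypothetical stand-ins, not code from the talk.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// RawEvent and AuditLog are hypothetical types used only for illustration.
type RawEvent struct {
	Actor  string
	Action string
}

type AuditLog struct {
	Summary string
}

// transform stands in for the per-message work (enrichment, formatting).
func transform(e RawEvent) AuditLog {
	return AuditLog{Summary: e.Actor + " " + strings.ToUpper(e.Action)}
}

// transformBatch converts a batch using a fixed number of worker goroutines.
// Each worker writes only to the index it received, so no locking is needed
// and the output order matches the input order.
func transformBatch(events []RawEvent, workers int) []AuditLog {
	if workers < 1 {
		workers = 1
	}
	out := make([]AuditLog, len(events))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = transform(events[i])
			}
		}()
	}
	for i := range events {
		jobs <- i
	}
	close(jobs) // lets the workers exit their range loops
	wg.Wait()
	return out
}

func main() {
	batch := []RawEvent{{"alice", "update"}, {"bob", "delete"}}
	for _, l := range transformBatch(batch, 4) {
		fmt.Println(l.Summary)
	}
}
```

A bounded pool, rather than one goroutine per message, keeps memory and downstream load predictable when batches are large.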
-
Key Kafka consumer optimizations:
- Using consumer groups instead of single consumers
- Setting appropriate batch sizes (500 in their case)
- Configuring optimal session timeouts (20 seconds)
- Running multiple consumer pods
- Matching partition count to expected throughput
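To make the relationship between partitions and consumer pods concrete, here is a simplified sketch. Within one consumer group, Kafka assigns each partition to at most one consumer, so consumers beyond the partition count sit idle. The `ConsumerSettings` struct, its field names, and the group ID are hypothetical; the round-robin function is a stand-in for Kafka's real assignors, not their implementation.

```go
package main

import (
	"fmt"
	"time"
)

// ConsumerSettings is a hypothetical holder for the values mentioned above.
// Real client libraries expose these settings under their own names.
type ConsumerSettings struct {
	GroupID        string
	BatchSize      int
	SessionTimeout time.Duration
}

// assignPartitions spreads partition IDs over the consumers of one group in
// round-robin order. It returns one slice of partition IDs per consumer.
func assignPartitions(partitions, consumers int) [][]int {
	if consumers <= 0 {
		return nil
	}
	assignment := make([][]int, consumers)
	for p := 0; p < partitions; p++ {
		c := p % consumers
		assignment[c] = append(assignment[c], p)
	}
	return assignment
}

func main() {
	settings := ConsumerSettings{
		GroupID:        "audit-log-consumers", // assumed name
		BatchSize:      500,
		SessionTimeout: 20 * time.Second,
	}
	fmt.Printf("%+v\n", settings)

	// Six partitions over four consumers: every consumer has work.
	fmt.Println(assignPartitions(6, 4))
	// Two partitions over four consumers: two consumers receive nothing.
	fmt.Println(assignPartitions(2, 4))
}
```

The second call shows why adding pods stops helping once the pod count exceeds the partition count.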
-
System bottlenecks were identified and resolved through:
- CPU and memory profiling
- Metrics collection and visualization with Grafana
- Monitoring database latency
- Tracking consumer lag
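Consumer lag is the gap between the newest offset in each partition and the offset the consumer group has committed. A minimal sketch of that arithmetic follows, assuming the offsets have already been fetched; the `PartitionOffsets` type and the rates in `main` are made up for illustration.

```go
package main

import "fmt"

// PartitionOffsets is a hypothetical snapshot of one partition.
type PartitionOffsets struct {
	LogEnd    int64 // offset the next produced message will receive
	Committed int64 // next offset the consumer group will read
}

// totalLag sums the unread messages across partitions. Negative gaps, which
// can appear when snapshots are taken at different times, are ignored.
func totalLag(parts map[int]PartitionOffsets) int64 {
	var lag int64
	for _, o := range parts {
		if d := o.LogEnd - o.Committed; d > 0 {
			lag += d
		}
	}
	return lag
}

// drainSeconds estimates how long a backlog takes to clear. It reports false
// when consumers are not faster than producers, since the lag never shrinks.
func drainSeconds(lag int64, consumeRate, produceRate float64) (float64, bool) {
	if consumeRate <= produceRate {
		return 0, false
	}
	return float64(lag) / (consumeRate - produceRate), true
}

func main() {
	parts := map[int]PartitionOffsets{
		0: {LogEnd: 1200, Committed: 900},
		1: {LogEnd: 800, Committed: 800},
	}
	lag := totalLag(parts)
	fmt.Println("lag:", lag)
	if s, ok := drainSeconds(lag, 3000, 2500); ok {
		fmt.Printf("estimated drain time: %.1fs\n", s)
	}
}
```

A lag that keeps growing while throughput is flat usually points at the consumer side, which is where the profiling and database-latency metrics become useful.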
-
Performance improvements implemented:
- Batch database insertions instead of individual queries
- Parallel request transformation using goroutines
- Redis caching for internal service responses
- Horizontal scaling of application pods in Kubernetes
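The batch-insertion idea can be sketched as building one multi-row `INSERT` per batch instead of one statement per message. The table name, column names, `AuditLog` fields, and PostgreSQL-style `$n` placeholders are all assumptions; the source does not name the database or schema.

```go
package main

import (
	"fmt"
	"strings"
)

// AuditLog is a hypothetical row shape for illustration.
type AuditLog struct {
	Actor, Action, Interface string
}

// buildBatchInsert returns a single multi-row INSERT statement and the
// flattened argument list that goes with it. It returns an empty query
// for an empty batch so callers can skip the round trip.
func buildBatchInsert(logs []AuditLog) (string, []any) {
	if len(logs) == 0 {
		return "", nil
	}
	var sb strings.Builder
	sb.WriteString("INSERT INTO audit_logs (actor, action, interface) VALUES ")
	args := make([]any, 0, len(logs)*3)
	for i, l := range logs {
		if i > 0 {
			sb.WriteString(", ")
		}
		n := i * 3 // three placeholders per row
		fmt.Fprintf(&sb, "($%d, $%d, $%d)", n+1, n+2, n+3)
		args = append(args, l.Actor, l.Action, l.Interface)
	}
	return sb.String(), args
}

func main() {
	query, args := buildBatchInsert([]AuditLog{
		{"alice", "update", "API"},
		{"bob", "delete", "UI"},
	})
	fmt.Println(query)
	fmt.Println(args...)
}
```

With `database/sql`, the result could be passed as `db.Exec(query, args...)`, turning a batch of 500 messages into one database round trip rather than 500.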
-
Go was chosen for its:
- Strong concurrency support via goroutines and channels
- Extensive standard library
- High performance
- Gentle learning curve
-
Architecture decisions included:
- Event-driven design using Kafka
- Kubernetes for container orchestration
- Multiple service replicas for high availability
- Decoupled components for better scalability