Java Performance Update 2024 by Per Minborg

Explore Java 24/25 performance gains: memory segments, generational ZGC garbage collection, object header optimization, vectorization, and proper benchmarking techniques.

Key takeaways
  • Performance work in Java 24/25 includes merged stores (the JIT combines adjacent narrow writes into one wider write), improved loop unrolling, auto-vectorization, and faster memory segment operations

  • Memory segments (part of the Foreign Function & Memory API) provide significant performance benefits by allowing operations on larger chunks of data (64 bits, 256 bits, or more at a time) instead of byte-by-byte processing

  • Use JMH (Java Microbenchmark Harness) for proper performance testing: it handles warm-up phases, guards against JIT optimizations such as dead-code elimination skewing the results, and reports error margins and statistics

  • Don’t use laptops for performance benchmarking: CPU throttling, power management, and heat can significantly distort results. Use dedicated servers instead.

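  • Generational ZGC (the default ZGC mode since JDK 23 and the only ZGC mode as of JDK 24; G1 remains the JVM's default collector, so ZGC is still selected with -XX:+UseZGC) provides better performance for large heaps up to terabyte sizes with very short stop-the-world pauses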

  • Performance metrics should consider multiple factors beyond just throughput:

    • Startup time and warm-up periods
    • Memory usage and cache efficiency
    • Power consumption and CPU utilization
    • Worst-case latency scenarios
    • Resource contention

  • Compact object headers (Project Lilliput) shrink object headers from 96-128 bits to 64 bits, which can reduce memory usage by roughly 20-30% depending on the workload

  • Native code isn’t always faster than Java: in many cases Java can match or exceed native performance, especially once the JIT compiler has optimized the hot code

  • Proper performance testing requires:

    • Adequate warm-up periods (thousands of iterations)
    • Testing on supported platforms/architectures
    • Consideration of GC, thread pools, and CPU core count
    • Using System.nanoTime() instead of System.currentTimeMillis() to measure elapsed time

  • Future improvements will focus on auto-vectorization, virtual thread handling, and better startup performance through ahead-of-time compilation
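
To make the memory segment takeaway concrete, here is a minimal sketch that finds the first differing byte of two segments, once byte by byte and once 8 bytes at a time. It assumes JDK 22 or later, where the Foreign Function & Memory API is final. The class and method names are hypothetical and not from the talk; the built-in MemorySegment.mismatch method does the same job and is what real code would normally call.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration class; not code from the talk.
class SegmentMismatch {

    // Baseline: compares one byte per loop iteration.
    static long mismatchByteWise(MemorySegment a, MemorySegment b) {
        long n = Math.min(a.byteSize(), b.byteSize());
        for (long i = 0; i < n; i++) {
            if (a.get(ValueLayout.JAVA_BYTE, i) != b.get(ValueLayout.JAVA_BYTE, i)) {
                return i;
            }
        }
        // Same convention as MemorySegment.mismatch: -1 means no difference.
        return a.byteSize() == b.byteSize() ? -1 : n;
    }

    // Wider access: compares 8 bytes per iteration, then finishes byte-wise.
    static long mismatchLongWise(MemorySegment a, MemorySegment b) {
        long n = Math.min(a.byteSize(), b.byteSize());
        long i = 0;
        for (; i + Long.BYTES <= n; i += Long.BYTES) {
            long x = a.get(ValueLayout.JAVA_LONG_UNALIGNED, i);
            long y = b.get(ValueLayout.JAVA_LONG_UNALIGNED, i);
            if (x != y) {
                break; // the tail loop below pinpoints the differing byte
            }
        }
        for (; i < n; i++) {
            if (a.get(ValueLayout.JAVA_BYTE, i) != b.get(ValueLayout.JAVA_BYTE, i)) {
                return i;
            }
        }
        return a.byteSize() == b.byteSize() ? -1 : n;
    }

    public static void main(String[] args) {
        byte[] left = new byte[1_000];
        byte[] right = new byte[1_000];
        right[777] = 1;
        MemorySegment a = MemorySegment.ofArray(left);
        MemorySegment b = MemorySegment.ofArray(right);
        System.out.println("byte-wise: " + mismatchByteWise(a, b));
        System.out.println("long-wise: " + mismatchLongWise(a, b));
        System.out.println("built-in:  " + a.mismatch(b));
    }
}
```

All three calls should report the same offset. Whether the wider loop is actually faster on a given machine is something to measure rather than assume.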
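
The warm-up and System.nanoTime() points from the testing checklist can be sketched by hand as follows. This is only an illustration of what a harness such as JMH automates far more rigorously. The workload, class name, and iteration counts are assumptions chosen for the example, not values from the talk.

```java
// Hypothetical illustration class; prefer JMH for real measurements.
class NanoTimeSketch {

    // Example workload (an assumption for this sketch): sum an int array.
    static long sum(int[] data) {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    // Runs unmeasured warm-up calls first, then returns the average
    // nanoseconds per call over the measured iterations.
    static double averageNanos(int[] data, int warmupIterations, int measuredIterations) {
        long sink = 0; // accumulating results makes it harder for the JIT to drop the work
        for (int i = 0; i < warmupIterations; i++) {
            sink += sum(data);
        }
        long start = System.nanoTime(); // monotonic clock intended for elapsed time
        for (int i = 0; i < measuredIterations; i++) {
            sink += sum(data);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // consume the sink so it counts as used
        }
        return (double) elapsed / measuredIterations;
    }

    public static void main(String[] args) {
        int[] data = new int[100_000];
        java.util.Arrays.fill(data, 1);
        double nanos = averageNanos(data, 5_000, 5_000);
        System.out.println("average ns per call: " + nanos);
    }
}
```

A single number from a loop like this says nothing about variance, GC pauses, or machine noise, which is why the list above recommends JMH and a dedicated server.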