Spark speed-up — Accelerators
3 min read · Apr 27, 2024
Plugins and engines that improve Spark's runtime performance
With faster storage (SSDs) and faster networks, disk I/O is no longer the main bottleneck in distributed computing. The new bottlenecks are CPU and, at times, memory. Spark's current JVM-based code generation design suffers from these bottlenecks in several ways:
- GC overhead
- Whole-stage code generation is limited by JVM restrictions, such as the 64 KB bytecode limit per method
- Limited SIMD support in the JVM
- Issues with using the AVX* instruction sets from the JVM
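To make the code generation limits above a little more concrete, here is a minimal sketch of the Spark SQL settings that govern whole-stage code generation, written as a `spark-defaults.conf` fragment. The values shown are what I understand to be the defaults in recent Spark 3.x releases. Treat them as assumptions and check them against your own version, since several of these settings are internal and not listed in the public configuration docs.

```properties
# Whole-stage code generation switch (assumed default: true).
spark.sql.codegen.wholeStage        true

# Maximum bytecode size of a single generated method. 65535 is the largest
# size the JVM allows for a valid method. Above this threshold Spark
# disables whole-stage codegen for that part of the plan.
spark.sql.codegen.hugeMethodLimit   65535

# Whole-stage codegen is skipped when a row has more fields than this
# (assumed default: 100).
spark.sql.codegen.maxFields         100

# Fall back to interpreted execution if generated code fails to compile
# (assumed default: true).
spark.sql.codegen.fallback          true
```

On HotSpot, methods larger than roughly 8000 bytes of bytecode are not JIT-compiled by default, so a generated method can stay within the 64 KB limit and still run slowly. This is one reason the JVM itself becomes the ceiling for CPU-bound queries.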