Spark speed-up — Accelerators

Amit Singh Rathore
Published in Dev Genius
3 min read · Apr 27, 2024

Plugins and engines to improve Spark’s runtime performance

With faster storage (SSDs) and faster networks, disk I/O is no longer the major bottleneck it once was in distributed computing. The new bottlenecks are CPU and, at times, memory. Spark’s current JVM-based code generation design is constrained by them in several ways:

  • GC overhead
  • Whole-stage code generation is limited by JVM restrictions
  • Limited SIMD support in the JVM
  • Issues using the AVX* instruction sets from the JVM
