Kubernetes Schedulers for AI & Data use-cases

Amit Singh Rathore
4 min read · Mar 20, 2025

Custom scheduler for specialized use-cases

The Kubernetes scheduler assigns pods to nodes based on resource availability, constraints, and affinity/anti-affinity rules.

The default scheduler works well for long-running applications like web services, APIs, and microservices. But data and ML workloads have unique requirements, such as the following:

Fair Resource Distribution

In clusters where multiple teams share Kubernetes resources, it is critical to prevent resource hogging by a single user or application. This requires a queueing system.

Gang Scheduling

Spark tasks in a stage need to be launched together. Similarly, the tasks of a distributed ML training job should all be launched together. If some tasks start while others are pending, resources are wasted and job completion slows down. This all-or-nothing scheduling is termed Gang Scheduling.

Preemption

In clusters where different types of workloads are running, job priorities are important. We might want lower-priority jobs to be preempted by higher-priority ones when resources become constrained.
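In vanilla Kubernetes, job priority is expressed with a PriorityClass that pods reference by name. A minimal sketch (the class name and value here are illustrative, not from the article):

```yaml
# A high-priority class; pods referencing it can preempt lower-priority pods
# when the cluster is resource-constrained.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-training   # hypothetical name
value: 1000000                   # higher value = higher priority
preemptionPolicy: PreemptLowerPriority
globalDefault: false
description: "Priority class for production training jobs."
```

A pod opts in by setting `priorityClassName: high-priority-training` in its spec; the schedulers discussed below build richer preemption semantics (queues, fairness) on top of this basic mechanism.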

Reservation & Backfilling

When a large job requests a large amount of resources, the scheduler should reserve resources for it conditionally (with a timeout). While resources are reserved, they may sit idle. To improve utilization, the scheduler needs to conditionally backfill smaller jobs onto those reserved resources.

Job Dependency Management

Some applications require a sequence of jobs to be run in a specific order, where one job starts only after the previous one finishes.

Distributed Training Jobs

For distributed frameworks like Ray, Spark, or TensorFlow, tasks need to coordinate closely across multiple nodes. We need to ensure these tasks are placed optimally across nodes to minimize communication overhead and maximize performance.
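One way to express such co-location in plain Kubernetes is inter-pod affinity, which nudges the scheduler to place workers of the same job near each other. A minimal sketch (the label, image, and topology key choices are illustrative assumptions):

```yaml
# Prefer placing this worker in the same zone as other pods of the same
# training job, to reduce cross-zone network traffic between workers.
apiVersion: v1
kind: Pod
metadata:
  name: trainer-worker-0
  labels:
    app: dist-training           # hypothetical job label
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: dist-training
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: worker
    image: example.com/training-image:latest   # hypothetical image
```

This is a soft preference; the batch schedulers below add stronger placement primitives (gangs, queues, reservations) that the default scheduler lacks.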

In this blog, we will go through available open-source solutions that support the above-mentioned requirements:

  • Volcano
  • Kueue
  • YuniKorn

Volcano

A Cloud Native Batch System

Volcano is a cloud-native system for high-performance computing (HPC), AI, ML, and other large-scale, high-throughput batch workloads. It is a CNCF project.

Key Features

  • Supports CPU, GPU (CUDA & MIG), and NPU devices
  • Supports multi-cluster scheduling
  • Supports MPI (Message Passing Interface) jobs, which are common in distributed computing.
  • Built-in scheduling for batch workloads, ensuring fairness and priority for jobs.
  • Offers advanced job lifecycle management (preemption, backfilling, gang scheduling).
  • Can integrate with custom job controllers and other Kubernetes scheduling plugins.
  • Integrates with frameworks such as TensorFlow, Spark, Ray, PyTorch, and Flink.
  • Used at companies like Amazon, Tencent, Baidu, Vivo, Visa (ML workloads)

Use Case

Volcano is ideal for complex HPC workloads, AI/ML model training, or other distributed computing tasks where job dependency and high scheduling precision are important.

helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
helm repo update
helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
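Volcano's gang scheduling is driven by the `minAvailable` field of its Job CRD: no pod of the job is started until the stated number can all be placed. A minimal sketch of a Spark-like driver/executor job (names, images, and resource figures are illustrative):

```yaml
# Volcano Job: all 4 pods (1 driver + 3 executors) are scheduled together
# or not at all, avoiding partially-started jobs that hold resources idle.
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: spark-like-batch          # hypothetical name
spec:
  schedulerName: volcano
  minAvailable: 4                 # gang size: all 4 tasks or none
  queue: default
  tasks:
  - name: driver
    replicas: 1
    template:
      spec:
        containers:
        - name: driver
          image: example.com/spark-image:latest   # hypothetical image
          resources:
            requests:
              cpu: "1"
  - name: executor
    replicas: 3
    template:
      spec:
        containers:
        - name: executor
          image: example.com/spark-image:latest
          resources:
            requests:
              cpu: "2"
```

`kubectl apply -f` this manifest and Volcano keeps the whole job Pending until the gang fits, then releases all pods at once.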

Kueue

Kubernetes Native Job Queueing

Kueue is a Kubernetes-native queueing system designed to manage batch jobs and workloads more efficiently by introducing an abstraction layer between jobs and cluster resources. It is an open-source project under the Apache license.

Key Features

  • Provides a lightweight queuing system for batch jobs.
  • Helps with better resource allocation, allowing you to run more jobs with limited resources.
  • Supports preemption & fair scheduling
  • Supports Multi-cluster job dispatching using MultiKueue / KubeStellar
  • Can be integrated with different scheduling systems, making it versatile.
  • Works with Kubernetes Job API, providing familiar interfaces.
  • Supports running TensorFlow, PyTorch, RayJob, RayCluster, Custom KubeFlow, etc.
  • Used at companies like Shopee, RedHat, CyberAgent, DaoCloud, Horizo Auto

Use Cases

Kueue is best suited for teams looking for a simple, Kubernetes-native solution to better handle resource allocation and fairness for batch jobs without requiring complex HPC-level capabilities.

helm install kueue oci://registry.k8s.io/kueue/charts/kueue \
--version=0.11.0 \
--namespace kueue-system \
--create-namespace \
--wait --timeout 300s
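Kueue's core abstraction is a quota-bearing ClusterQueue that namespaced LocalQueues point to; ordinary Jobs join a queue via a label and are held suspended until quota is available. A minimal sketch (queue names, namespace, and quota figures are illustrative):

```yaml
# Kueue setup: a flavor, a ClusterQueue with CPU/memory quota,
# a LocalQueue in the team namespace, and a Job submitted to it.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq                 # hypothetical name
spec:
  namespaceSelector: {}           # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cq
---
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-batch
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue   # submit to the LocalQueue
spec:
  suspend: true                   # Kueue unsuspends the Job once quota admits it
  template:
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo done"]
      restartPolicy: Never
```

Because Kueue works by suspending and admitting standard Jobs, the default scheduler still does the actual pod placement; Kueue only decides when and under whose quota a workload runs.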

YuniKorn

Lightweight, universal resource scheduler

YuniKorn is a resource scheduler that can manage workloads in both Kubernetes and Apache Hadoop YARN environments. It is designed to handle large-scale resource scheduling for batch jobs and long-running services. It is an Apache Software Foundation project.

Key Features

  • Cloud-native, multi-tenant
  • Supports fine-grained resource allocation (CPU, memory, GPU).
  • YuniKorn recognizes users, apps, and queues, and takes resource usage and ordering into consideration when making scheduling decisions.
  • Provides hierarchical queues and preemption, which ensures fairness among jobs and users.
  • Can schedule both batch and long-running jobs, making it more versatile.
  • YuniKorn automatically reserves resources for outstanding requests.
  • YuniKorn’s preemption feature allows higher-priority tasks to dynamically reallocate resources by preempting lower-priority ones.
  • Integrates with both Kubernetes and YARN, making it suitable for hybrid environments.
  • Provides UI for tracking resource and queue usage.
  • Supports running TensorFlow, Spark, Ray, etc.
  • Used at companies like Uber, Visa (Spark workloads), Apple, Pinterest

Use Cases

YuniKorn is useful in organizations that need to manage both Kubernetes and YARN clusters or have a mix of batch and long-running jobs with complex scheduling requirements.

helm repo add yunikorn https://apache.github.io/yunikorn-release
helm repo update
kubectl create namespace yunikorn
helm install yunikorn yunikorn/yunikorn --namespace yunikorn
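Once installed, workloads opt into YuniKorn by naming it as their scheduler and tagging pods with an application ID and a target queue. A minimal sketch (the application ID and queue path are illustrative):

```yaml
# Pod scheduled by YuniKorn instead of the default scheduler;
# the labels group the pod under an application and a hierarchical queue.
apiVersion: v1
kind: Pod
metadata:
  name: batch-task
  labels:
    applicationId: "batch-app-001"   # hypothetical app ID
    queue: root.sandbox              # hypothetical queue path
spec:
  schedulerName: yunikorn
  containers:
  - name: task
    image: busybox
    command: ["sleep", "60"]
```

Pods sharing the same `applicationId` are treated as one application, which is how YuniKorn applies queue quotas, ordering, and gang semantics across a job's pods.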

Written by Amit Singh Rathore

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML