Scaling EKS with Karpenter

Amit Singh Rathore
5 min read · May 20, 2024

Karpenter — A node lifecycle management project on K8s

What is Karpenter?

Karpenter is an open-source, flexible, high-performance Kubernetes auto-scaling solution for node lifecycle management. It helps improve our application’s availability and cluster efficiency by rapidly launching the right-sized compute resources in response to changing application load. It has the following salient features:

  • Fast, pod-driven scaling: Karpenter watches for pods that the scheduler has marked unschedulable and reads their resource requests and scheduling constraints directly from the pod specifications. It then provisions nodes sized for those pods without waiting on a node group to scale, which shortens the time until capacity is ready.
  • EC2 API instead of Auto Scaling API: It calls the EC2 APIs directly rather than going through an Auto Scaling group, which reduces latency.
  • Batching: It launches nodes in batches based on actual demand. Batching can significantly reduce the overall time from node creation to node readiness because it streamlines the provisioning process. Karpenter relies heavily on a bin-packing algorithm to distribute pods efficiently across the cluster. When new capacity is needed, Karpenter batches the pending pods and bin-packs them to find the most suitable nodes.
  • Simple to Configure: It offers a simpler setup with fewer parameters to configure, which can lead to quicker deployment and easier management over time. This simplicity also reduces the risk of misconfiguration, which can affect scaling speed and reliability.
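To make the batching and bin-packing idea concrete, here is a minimal Python sketch. It is not Karpenter's actual scheduler: it assumes a simple first-fit-decreasing heuristic over CPU and memory only, and the `Pod`, `Node`, `pack`, and `right_size` names as well as the three-entry catalog are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu: float      # requested vCPUs
    mem_gib: float  # requested memory in GiB


@dataclass
class Node:
    instance_type: str
    cpu: float
    mem_gib: float
    pods: list = field(default_factory=list)

    def fits(self, pod):
        used_cpu = sum(p.cpu for p in self.pods)
        used_mem = sum(p.mem_gib for p in self.pods)
        return used_cpu + pod.cpu <= self.cpu and used_mem + pod.mem_gib <= self.mem_gib


# Hypothetical catalog (name, vCPU, GiB), smallest first.
# System and kubelet overhead is ignored in this sketch.
CATALOG = [("t3.medium", 2, 4), ("t3.large", 2, 8), ("t3.xlarge", 4, 16)]


def right_size(node):
    """Swap a packed node for the smallest catalog entry that still holds its pods."""
    used_cpu = sum(p.cpu for p in node.pods)
    used_mem = sum(p.mem_gib for p in node.pods)
    for name, cpu, mem in CATALOG:
        if used_cpu <= cpu and used_mem <= mem:
            return Node(name, cpu, mem, node.pods)
    return node


def pack(batch):
    """First-fit decreasing: place the biggest pods first, open a node only when needed."""
    nodes = []
    for pod in sorted(batch, key=lambda p: (p.cpu, p.mem_gib), reverse=True):
        target = next((n for n in nodes if n.fits(pod)), None)
        if target is None:
            target = Node(*CATALOG[-1])  # start from the largest type
            if not target.fits(pod):
                raise ValueError(f"{pod.name} does not fit any instance type")
            nodes.append(target)
        target.pods.append(pod)
    return [right_size(n) for n in nodes]


batch = [Pod("a", 1, 2), Pod("b", 1, 2), Pod("c", 2, 4), Pod("d", 0.5, 1)]
for node in pack(batch):
    print(node.instance_type, [p.name for p in node.pods])
```

Handling the four pending pods as one batch lets the sketch place three of them on a single larger node and give the leftover pod a small one, rather than launching one node per pod.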

Cluster Autoscaler (CAS) and Karpenter share the same goal: to scale the cluster. But they go about it in different ways.

CAS → ASG → Launch of nodes (EC2 APIs)
CAS → Managed Node Groups → Launch of nodes (EC2 APIs)
Karpenter → Launch of nodes (EC2 APIs)

How does Karpenter work?

At a high level, Karpenter has the following four steps:

Watching
Evaluating
Provisioning
Disrupting

Karpenter runs as an operator in the cluster. This operator watches the cluster's API server for unschedulable pods. When it finds such pods, it checks whether it can pair them with any NodePool.

A NodePool is a custom resource we can create, which outlines a set of rules and conditions under which to create additional nodes for the cluster.

If Karpenter finds a match, it creates a NodeClaim and tries to provision a new EC2 instance to serve as the new node. Karpenter also periodically checks whether each node it launched is still needed and terminates the ones that are not.
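The four steps can be pictured as a single reconcile pass. The following Python sketch is a toy model, not Karpenter's code: the `Pod`, `NodePool`, and `NodeClaim` classes and the `reconcile` function are hypothetical stand-ins, matching is reduced to a single architecture label, and each pass creates at most one NodeClaim per NodePool.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pod:
    name: str
    arch: str
    node: Optional[str] = None  # None means the pod is still unschedulable


@dataclass
class NodePool:
    name: str
    archs: set  # architectures this pool is allowed to launch


@dataclass
class NodeClaim:
    name: str
    nodepool: str


_ids = itertools.count(1)  # keeps NodeClaim names unique across passes


def reconcile(pods, nodepools, claims):
    """One pass of watch -> evaluate -> provision -> disrupt; returns the live claims."""
    claims = list(claims)

    # Watching: find pods that have no node yet.
    pending = [p for p in pods if p.node is None]

    # Evaluating: pair each pending pod with the first NodePool that accepts it.
    by_pool = {}
    for pod in pending:
        pool = next((np for np in nodepools if pod.arch in np.archs), None)
        if pool is not None:
            by_pool.setdefault(pool.name, []).append(pod)

    # Provisioning: one NodeClaim per NodePool for this batch (toy simplification).
    for pool_name, batch in by_pool.items():
        claim = NodeClaim(name=f"{pool_name}-{next(_ids)}", nodepool=pool_name)
        for pod in batch:
            pod.node = claim.name
        claims.append(claim)

    # Disrupting: drop claims that no longer host any pod.
    in_use = {p.node for p in pods}
    return [c for c in claims if c.name in in_use]
```

Calling `reconcile` repeatedly with the current pod list mimics the loop: a pod with no matching NodePool simply stays pending, and a NodeClaim disappears once the pods that caused it are gone.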

Installation

Before we install Karpenter, we need to create two AWS IAM roles:

one for the Karpenter controller
one for the instances that it will create

We also need to add the relevant entries to the aws-auth ConfigMap in the kube-system namespace.

NodeRole

arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

Karpenter Role (IRSA)

KarpenterControllerPolicy

To install the Karpenter operator, we can use Helm.

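# Current charts are published as OCI artifacts, so no "helm repo add" is needed.
# Pin an explicit chart version; Helm does not accept "latest" as a version.
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace karpenter --create-namespace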

# (Optional) We can also render the manifests to a file by running the following command.
# The dots in the annotation key are escaped so that Helm treats it as a single key.
helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace karpenter \
  --set settings.aws.defaultInstanceProfile="KarpenterNodeInstanceProfile-${CLUSTER_NAME}" \
  --set settings.clusterName="${CLUSTER_NAME}" \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi > karpenter.yaml

Let’s look at a basic values.yaml file for the Helm chart.

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<accountID>:role/<karpenterRoleName>
settings:
  clusterName: <clusterName>
  clusterEndpoint: https://<clusterEndpoint>.eks.amazonaws.com

Replace the placeholders with your values. Through IRSA, the Karpenter service account assumes the <karpenterRoleName> IAM role and uses it to create EC2 instances in AWS. Karpenter uses the clusterName and clusterEndpoint values to join each EC2 instance to the cluster as a Kubernetes node.

You can get the clusterEndpoint value by running this command:

aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.endpoint" --output text

Once our Karpenter Helm chart is deployed, it’s time to get familiar with the NodePool and EC2NodeClass CRDs.

Older versions of Karpenter used Provisioner and AWSNodeTemplate instead:
Provisioner → NodePool
AWSNodeTemplate → EC2NodeClass

API

karpenter.sh/Provisioner -> karpenter.sh/NodePool
karpenter.sh/Machine -> karpenter.sh/NodeClaim
karpenter.k8s.aws/AWSNodeTemplate -> karpenter.k8s.aws/EC2NodeClass

A NodePool is a more Kubernetes-centered representation of the nodes that should be created. With it we can configure Kubernetes-specific constructs such as taints and labels.

The EC2NodeClass lets us fine-tune AWS-specific settings, such as which subnets the nodes will be created in, any mapped block devices, security groups, AMI families, and many more options that we can control.

Let’s look at an example of a very basic pair of these two buddies:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: ray-worker-node-class
spec:
  amiFamily: AL2
  role: <karpenterProfileInstanceRole>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <ray-clusterName>
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <ray-clusterName>
  tags:
    karpenter.sh/discovery: <ray-clusterName>
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ray-worker-nodepool
spec:
  template:
    metadata:
      # Labels and annotations belong under template.metadata
      labels:
        app-name: ray-clustername
      annotations:
        example.com/owner: "my-team"
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: ray-worker-node-class
      taints:
        - key: cluster.nodepool/ray-clustername
          effect: NoSchedule
  limits:
    cpu: "80"
    memory: "320Gi"
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    expireAfter: 360h

In our EC2NodeClass, we're declaring that nodes launched through this class will run an AL2 image, use the role we have created, and be placed in the subnets and security groups that are tagged with karpenter.sh/discovery: <ray-clusterName>.
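Selector terms work by tag matching: a subnet or security group is selected when it carries every tag listed in at least one term. Here is a small Python sketch of that rule using made-up subnet records. The IDs, tag values, and the `select` helper are illustrative assumptions, not Karpenter or AWS API calls, and only exact tag matches are modelled.

```python
def matches_term(resource_tags, term_tags):
    """All tags within one term must be present with the same value (AND)."""
    return all(resource_tags.get(key) == value for key, value in term_tags.items())


def select(resources, selector_terms):
    """A resource is selected if it matches at least one term (OR)."""
    return [
        r["id"]
        for r in resources
        if any(matches_term(r["tags"], term["tags"]) for term in selector_terms)
    ]


# Hypothetical subnets; only the first carries the discovery tag for our cluster.
subnets = [
    {"id": "subnet-a", "tags": {"karpenter.sh/discovery": "ray-cluster", "tier": "private"}},
    {"id": "subnet-b", "tags": {"karpenter.sh/discovery": "other-cluster"}},
    {"id": "subnet-c", "tags": {}},
]
terms = [{"tags": {"karpenter.sh/discovery": "ray-cluster"}}]

print(select(subnets, terms))
```

Forgetting to tag the subnets or security groups means the selector matches nothing, so checking those tags is a sensible first step if nodes fail to launch.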

In our NodePool, we reference the ray-worker-node-class class we have created. The NodePool specifies that nodes should run Linux on the amd64 architecture, be on-demand instances (rather than spot), and come from specific instance families and sizes. Finally, we define limits that cap how much capacity this NodePool can provision: its nodes should not collectively have more than 80 CPUs or 320GiB of RAM.
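To see how the requirements and limits interact, the sketch below filters a small hand-written list of instance types against the same requirements and then checks a candidate launch against the CPU and memory limits. The catalog, the `instance`, `satisfies`, `allowed_types`, and `within_limits` helpers, and the restriction to the `In` and `Gt` operators are simplifying assumptions. The limit check in particular is a rough model, and the real controller's accounting may differ.

```python
REQUIREMENTS = [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
    {"key": "karpenter.k8s.aws/instance-family", "operator": "In", "values": ["t3"]},
    {"key": "karpenter.k8s.aws/instance-size", "operator": "In",
     "values": ["medium", "large", "xlarge"]},
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["2"]},
]
LIMITS = {"cpu": 80, "memory_gib": 320}


def instance(name, arch, generation, cpu, memory_gib):
    """Build a catalog entry with the labels the requirements refer to."""
    family, size = name.split(".")
    labels = {
        "kubernetes.io/arch": arch,
        "karpenter.sh/capacity-type": "on-demand",
        "karpenter.k8s.aws/instance-family": family,
        "karpenter.k8s.aws/instance-size": size,
        "karpenter.k8s.aws/instance-generation": str(generation),
    }
    return {"name": name, "cpu": cpu, "memory_gib": memory_gib, "labels": labels}


CATALOG = [
    instance("t3.medium", "amd64", 3, 2, 4),
    instance("t3.xlarge", "amd64", 3, 4, 16),
    instance("t3.2xlarge", "amd64", 3, 8, 32),  # size is not in the allowed list
    instance("t2.large", "amd64", 2, 2, 8),     # wrong family, generation not > 2
    instance("t4g.large", "arm64", 4, 2, 8),    # wrong architecture and family
]


def satisfies(labels, requirement):
    value = labels.get(requirement["key"])
    if value is None:
        return False
    if requirement["operator"] == "In":
        return value in requirement["values"]
    if requirement["operator"] == "Gt":
        return int(value) > int(requirement["values"][0])
    raise ValueError(f"operator {requirement['operator']} is not handled in this sketch")


def allowed_types(catalog, requirements):
    return [i["name"] for i in catalog
            if all(satisfies(i["labels"], r) for r in requirements)]


def within_limits(current, candidate, limits):
    """True if adding the candidate keeps every tracked resource at or under its limit."""
    return all(current[key] + candidate[key] <= limits[key] for key in limits)


print(allowed_types(CATALOG, REQUIREMENTS))
```

As a quick arithmetic check on the limits: a t3.xlarge has 4 vCPUs and 16 GiB, so 20 such nodes reach both caps at once (20 × 4 = 80 CPUs, 20 × 16 = 320 GiB), while t3.medium nodes (2 vCPUs, 4 GiB) would hit the CPU cap at 40 nodes with only 160 GiB of memory in use.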

We can create as many NodePools and EC2NodeClasses as we’d like, and give them different parameters, labels, and taints, to accommodate our organization’s specific needs and wants.

Once a node is no longer needed, Karpenter terminates it and deregisters it from the cluster. We can also delete an individual node or NodeClaim ourselves.

# Delete a specific nodeclaim
kubectl delete nodeclaim $NODECLAIM_NAME

# Delete a specific node
kubectl delete node $NODE_NAME

# Delete all nodeclaims
kubectl delete nodeclaims --all

# Delete all nodes owned by any nodepool
kubectl delete nodes -l karpenter.sh/nodepool

# Delete all nodeclaims owned by a specific nodepool
kubectl delete nodeclaims -l karpenter.sh/nodepool=$NODEPOOL_NAME
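What counts as "no longer needed" is governed by the disruption block of the NodePool shown earlier. As a rough Python sketch of how those settings could be read (a simplified model: the `should_disrupt` helper and its arguments are hypothetical, and the real controller applies further checks that are not modelled here):

```python
from datetime import datetime, timedelta, timezone

CONSOLIDATE_AFTER = timedelta(seconds=30)  # consolidateAfter: 30s
EXPIRE_AFTER = timedelta(hours=360)        # expireAfter: 360h (15 days)


def should_disrupt(created_at, empty_since, now):
    """Return why a node would be disrupted under these settings, or None.

    created_at:  when the node was launched
    empty_since: when the node last became empty of workload pods, or None if it is in use
    """
    if now - created_at >= EXPIRE_AFTER:
        return "expired"
    # consolidationPolicy: WhenEmpty -> only empty nodes are candidates
    if empty_since is not None and now - empty_since >= CONSOLIDATE_AFTER:
        return "empty"
    return None


now = datetime(2024, 5, 20, 12, 0, tzinfo=timezone.utc)
print(should_disrupt(now - timedelta(hours=1), now - timedelta(seconds=45), now))
```

With these values, a node that has been idle for 45 seconds is removed as empty, and any node is replaced after 15 days regardless of load.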

Amit Singh Rathore

Staff Data Engineer @ Visa — Writes about Cloud | Big Data | ML