Scaling EKS with Karpenter
Karpenter — A node lifecycle management project on K8s
What is Karpenter?
Karpenter is an open-source, flexible, high-performance Kubernetes auto-scaling solution for node lifecycle management. It helps improve our application’s availability and cluster efficiency by rapidly launching the right-sized compute resources in response to changing application load. It has the following salient features:
- Proactive Scaling: Instead of waiting for pods to remain unscheduled, Karpenter can anticipate future resource requirements based on pod specifications and start provisioning nodes in advance. This can lead to faster scaling because nodes are ready by the time they are needed.
- EC2 API instead of the Auto Scaling API: it makes EC2 API calls directly rather than going through Auto Scaling groups or Managed Node Groups, which reduces provisioning latency.
- Batching: it launches nodes in batches based on actual demand. This batching can significantly reduce the overall time from node creation to node readiness, as it optimizes the provisioning process. Karpenter relies heavily on a bin packing algorithm to find the optimal distribution of resources across the cluster. When the cluster needs new capacity, Karpenter batches the pending pods and performs bin packing to find the most suitable node.
- Simple to Configure: It offers a simpler setup with fewer parameters to configure, which can lead to quicker deployment and easier management over time. This simplicity also reduces the risk of misconfiguration, which can affect scaling speed and reliability.
CAS (Cluster Autoscaler) and Karpenter share the same goal: scaling the cluster. However, they achieve it in different ways:
CAS → ASG → Launch of nodes (EC2 APIs)
CAS → Managed Node Groups → Launch of nodes (EC2 APIs)
Karpenter → Launch of nodes (EC2 APIs)
How does Karpenter work?
At a high level, Karpenter has the following four steps:
- Watching
- Evaluating
- Provisioning
- Disrupting
Karpenter works as an operator in the cluster. This operator periodically checks the cluster’s API for unschedulable pods. When it finds such pods, it checks whether it can pair them with any NodePool. A NodePool is a custom resource we can create, which outlines a set of rules and conditions under which to create additional nodes for the cluster.
If Karpenter finds a match, it creates a NodeClaim and tries to provision a new EC2 instance to be used as the new node. Karpenter also periodically checks whether each node is still needed; if it is not, Karpenter terminates it.
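We can watch this loop in action with a few commands (the app.kubernetes.io/name=karpenter selector below follows the defaults of the official Helm chart; adjust it if you customized the install):
# Pending pods that the scheduler could not place are what Karpenter reacts to
kubectl get pods -A --field-selector=status.phase=Pending
# NodeClaims created by Karpenter, and the nodes they turned into
kubectl get nodeclaims
kubectl get nodes -l karpenter.sh/nodepool
# Follow the controller’s decisions in its logs
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter -f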
Installation
Before we install Karpenter, we need to create two AWS IAM roles:
- one for the Karpenter controller
- one for the EC2 instances that it will create
We also need to add the relevant entries to the aws-auth ConfigMap under the kube-system namespace, so that instances launched by Karpenter are allowed to join the cluster.
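A minimal sketch of that aws-auth entry, assuming the node role is named KarpenterNodeRole-<clusterName> (adjust the account ID and role name to your own):
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Lets EC2 instances carrying this role register as cluster nodes
    - rolearn: arn:aws:iam::<accountID>:role/KarpenterNodeRole-<clusterName>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes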
Node role (attached to the EC2 instances Karpenter launches), with the following managed policies:
- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
- arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
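As a sketch, this role and its policies can be created with the AWS CLI; the name KarpenterNodeRole-${CLUSTER_NAME} is only a convention, not a requirement:
# Trust policy so that EC2 instances can assume the role
aws iam create-role --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
  --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
# Attach the managed policies listed above
for policy in AmazonEKSWorkerNodePolicy AmazonEKS_CNI_Policy AmazonEC2ContainerRegistryReadOnly AmazonSSMManagedInstanceCore; do
  aws iam attach-role-policy --role-name "KarpenterNodeRole-${CLUSTER_NAME}" \
    --policy-arn "arn:aws:iam::aws:policy/${policy}"
done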
Karpenter controller role (assumed via IRSA), with the KarpenterControllerPolicy attached.
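One way to wire this up is eksctl’s IAM service account support. The sketch below assumes the KarpenterControllerPolicy already exists in your account under that exact name:
eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" \
  --namespace karpenter --name karpenter \
  --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy" \
  --approve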
To install the Karpenter operator, we can use Helm.
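The commands below reference a few environment variables, which we can export up front (the values are placeholders; pick the Karpenter release you actually intend to run):
export CLUSTER_NAME=<clusterName>
export KARPENTER_VERSION=<karpenterVersion>   # e.g. a recent release from the Karpenter releases page
export AWS_PARTITION=aws                      # aws-cn / aws-us-gov for other partitions
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)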
# Recent releases are published as an OCI chart in Amazon’s public ECR registry;
# the legacy https://charts.karpenter.sh repo only hosts old, deprecated versions.
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace karpenter --create-namespace \
  -f values.yaml   # the values.yaml file is described below
# (Optional) We can also generate the rendered manifests by running the following command.
helm template karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace karpenter \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set serviceAccount.annotations."eks.amazonaws.com/role-arn"="arn:${AWS_PARTITION}:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi > karpenter.yaml
Let’s look at a basic values.yaml file for the Helm chart.
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<accountID>:role/<karpenterRoleName>
settings:
  clusterName: <clusterName>
  clusterEndpoint: https://<clusterEndpoint>.eks.amazonaws.com
Replace the placeholders with your values. Karpenter’s service account assumes the <karpenterRoleName> IAM role through IRSA and uses it to create EC2 instances in AWS. It uses the clusterName and clusterEndpoint values to join those EC2 instances to the cluster as Kubernetes nodes.
You can get the clusterEndpoint value by running:
aws eks describe-cluster --name <CLUSTER_NAME> --query "cluster.endpoint" --output text
Once our Karpenter Helm chart is deployed, it’s time to get familiar with the NodePool and EC2NodeClass CRDs.
In older versions, Provisioner and AWSNodeTemplate were used instead:
Provisioner → NodePool
AWSNodeTemplate → EC2NodeClass
API
karpenter.sh/Provisioner -> karpenter.sh/NodePool
karpenter.sh/Machine -> karpenter.sh/NodeClaim
karpenter.k8s.aws/AWSNodeTemplate -> karpenter.k8s.aws/EC2NodeClass
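After installing a recent chart, we can confirm that the new CRDs (rather than the legacy ones) are present; the output below is illustrative:
kubectl get crds | grep karpenter
# ec2nodeclasses.karpenter.k8s.aws
# nodeclaims.karpenter.sh
# nodepools.karpenter.sh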
A NodePool is a more Kubernetes-centered representation of the nodes that should be created. Through it we configure Kubernetes-specific constructs such as taints, labels, and capacity limits.
The EC2NodeClass lets us fine-tune AWS-specific settings, such as which subnets the nodes will be created in, any mapped block devices, security groups, AMI families, and many more options that we can control.
Let’s look at a very basic example of this pair:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: ray-worker-node-class
spec:
  amiFamily: AL2
  role: <karpenterProfileInstanceRole>
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: <ray-clusterName>
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: <ray-clusterName>
  tags:
    karpenter.sh/discovery: <ray-clusterName>
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ray-worker-nodepool
spec:
  template:
    metadata:
      labels:
        app-name: ray-clustername
      annotations:
        example.com/owner: "my-team"
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["t3"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["medium", "large", "xlarge"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        name: ray-worker-node-class
      taints:
        - key: cluster.nodepool/ray-clustername
          effect: NoSchedule
  limits:
    cpu: "80"
    memory: "320Gi"
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 30s
    expireAfter: 360h
In our EC2NodeClass, we’re declaring that NodePools assigned to this class will run an AL2 image, use the role we created, and be placed in the subnets and security groups tagged with karpenter.sh/discovery: <ray-clusterName>.
In our NodePool, we specify that its class is the ray-worker-node-class we created. This NodePool defines that its nodes should be Linux/amd64, be on-demand instances (rather than spot), and come from specific instance sizes and families. Finally, we define limits that control how much capacity can be spun up through this NodePool: collectively, its nodes should not exceed 80 CPUs or 320 GiB of memory.
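To see how a workload lands on these nodes, here is a hypothetical Deployment that targets this NodePool through its label and tolerates its taint (the image and resource figures are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ray-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ray-worker
  template:
    metadata:
      labels:
        app: ray-worker
    spec:
      # Only schedule onto nodes created by ray-worker-nodepool
      nodeSelector:
        app-name: ray-clustername
      tolerations:
        - key: cluster.nodepool/ray-clustername
          operator: Exists
          effect: NoSchedule
      containers:
        - name: worker
          image: <ray-worker-image>   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi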
We can create as many NodePools and EC2NodeClasses as we’d like, and give them different parameters, labels, and taints to accommodate our organization’s specific needs.
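To create them, we apply the manifests and check that they are registered (the file names below are placeholders):
kubectl apply -f ray-worker-node-class.yaml -f ray-worker-nodepool.yaml
kubectl get ec2nodeclasses,nodepools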
Once a node is no longer needed, Karpenter terminates it and deregisters it from the cluster. We can also delete an individual node or NodeClaim ourselves:
# Delete a specific nodeclaim
kubectl delete nodeclaim $NODECLAIM_NAME
# Delete a specific node
kubectl delete node $NODE_NAME
# Delete all nodeclaims
kubectl delete nodeclaims --all
# Delete all nodes owned by any nodepool
kubectl delete nodes -l karpenter.sh/nodepool
# Delete all nodeclaims owned by a specific nodepool
kubectl delete nodeclaims -l karpenter.sh/nodepool=$NODEPOOL_NAME