Scale faster with Karpenter: Kubernetes Autoscaler—skip slow node groups and provision right-sized compute in seconds. This hands-on guide walks you through setup, real configs, and production-ready NodePool templates to cut costs and boost performance. Don’t let inefficient autoscaling drain your AWS budget—deploy Karpenter the right way now. #CentLinux #K8s #Karpenter
Introduction
Karpenter replaces the “wait for Cluster Autoscaler + node groups” model with direct, fast, demand-driven node provisioning. It watches unschedulable pods and spins up right-sized nodes in seconds. (Karpenter official website)
This guide is command-first and assumes you want it working—not theorized.
What You’ll Build
- Karpenter installed on Amazon EKS
- Dynamic node provisioning based on pod demand
- Cost-aware, right-sized nodes (no static node groups)
- Autoscaling demo workload

Prerequisites
You need a working AWS + Kubernetes setup.
- AWS CLI configured
- kubectl, helm, eksctl installed
- An EKS cluster (1.24+ recommended)
- IAM permissions to create roles, policies, EC2, etc.
Verify your environment by executing the following commands:
aws sts get-caller-identity
kubectl version --short
helm version
eksctl version
Karpenter Setup on AWS EKS
Step 1: Create (or Verify) EKS Cluster
If you haven’t set up an EKS cluster yet, create one for karpenter-demo with the following command:
eksctl create cluster \
--name karpenter-demo \
--region us-east-1 \
--nodes 2 \
--node-type t3.medium
Add the karpenter-demo cluster to your kubeconfig file:
aws eks update-kubeconfig --name karpenter-demo --region us-east-1
Check the nodes in your karpenter-demo cluster:
kubectl get nodes
Step 2: Set Environment Variables
To keep values consistent across commands, add the following exports to your .bashrc, or execute them once per Bash session:
export CLUSTER_NAME=karpenter-demo
export AWS_REGION=us-east-1
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
Step 3: Install Karpenter IAM Resources
Karpenter needs permission to launch EC2 instances, so the first step is creating the IAM roles and policy it expects. The upstream CloudFormation template handles this:
curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/main/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > karpenter.yaml
aws cloudformation deploy \
--stack-name Karpenter-$CLUSTER_NAME \
--template-file karpenter.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ClusterName=$CLUSTER_NAME
Step 4: Create IAM Role for Service Account (IRSA)
Before Karpenter can touch EC2, your cluster needs a trust bridge into AWS IAM. That bridge is the OIDC provider. This is what enables IRSA (IAM Roles for Service Accounts)—the mechanism that lets a pod assume an IAM role without baking credentials into images or abusing the node role.
eksctl utils associate-iam-oidc-provider \
--region $AWS_REGION \
--cluster $CLUSTER_NAME \
--approve
If you skip or misconfigure this, Karpenter will deploy fine but fail at runtime with AccessDenied when it tries to launch nodes.
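You can confirm the provider is in place before moving on; a quick sanity check, assuming your AWS CLI points at the same account and region:
# The cluster's OIDC issuer URL (its ID should appear in the provider list below)
aws eks describe-cluster --name $CLUSTER_NAME \
  --query "cluster.identity.oidc.issuer" --output text
# IAM OIDC providers registered in the account
aws iam list-open-id-connect-providers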
Create a service account now:
eksctl create iamserviceaccount \
--cluster $CLUSTER_NAME \
--namespace karpenter \
--name karpenter \
--attach-policy-arn arn:aws:iam::$ACCOUNT_ID:policy/KarpenterControllerPolicy-$CLUSTER_NAME \
--approve
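Before installing the controller, confirm the service account carries the expected role annotation; the eks.amazonaws.com/role-arn key is the standard one IRSA uses:
kubectl get serviceaccount karpenter -n karpenter \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
Step 5: Install Karpenter via Helm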
Add the official Karpenter repo to Helm. (Note: recent Karpenter releases ship as an OCI chart at oci://public.ecr.aws/karpenter/karpenter; the legacy repo below matches the v1alpha5 Provisioner API used in Step 6.)
helm repo add karpenter https://charts.karpenter.sh
helm repo update
You can now install Karpenter with Helm:
helm upgrade --install karpenter karpenter/karpenter \
--namespace karpenter \
--create-namespace \
--set serviceAccount.create=false \
--set serviceAccount.name=karpenter \
--set settings.clusterName=$CLUSTER_NAME \
--set settings.clusterEndpoint=$(aws eks describe-cluster \
--name $CLUSTER_NAME \
--query "cluster.endpoint" \
--output text) \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-$CLUSTER_NAME
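Confirm the controller pods are running before moving on:
kubectl get pods -n karpenter
Step 6: Create Karpenter Provisioner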
This is where the real control happens.
Create a manifest for your Karpenter Provisioner. Using vim (or any text editor), add the following YAML to a file named provisioner.yaml:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large", "m5.large"]
  limits:
    resources:
      cpu: 1000
  provider:
    subnetSelector:
      karpenter.sh/discovery: karpenter-demo
    securityGroupSelector:
      karpenter.sh/discovery: karpenter-demo
  ttlSecondsAfterEmpty: 30
Now apply this manifest to your EKS cluster:
kubectl apply -f provisioner.yaml
Step 7: Tag Subnets and Security Groups
Karpenter doesn’t “scan your account” or guess infrastructure. It selects subnets and security groups strictly via the AWS tags you reference in your Provisioner’s provider block (or in an EC2NodeClass on newer versions). If the tags don’t match, Karpenter acts like nothing exists—and provisioning fails.
aws ec2 create-tags \
--resources <subnet-id> \
--tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
This attaches a key/value tag to one or more EC2 resources (here: subnets). Karpenter later filters AWS resources by this tag to decide where it’s allowed to launch nodes.
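If you’d rather not look up IDs by hand, the cluster’s VPC configuration can supply them; a sketch, assuming an eksctl-created cluster whose listed subnets are the ones you want Karpenter to use:
# Pull the subnet IDs attached to the cluster and tag them in one pass
SUBNET_IDS=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query "cluster.resourcesVpcConfig.subnetIds" --output text)
aws ec2 create-tags --resources $SUBNET_IDS \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
# Verify the tags landed
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=$CLUSTER_NAME" \
  --query "Subnets[].SubnetId"
The cluster security group ID is likewise available at cluster.resourcesVpcConfig.clusterSecurityGroupId.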
The same logic applies to security groups:
aws ec2 create-tags \
--resources <sg-id> \
--tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
Step 8: Deploy Test Workload (Trigger Autoscaling)
Create a YAML manifest (inflate.yaml) and add the following content; you can use vim or nano for this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
Create the deployment by applying this manifest to your EKS cluster:
kubectl apply -f inflate.yaml
Scale the deployment up to 10 replicas:
kubectl scale deployment inflate --replicas=10
Step 9: Watch Karpenter in Action
Watch the status of your pods in real time:
kubectl get pods -w
In another terminal, run the following to watch your cluster nodes in real time:
kubectl get nodes -w
You’ll see new nodes created automatically.
Check logs:
kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter
Verification
Verify that the new nodes were provisioned by Karpenter:
kubectl get nodes -o wide
Check pod placement:
kubectl describe pod <pod-name>
Look for the following (a one-line check comes after the list):
- Node assignment
- No scheduling failures
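To see where every test pod landed in one shot (assuming the inflate deployment from Step 8):
kubectl get pods -l app=inflate -o wide
The NODE column should show the freshly provisioned instances.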
Cleanup Test
To clean up the test, scale your deployment down to 0 replicas:
kubectl scale deployment inflate --replicas=0
Wait ~30 seconds (based on ttlSecondsAfterEmpty).
Verify node termination:
kubectl get nodes
Troubleshooting
Pods Stuck Pending
Check:
kubectl describe pod <pod-name>
Common causes (see the check after this list):
- No matching instance types
- Subnets not tagged
- Security group mismatch
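To compare a stuck pod’s requests against what the Provisioner actually allows (assuming the default Provisioner from Step 6):
kubectl get provisioner default -o jsonpath='{.spec.requirements}'
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources.requests}'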
Karpenter Not Launching Nodes
Check logs:
kubectl logs -n karpenter deployment/karpenter
Look for:
- IAM permission errors
- EC2 quota limits
- Invalid AMI
IAM Issues
Validate role:
aws iam get-role --role-name KarpenterNodeRole-$CLUSTER_NAME
Check the policies attached.
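To list the attached policies directly:
aws iam list-attached-role-policies --role-name KarpenterNodeRole-$CLUSTER_NAME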
No Subnets Found
Verify tags:
aws ec2 describe-subnets \
--filters Name=tag:karpenter.sh/discovery,Values=$CLUSTER_NAME
Best Practices
Keep Provisioners Narrow
Don’t allow every instance type.
values: ["m5.large", "m5.xlarge"]Reduces cost unpredictability.
Use Multiple Provisioners
Separate workloads:
- CPU-heavy
- Memory-heavy
- Spot vs On-demand
Enable Spot Instances
Add:
requirements:
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ["spot"]
Set Resource Limits
Avoid runaway scaling:
limits:
  resources:
    cpu: 500
Use Consolidation (Newer Versions)
Reduces cost by replacing inefficient nodes.
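On the v1alpha5 Provisioner used above, consolidation is a single flag; a minimal sketch (note that consolidation and ttlSecondsAfterEmpty are mutually exclusive, so drop the latter if you enable this):
spec:
  consolidation:
    enabled: true
On the newer NodePool API, the equivalent lives under the disruption block (see the production templates later in this guide).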
Common Mistakes
- Forgetting subnet/security group tags
- Allowing too many instance types (chaotic scaling)
- No limits → surprise AWS bill
- Ignoring IAM role setup (most failures here)
- Using default configs in production
Real-World Pattern
Typical production setup:
- 1 provisioner for spot (cheap workloads)
- 1 provisioner for on-demand (critical workloads)
- strict instance families (m5, c5, r5)
- CPU/memory limits enforced
- consolidation enabled
Production-Grade Karpenter Provisioner Templates
These are hardened templates you can drop into a real cluster. They assume:
- Tagged subnets and security groups (karpenter.sh/discovery=$CLUSTER_NAME)
- Separate workloads (critical vs batch)
- You care about cost control and predictable behavior
We’ll use the newer NodePool + EC2NodeClass APIs (introduced as v1beta1 and carried into v1). If you’re still on the older Provisioner, upgrade—this is where Karpenter is headed.
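A quick way to check which API generation your cluster is running (assuming the Karpenter CRDs were installed by the Helm chart):
kubectl api-resources | grep karpenter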
If you’re working through Karpenter and want to go deeper on Kubernetes internals and autoscaling patterns, it’s worth keeping a solid reference on hand. Two consistently high-signal options are The Kubernetes Book (updated regularly and widely used by DevOps engineers) and Kubernetes in Action (more detailed, architecture-heavy). Both are practical, hands-on resources that complement what you’re doing here—especially when you start tuning scheduling, scaling, and cluster behavior at scale.
Disclaimer: This section contains affiliate links. If you purchase through these links, a small commission may be earned at no additional cost to you.
Base Building Block: EC2NodeClass (Shared)
This defines how nodes are launched (AMI, subnets, security groups, etc.).
Create a YAML manifest (ec2nodeclass.yaml) using a text editor:
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-karpenter-demo
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
  tags:
    Environment: production
    ManagedBy: karpenter
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true
Apply the manifest to your EKS cluster:
kubectl apply -f ec2nodeclass.yaml
Template 1: On-Demand (Critical Workloads)
Use this for:
- APIs
- Databases
- Anything that cannot be interrupted
Create nodepool-ondemand.yaml:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ondemand
spec:
  template:
    metadata:
      labels:
        workload: critical
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5.2xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]
  limits:
    cpu: 500
  disruption:
    consolidationPolicy: WhenUnderutilized
    # Note: the v1beta1 API only accepts consolidateAfter together with
    # consolidationPolicy: WhenEmpty; Karpenter v1 allows a delay alongside
    # consolidationPolicy: WhenEmptyOrUnderutilized.
Apply this manifest to your EKS cluster:
kubectl apply -f nodepool-ondemand.yaml
Why this works:
- Restricts to stable instance family (m5)
- Multi-AZ spread
- Conservative consolidation
Template 2: Spot (Cost-Optimized Workloads)
Use this for:
- CI/CD jobs
- batch processing
- stateless workers
Create nodepool-spot.yaml:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    metadata:
      labels:
        workload: batch
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c5.large", "c5.xlarge", "c5.2xlarge",
                   "m5.large", "m5.xlarge",
                   "r5.large"]
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    # See the note in Template 1 about consolidateAfter on v1beta1 vs v1.
Apply the manifest to your EKS cluster:
kubectl apply -f nodepool-spot.yaml
Why this works:
- Diversified instance pool → fewer spot interruptions
- Consolidation enabled to cut cost
Template 3: GPU Workloads
Use for ML or CUDA workloads.
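Pods only land on these nodes if they request GPU capacity; a minimal container-spec sketch (assumes the NVIDIA device plugin is running so nodes advertise nvidia.com/gpu):
resources:
  limits:
    nvidia.com/gpu: 1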
Create nodepool-gpu.yaml:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        workload: gpu
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g4dn.xlarge", "g5.xlarge"]
  limits:
    cpu: 200
  disruption:
    consolidationPolicy: WhenUnderutilized
    # See the note in Template 1 about consolidateAfter on v1beta1 vs v1.
Template 4: Memory-Optimized (Databases, Caches)
Create nodepool-memory.yaml:
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: memory
spec:
  template:
    metadata:
      labels:
        workload: memory
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r5.large", "r5.xlarge", "r5.2xlarge"]
  limits:
    cpu: 300
  disruption:
    consolidationPolicy: WhenUnderutilized
    # See the note in Template 1 about consolidateAfter on v1beta1 vs v1.
Scheduling Workloads to the Right NodePool
Use labels + node selectors.
Example deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      nodeSelector:
        workload: critical
      containers:
        - name: api
          image: nginx
          resources:
            requests:
              cpu: "500m"
Add Taints (Production Control)
Prevent accidental scheduling.
Example (spot nodes only):
spec:
  template:
    spec:
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule
Then tolerate:
tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
Enforce Cost Controls (Hard Limits)
Don’t skip this.
limits:
  cpu: 300
Also enforce namespace quotas:
kubectl create quota compute-quota \
  --hard=cpu=100,memory=200Gi \
  -n production
Observability (Minimal Setup)
Check nodepool usage:
kubectl get nodepools
Check provisioning decisions:
kubectl logs -n karpenter deployment/karpenter
Watch node churn:
kubectl get nodes -w
Production Hardening Checklist
- Limit instance types (avoid “anything goes”)
- Separate spot vs on-demand
- Use taints for isolation
- Enforce CPU limits per NodePool
- Tag everything (cost visibility)
- Enable consolidation
- Use multi-AZ always
- Monitor EC2 quotas
Common Production Pitfalls
- Too many instance types → unpredictable cost + scheduling
- No workload separation → critical apps land on spot nodes
- No limits → runaway scaling
- Ignoring consolidation → wasted money
- Single AZ config → fragile cluster
Real-World Layout (Reference)
Typical setup:
- ondemand → APIs, ingress, core services
- spot → batch jobs, CI runners
- memory → Redis, Kafka
- gpu → ML workloads
All controlled via:
- labels
- taints
- strict instance families
Final Thoughts
Karpenter gives you fine-grained control—but it won’t protect you from bad decisions.
In production, success comes from tight constraints, not open-ended flexibility:
- restrict instance families to known, stable types
- enforce hard scaling limits
- isolate workloads with labels, taints, and dedicated NodePools
Do that consistently, and the system behaves:
- scaling stays fast and predictable
- infrastructure cost remains controlled
- scheduling is clean, with fewer surprises and less churn
FAQs
What is Karpenter in Kubernetes and how is it different from Cluster Autoscaler?
Karpenter provisions nodes directly based on pending pods, without relying on predefined node groups. It’s faster and more flexible than Cluster Autoscaler, which scales existing node groups and is slower due to ASG constraints.
How does Karpenter decide which EC2 instance type to launch?
It evaluates pod resource requests (CPU, memory, constraints) against the NodePool requirements, then selects the smallest compatible instance type from the allowed list. If multiple options exist, it optimizes for availability and cost.
Why are my pods stuck in Pending even with Karpenter installed?
Common causes:
- No matching instance types in the NodePool
- Subnets or security groups not tagged correctly
- IAM/IRSA misconfiguration
- Resource requests too large for the allowed instances
Run kubectl describe pod <name> and check Karpenter logs.
Can Karpenter use Spot instances safely in production?
Yes, but only for fault-tolerant workloads. Use a separate NodePool with diversified instance types and combine with taints/tolerations. Never run critical services on Spot without fallback.
How do I limit Karpenter from over-scaling and increasing AWS costs?
Set hard limits in NodePool:
limits:
  cpu: 300
Also restrict instance types and use consolidation to remove underutilized nodes. Without limits, Karpenter will scale aggressively based on demand.
Recommended Trainings
If you are preparing for the [NEW] Ultimate AWS Certified Cloud Practitioner CLF-C02 2025 exam, then Stephane Maarek’s top-rated online course is one of the best investments you can make. Known for his clear teaching style and real-world cloud expertise, Stephane has helped thousands of students pass AWS certifications with confidence.
This updated course covers everything you need to know for the 2025 exam version, including hands-on examples, exam tips, and detailed explanations that simplify even the toughest concepts. Don’t just study—prepare smartly with a course that gives you the edge.
Disclaimer: Some of the links in this post are affiliate links, meaning I may earn a commission if you click through and make a purchase—at no additional cost to you.
