How to Use Karpenter: Kubernetes Autoscaler


Scale faster with Karpenter: Kubernetes Autoscaler—skip slow node groups and provision right-sized compute in seconds. This hands-on guide walks you through setup, real configs, and production-ready NodePool templates to cut costs and boost performance. Don’t let inefficient autoscaling drain your AWS budget—deploy Karpenter the right way now. #CentLinux #K8s #Karpenter


Introduction

Karpenter replaces the “wait for Cluster Autoscaler + node groups” model with direct, fast, demand-driven node provisioning. It watches unschedulable pods and spins up right-sized nodes in seconds. (Karpenter Official website)

This guide is command-first and assumes you want it working—not theorized.


What You’ll Build

  • Karpenter installed on Amazon EKS
  • Dynamic node provisioning based on pod demand
  • Cost-aware, right-sized nodes (no static node groups)
  • Autoscaling demo workload

Prerequisites

You need a working AWS + Kubernetes setup.

  • AWS CLI configured
  • kubectl, helm, eksctl installed
  • An EKS cluster (1.24+ recommended)
  • IAM permissions to create roles, policies, EC2, etc.

Verify your environment by executing the following commands:

aws sts get-caller-identity
kubectl version --client
helm version
eksctl version

Karpenter Setup on AWS EKS

Step 1: Create (or Verify) EKS Cluster

If you haven’t set up an EKS cluster yet, create one named karpenter-demo with the following command.

eksctl create cluster \
  --name karpenter-demo \
  --region us-east-1 \
  --nodes 2 \
  --node-type t3.medium

Add the karpenter-demo cluster to your kubeconfig file.

aws eks update-kubeconfig --name karpenter-demo --region us-east-1

Check the nodes in your karpenter-demo cluster.

kubectl get nodes

Step 2: Set Environment Variables

To keep values consistent across the commands in this guide, add the following exports to your .bashrc, or run them once in each Bash session.

export CLUSTER_NAME=karpenter-demo
export AWS_REGION=us-east-1
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

Step 3: Install Karpenter IAM Resources

Karpenter needs permission to launch EC2 instances, so create the required IAM resources (controller policy, node role, and instance profile) from the official CloudFormation template.

curl -fsSL https://raw.githubusercontent.com/aws/karpenter-provider-aws/main/website/content/en/preview/getting-started/getting-started-with-karpenter/cloudformation.yaml > karpenter.yaml

aws cloudformation deploy \
  --stack-name Karpenter-$CLUSTER_NAME \
  --template-file karpenter.yaml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameter-overrides ClusterName=$CLUSTER_NAME

Step 4: Create IAM Role for Service Account (IRSA)

Before Karpenter can touch EC2, your cluster needs a trust bridge into AWS IAM. That bridge is the OIDC provider. This is what enables IRSA (IAM Roles for Service Accounts)—the mechanism that lets a pod assume an IAM role without baking credentials into images or abusing the node role.

eksctl utils associate-iam-oidc-provider \
  --region $AWS_REGION \
  --cluster $CLUSTER_NAME \
  --approve

If you skip or misconfigure this, Karpenter will deploy fine but fail at runtime with AccessDenied when it tries to launch nodes.
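Before continuing, you can sanity-check the association. The sketch below (assuming the CLUSTER_NAME variable from Step 2) pulls the cluster’s OIDC issuer URL and checks whether a matching IAM OIDC provider exists in the account:

```shell
# Extract the issuer ID (everything after the last "/") and look for it
# among the account's registered IAM OIDC providers.
ISSUER=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query "cluster.identity.oidc.issuer" --output text)
aws iam list-open-id-connect-providers | grep -q "${ISSUER##*/}" \
  && echo "OIDC provider associated" \
  || echo "Not associated yet"
```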

Create a service account now:

eksctl create iamserviceaccount \
  --cluster $CLUSTER_NAME \
  --namespace karpenter \
  --name karpenter \
  --attach-policy-arn arn:aws:iam::$ACCOUNT_ID:policy/KarpenterControllerPolicy-$CLUSTER_NAME \
  --approve

Step 5: Install Karpenter via Helm

Add the official Karpenter repo to Helm. (Note: charts.karpenter.sh hosts only legacy chart versions; newer Karpenter releases are published as OCI charts at oci://public.ecr.aws/karpenter/karpenter.)

helm repo add karpenter https://charts.karpenter.sh
helm repo update

You can now install Karpenter with a single Helm command.

helm upgrade --install karpenter karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --set serviceAccount.create=false \
  --set serviceAccount.name=karpenter \
  --set settings.clusterName=$CLUSTER_NAME \
  --set settings.clusterEndpoint=$(aws eks describe-cluster \
    --name $CLUSTER_NAME \
    --query "cluster.endpoint" \
    --output text) \
  --set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-$CLUSTER_NAME
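Before moving on, confirm the controller came up cleanly; the pod should reach Running status within a minute or so:

```shell
# Karpenter runs as a deployment in the karpenter namespace
kubectl get pods -n karpenter
```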

Step 6: Create Karpenter Provisioner

This is where the real control happens.

Create a manifest for your Karpenter Provisioner. Using a text editor such as vim, save the following YAML as provisioner.yaml. (The v1alpha5 Provisioner API applies to older Karpenter releases; newer releases use NodePool and EC2NodeClass, covered later in this guide.)

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["t3.medium", "t3.large", "m5.large"]

  limits:
    resources:
      cpu: 1000

  provider:
    subnetSelector:
      karpenter.sh/discovery: karpenter-demo
    securityGroupSelector:
      karpenter.sh/discovery: karpenter-demo

  ttlSecondsAfterEmpty: 30

Apply the manifest to your EKS cluster with kubectl.

kubectl apply -f provisioner.yaml

Step 7: Tag Subnets and Security Groups

Karpenter doesn’t “scan your account” or guess infrastructure. It selects subnets and security groups strictly via the AWS tags you reference in your Provisioner (or EC2NodeClass in newer versions). If the tags don’t match, Karpenter acts like nothing exists, and provisioning fails.

aws ec2 create-tags \
  --resources <subnet-id> \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME

This attaches a key/value tag to one or more EC2 resources (here: subnets). Karpenter later filters AWS resources by this tag to decide where it’s allowed to launch nodes.

Same logic applies to security groups.

aws ec2 create-tags \
  --resources <sg-id> \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
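If the cluster has many subnets, tagging them one by one is tedious. A sketch like the following (assuming the variables from Step 2) discovers the cluster’s VPC and tags every subnet in it at once; review the subnet list first if the VPC is shared with other workloads, since this also tags public subnets:

```shell
# Find the VPC backing the EKS cluster
VPC_ID=$(aws eks describe-cluster --name $CLUSTER_NAME \
  --query "cluster.resourcesVpcConfig.vpcId" --output text)

# List every subnet in that VPC
SUBNET_IDS=$(aws ec2 describe-subnets \
  --filters Name=vpc-id,Values=$VPC_ID \
  --query "Subnets[].SubnetId" --output text)

# Tag them all for Karpenter discovery
aws ec2 create-tags --resources $SUBNET_IDS \
  --tags Key=karpenter.sh/discovery,Value=$CLUSTER_NAME
```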

Step 8: Deploy Test Workload (Trigger Autoscaling)

Create a YAML manifest named inflate.yaml with the following content. You can use vim or nano for this purpose.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1

Create the deployment by applying this manifest to your EKS cluster.

kubectl apply -f inflate.yaml

Scale up your deployment to 10 replicas.

kubectl scale deployment inflate --replicas=10

Step 9: Watch Karpenter in Action

Watch the status of your pods in real time.

kubectl get pods -w

In another terminal, watch your cluster nodes in real time:

kubectl get nodes -w

You’ll see new nodes created automatically.

Check logs:

kubectl logs -n karpenter -l app.kubernetes.io/name=karpenter

Verification

Verify that the new nodes were provisioned by Karpenter:

kubectl get nodes -o wide

Check pod placement:

kubectl describe pod <pod-name>

Look for:

  • Node assignment
  • No scheduling failures

Cleanup Test

To clean up the test, scale your deployment down to 0 replicas.

kubectl scale deployment inflate --replicas=0

Wait ~30 seconds (based on ttlSecondsAfterEmpty).

Verify node termination:

kubectl get nodes

Troubleshooting

Pods Stuck Pending

Check:

kubectl describe pod <pod-name>

Common causes:

  • No matching instance types
  • Subnets not tagged
  • Security group mismatch
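Karpenter also records events explaining why it could not provision capacity. Listing the most recent events across namespaces often surfaces the exact constraint that failed:

```shell
# Show the 20 most recent cluster events, newest last
kubectl get events -A --sort-by='.lastTimestamp' | tail -n 20
```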

Karpenter Not Launching Nodes

Check logs:

kubectl logs -n karpenter deployment/karpenter

Look for:

  • IAM permission errors
  • EC2 quota limits
  • Invalid AMI

IAM Issues

Validate role:

aws iam get-role --role-name KarpenterNodeRole-$CLUSTER_NAME

Check policies attached.


No Subnets Found

Verify tags:

aws ec2 describe-subnets \
  --filters Name=tag:karpenter.sh/discovery,Values=$CLUSTER_NAME

Best Practices

Keep Provisioners Narrow

Don’t allow every instance type.

values: ["m5.large", "m5.xlarge"]

Reduces cost unpredictability.


Use Multiple Provisioners

Separate workloads:

  • CPU-heavy
  • Memory-heavy
  • Spot vs On-demand

Enable Spot Instances

Add:

requirements:
  - key: "karpenter.sh/capacity-type"
    operator: In
    values: ["spot"]

Set Resource Limits

Avoid runaway scaling:

limits:
  resources:
    cpu: 500

Use Consolidation (Newer Versions)

Reduces cost by replacing inefficient nodes.
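With the newer NodePool API, consolidation is configured under spec.disruption. A minimal sketch (note that on the v1beta1 API, consolidateAfter can only be combined with the WhenEmpty policy, so it is omitted here):

```yaml
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized  # actively replace or remove under-packed nodes
```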


Common Mistakes

  • Forgetting subnet/security group tags
  • Allowing too many instance types (chaotic scaling)
  • No limits → surprise AWS bill
  • Ignoring IAM role setup (most failures here)
  • Using default configs in production

Real-World Pattern

Typical production setup:

  • 1 provisioner for spot (cheap workloads)
  • 1 provisioner for on-demand (critical workloads)
  • strict instance families (m5, c5, r5)
  • CPU/memory limits enforced
  • consolidation enabled

Production-Grade Karpenter Provisioner Templates

These are hardened templates you can drop into a real cluster. They assume:

  • Tagged subnets and security groups (karpenter.sh/discovery=$CLUSTER_NAME)
  • Separate workloads (critical vs batch)
  • You care about cost control and predictable behavior

We’ll use the newer Karpenter APIs (NodePool + EC2NodeClass, shown here at v1beta1). If you’re still on the older Provisioner API, upgrade; this is where Karpenter is headed.


If you’re working through Karpenter and want to go deeper on Kubernetes internals and autoscaling patterns, it’s worth keeping a solid reference on hand. Two consistently high-signal options are The Kubernetes Book (updated regularly and widely used by DevOps engineers) and Kubernetes in Action (more detailed, architecture-heavy). Both are practical, hands-on resources that complement what you’re doing here—especially when you start tuning scheduling, scaling, and cluster behavior at scale.

Disclaimer: This section contains affiliate links. If you purchase through these links, a small commission may be earned at no additional cost to you.


Base Building Block: EC2NodeClass (Shared)

This defines how nodes are launched (AMI, subnets, security groups, etc.).

Create a YAML manifest (ec2nodeclass.yaml) using a text editor:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2

  role: KarpenterNodeRole-karpenter-demo

  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo

  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo

  tags:
    Environment: production
    ManagedBy: karpenter

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
        encrypted: true

Apply the manifest to your EKS cluster.

kubectl apply -f ec2nodeclass.yaml

Template 1: On-Demand (Critical Workloads)

Use this for:

  • APIs
  • Databases
  • Anything that cannot be interrupted

Create nodepool-ondemand.yaml:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: ondemand
spec:
  template:
    metadata:
      labels:
        workload: critical
    spec:
      nodeClassRef:
        name: default

      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]

        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.xlarge", "m5.2xlarge"]

        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-east-1a", "us-east-1b", "us-east-1c"]

  limits:
    cpu: 500

  disruption:
    consolidationPolicy: WhenUnderutilized

Apply this manifest to your EKS cluster.

kubectl apply -f nodepool-ondemand.yaml

Why this works:

  • Restricts to stable instance family (m5)
  • Multi-AZ spread
  • Conservative consolidation

Template 2: Spot (Cost-Optimized Workloads)

Use this for:

  • CI/CD jobs
  • batch processing
  • stateless workers

Create nodepool-spot.yaml:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot
spec:
  template:
    metadata:
      labels:
        workload: batch
    spec:
      nodeClassRef:
        name: default

      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]

        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            ["c5.large", "c5.xlarge", "c5.2xlarge",
             "m5.large", "m5.xlarge",
             "r5.large"]

  limits:
    cpu: 1000

  disruption:
    consolidationPolicy: WhenUnderutilized

Apply the above manifest to your EKS cluster:

kubectl apply -f nodepool-spot.yaml

Why this works:

  • Diversified instance pool → fewer spot interruptions
  • Faster consolidation to cut cost

Template 3: GPU Workloads

Use for ML or CUDA workloads.

Create nodepool-gpu.yaml:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    metadata:
      labels:
        workload: gpu
    spec:
      nodeClassRef:
        name: default

      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g4dn.xlarge", "g5.xlarge"]

  limits:
    cpu: 200

  disruption:
    consolidationPolicy: WhenUnderutilized

Template 4: Memory-Optimized (Databases, Caches)

Create nodepool-memory.yaml:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: memory
spec:
  template:
    metadata:
      labels:
        workload: memory
    spec:
      nodeClassRef:
        name: default

      requirements:
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["r5.large", "r5.xlarge", "r5.2xlarge"]

  limits:
    cpu: 300

  disruption:
    consolidationPolicy: WhenUnderutilized

Scheduling Workloads to the Right NodePool

Use labels + node selectors.

Example deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      nodeSelector:
        workload: critical
      containers:
        - name: api
          image: nginx
          resources:
            requests:
              cpu: "500m"

Add Taints (Production Control)

Prevent accidental scheduling.

Example (spot nodes only):

spec:
  template:
    spec:
      taints:
        - key: spot
          value: "true"
          effect: NoSchedule

Then tolerate:

tolerations:
  - key: "spot"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

Enforce Cost Controls (Hard Limits)

Don’t skip this.

limits:
  cpu: 300

Also enforce namespace quotas:

kubectl create quota compute-quota \
  --hard=cpu=100,memory=200Gi \
  -n production

Observability (Minimal Setup)

Check nodepool usage:

kubectl get nodepools

Check provisioning decisions:

kubectl logs -n karpenter deployment/karpenter

Watch node churn:

kubectl get nodes -w

Production Hardening Checklist

  • Limit instance types (avoid “anything goes”)
  • Separate spot vs on-demand
  • Use taints for isolation
  • Enforce CPU limits per NodePool
  • Tag everything (cost visibility)
  • Enable consolidation
  • Use multi-AZ always
  • Monitor EC2 quotas

Common Production Pitfalls

  • Too many instance types → unpredictable cost + scheduling
  • No workload separation → critical apps land on spot nodes
  • No limits → runaway scaling
  • Ignoring consolidation → wasted money
  • Single AZ config → fragile cluster

Real-World Layout (Reference)

Typical setup:

  • ondemand → APIs, ingress, core services
  • spot → batch jobs, CI runners
  • memory → Redis, Kafka
  • gpu → ML workloads

All controlled via:

  • labels
  • taints
  • strict instance families

Final Thoughts

Karpenter gives you fine-grained control—but it won’t protect you from bad decisions.

In production, success comes from tight constraints, not open-ended flexibility:

  • restrict instance families to known, stable types
  • enforce hard scaling limits
  • isolate workloads with labels, taints, and dedicated NodePools

Do that consistently, and the system behaves:

  • scaling stays fast and predictable
  • infrastructure cost remains controlled
  • scheduling is clean, with fewer surprises and less churn

FAQs

What is Karpenter in Kubernetes and how is it different from Cluster Autoscaler?

Karpenter provisions nodes directly based on pending pods, without relying on predefined node groups. It’s faster and more flexible than Cluster Autoscaler, which scales existing node groups and is slower due to ASG constraints.

How does Karpenter decide which EC2 instance type to launch?

It evaluates pod resource requests (CPU, memory, constraints) against the NodePool requirements, then selects the smallest compatible instance type from the allowed list. If multiple options exist, it optimizes for availability and cost.

Why are my pods stuck in Pending even with Karpenter installed?

Common causes:

  • No matching instance types in the NodePool
  • Subnets or security groups not tagged correctly
  • IAM/IRSA misconfiguration
  • Resource requests too large for the allowed instances

Run kubectl describe pod <pod-name> and check the Karpenter logs.

Can Karpenter use Spot instances safely in production?

Yes, but only for fault-tolerant workloads. Use a separate NodePool with diversified instance types and combine with taints/tolerations. Never run critical services on Spot without fallback.

How do I limit Karpenter from over-scaling and increasing AWS costs?

Set hard limits in NodePool:

limits:
  cpu: 300

Also restrict instance types and use consolidation to remove underutilized nodes. Without limits, Karpenter will scale aggressively based on demand.


If you are preparing for the [NEW] Ultimate AWS Certified Cloud Practitioner CLF-C02 2025 exam, then Stephane Maarek’s top-rated online course is one of the best investments you can make. Known for his clear teaching style and real-world cloud expertise, Stephane has helped thousands of students pass AWS certifications with confidence.

This updated course covers everything you need to know for the 2025 exam version, including hands-on examples, exam tips, and detailed explanations that simplify even the toughest concepts. Don’t just study—prepare smartly with a course that gives you the edge.

Disclaimer: Some of the links in this post are affiliate links, meaning I may earn a commission if you click through and make a purchase—at no additional cost to you.

