Kubernetes Monitoring: Set Up Prometheus on K8s

Deploy Prometheus on Kubernetes the right way, fast. This command-first guide shows you how to use the Prometheus Operator with Helm to get full visibility across nodes, pods, and core components. Most setups miss critical steps, so stop flying blind and close monitoring gaps before they cost you outages. Follow along, verify everything works, and lock in production-ready observability. #centlinux #k8s #prometheus



What you’re actually building

  • Prometheus server running inside Kubernetes
  • Auto-discovery of targets (pods, nodes, services)
  • Metrics scraping via ServiceMonitors/PodMonitors
  • kube-state-metrics + node-exporter for core signals
  • Optional Grafana for visualization

Prerequisites

  • Working Kubernetes cluster (v1.24+ recommended)
  • kubectl configured and pointing to your cluster
  • helm v3 installed
  • Cluster admin privileges

Verify the prerequisites by running the following commands at the Linux CLI (recent kubectl releases removed the --short flag, so plain kubectl version is used here):

kubectl version
helm version
kubectl get nodes

Step 1: Create a dedicated Kubernetes monitoring namespace

Create a dedicated namespace for monitoring to keep Prometheus resources isolated from application workloads.

kubectl create namespace monitoring

This prevents naming conflicts, simplifies RBAC and access control, and makes upgrades or cleanup predictable. In production, separating monitoring is standard practice—it reduces blast radius and keeps your cluster organized when things break.
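If you prefer a declarative workflow, the same namespace can be created from a manifest. A minimal sketch; the purpose label is an illustrative convention, not a requirement:

```yaml
# monitoring-namespace.yaml -- declarative equivalent of
# "kubectl create namespace monitoring"
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    purpose: monitoring   # illustrative label; adjust to your own conventions
```

Apply it with kubectl apply -f monitoring-namespace.yaml to get the same result as the imperative command.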


Step 2: Add Helm repo and update

Add the official Prometheus community Helm repository so you’re pulling maintained, production-grade charts instead of outdated or custom ones.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Updating the repo ensures you install the latest chart versions with current CRDs, fixes, and defaults—this avoids subtle compatibility issues later when deploying the stack.


Step 3: Deploy kube-prometheus-stack

Deploy the kube-prometheus-stack via Helm to install a production-ready monitoring stack in one shot. This includes:

  • Prometheus
  • Alertmanager
  • node-exporter
  • kube-state-metrics
  • CRDs (ServiceMonitor, PodMonitor, etc.)

This step wires up automatic scraping of core cluster components out of the box, so you’re not manually defining targets. Once deployed, Kubernetes resources are immediately observable with minimal configuration.

helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring

Check the Kubernetes pods now:

kubectl get pods -n monitoring -w

You should see pods like:

  • prometheus-prometheus-kube-prometheus-prometheus-0
  • prometheus-kube-state-metrics
  • prometheus-node-exporter-*

Read Also: How to Use Karpenter: Kubernetes Autoscaler


Step 4: Expose Prometheus UI

Expose the Prometheus UI using a temporary port-forward for quick, secure access without modifying cluster networking. This keeps things simple during setup and avoids exposing the service externally.

kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring

Once the tunnel is active, you can access the Prometheus dashboard locally, validate targets, and run queries before deciding on a permanent exposure method like Ingress or LoadBalancer.

Open the following URL in a web browser to access the Prometheus web UI:

http://localhost:9090
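When you move past port-forwarding to permanent exposure, an Ingress is a common choice. A minimal sketch, assuming an NGINX ingress controller is installed and using a hypothetical hostname:

```yaml
# prometheus-ingress.yaml -- illustrative only; secure with TLS and auth
# before exposing Prometheus outside the cluster
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  ingressClassName: nginx            # assumes an NGINX ingress controller
  rules:
  - host: prometheus.example.com     # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-prometheus
            port:
              number: 9090
```

Remember that the Prometheus UI has no built-in authentication, so pair any permanent exposure with TLS and an auth layer.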

Step 5: Validate targets (critical)

This is the moment of truth. Prometheus is only useful if it’s actually scraping targets. Head to Status → Targets in the UI and confirm all core components (kube-apiserver, kubelet, node-exporter, kube-state-metrics) show UP. Anything DOWN means broken discovery, bad labels, or unreachable endpoints. Don’t move forward until this is clean—every downstream dashboard and alert depends on it.

Alternatively, run the following kubectl command to list the ServiceMonitors:

kubectl get servicemonitors -n monitoring
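In the Prometheus UI's query box, a quick way to surface failing targets is the built-in up metric. This expression returns only targets that are currently down, so an empty result means everything is being scraped:

```promql
up == 0
```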

Step 6: Configure a custom application scrape

At this stage, you wire Prometheus into your application so it can actually collect useful metrics. In Kubernetes, this is done using a ServiceMonitor, which tells Prometheus where to scrape and how often. The key is consistency: your app must expose a /metrics endpoint, your Service must have the correct labels, and the ServiceMonitor must match those labels exactly. If any of those don’t line up, Prometheus won’t discover your target—no errors, just silence.

Example app deployment (with metrics endpoint)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: quay.io/brancz/prometheus-example-app:v0.3.0   # sample app exposing /metrics on :8080
        ports:
        - containerPort: 8080

Apply the manifest to Kubernetes Cluster:

kubectl apply -f demo-app.yaml

Expose the app

apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: default
  labels:
    app: demo-app
spec:
  selector:
    app: demo-app
  ports:
  - port: 80
    targetPort: 8080
    name: http

Apply the Service manifest:

kubectl apply -f demo-service.yaml

Create ServiceMonitor

This is how Prometheus discovers targets in Kubernetes.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-app-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: demo-app
  namespaceSelector:
    matchNames:
      - default
  endpoints:
  - port: http
    interval: 15s

Apply:

kubectl apply -f demo-servicemonitor.yaml

Step 7: Verify custom metrics

Check targets again:

kubectl get servicemonitor -n monitoring

In UI → Targets, look for demo-app.

Query:

http_requests_total

If metrics appear, you’re done.


Step 8: Configure persistent storage (production requirement)

By default, the chart gives Prometheus ephemeral storage (an emptyDir volume), so all metrics are lost when the pod restarts. Fix that.

Create values file:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

Upgrade release:

helm upgrade prometheus \
  prometheus-community/kube-prometheus-stack \
  -n monitoring -f values.yaml

Step 9: Enable retention tuning

Prometheus retention controls how long metrics are stored before being deleted, and getting this wrong will either burn disk or wipe useful data too early. Set retention based on your storage capacity and troubleshooting needs—7 days is a practical baseline for most clusters. Shorter retention reduces disk pressure and improves performance, while longer retention is useful for trend analysis but requires more storage. Always pair retention tuning with persistent volumes, otherwise you’re just managing data that won’t survive a pod restart.

Add:

prometheus:
  prometheusSpec:
    retention: 7d

Apply via Helm upgrade.
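To reason about how retention translates into disk, you can do the back-of-the-envelope math yourself. The sketch below assumes a hypothetical ingestion rate and the commonly cited average of roughly 1–2 bytes per sample after compression; your real numbers will differ:

```shell
# Rough disk estimate for 7-day retention (illustrative numbers only)
SAMPLES_PER_SEC=10000   # assumed ingestion rate for a mid-sized cluster
BYTES_PER_SAMPLE=2      # conservative rounding of the ~1.3-2 bytes/sample average
RETENTION_DAYS=7

DISK_BYTES=$((SAMPLES_PER_SEC * BYTES_PER_SAMPLE * RETENTION_DAYS * 86400))
echo "Estimated disk: $((DISK_BYTES / 1024 / 1024 / 1024)) GiB"
```

To replace the assumed rate with your cluster's actual ingestion, query the rate of prometheus_tsdb_head_samples_appended_total in the Prometheus UI, then add comfortable headroom on top of the estimate.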


Step 10: Access Grafana (optional)

Grafana is already included in the kube-prometheus-stack.

Get admin password:

kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
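Secret values in Kubernetes are base64-encoded, not encrypted, which is why the decode step above works. A quick local illustration; the password string here is just an example value, not necessarily what your cluster generated:

```shell
# Secrets store base64-encoded bytes; encoding and decoding are symmetric
ENCODED=$(printf '%s' 'prom-operator' | base64)
echo "$ENCODED"                            # the form stored in the Secret's data field
printf '%s' "$ENCODED" | base64 --decode   # recovers the original string
```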

Port-forward:

kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring

Open:

http://localhost:3000

Import dashboards:

  • Kubernetes cluster monitoring
  • Node exporter

Verification checklist

Run through this before calling it “done”:

kubectl get pods -n monitoring
kubectl get servicemonitors -A
kubectl get podmonitors -A
kubectl get pvc -n monitoring

In Prometheus UI:

  • Targets all UP
  • Queries return data
  • No scrape errors

Troubleshooting

1. Targets missing

Check label mismatch:

kubectl get svc -n default --show-labels
kubectl describe servicemonitor demo-app-monitor -n monitoring

Mismatch = no discovery.


2. ServiceMonitor not picked up

Ensure label matches Helm release:

labels:
  release: prometheus

This is mandatory unless you override selector config.
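If you'd rather not carry the release label on every ServiceMonitor, the chart can be told to pick up ServiceMonitors regardless of labels. This values.yaml fragment uses a documented kube-prometheus-stack option, but verify it against the chart version you deployed:

```yaml
prometheus:
  prometheusSpec:
    # Match ServiceMonitors by their own selectors instead of
    # requiring the Helm release label
    serviceMonitorSelectorNilUsesHelmValues: false
```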


3. Metrics endpoint unreachable

Test manually:

kubectl port-forward svc/demo-app 8080:80
curl http://localhost:8080/metrics

If this fails, Prometheus won’t scrape it either.


4. Prometheus pod crashing

Check logs:

kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0

Common causes:

  • Bad storage config
  • Invalid flags
  • OOMKilled (increase memory)

5. High memory usage

Prometheus is not cheap.

Mitigate:

  • Increase the scrape interval (collect samples less often)
  • Limit retention
  • Drop unused metrics via relabeling
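Dropping unused metrics via relabeling is configured per endpoint in the ServiceMonitor. A sketch for the demo-app monitor from Step 6; the go_gc_.* pattern is just an example of a metric family you might not query:

```yaml
endpoints:
- port: http
  interval: 30s
  metricRelabelings:
  - sourceLabels: [__name__]
    regex: 'go_gc_.*'      # example pattern; drop families you never use
    action: drop
```

Dropped series never enter the TSDB, so this directly reduces both memory and disk pressure.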

Best practices (production)

  • Use persistent storage (non-negotiable)
  • Set resource limits (for example under prometheus.prometheusSpec in values.yaml):
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 4Gi
  • Keep scrape intervals sane (15s–30s)
  • Avoid scraping everything blindly
  • Use federation or Thanos for scaling
  • Secure endpoints (RBAC + NetworkPolicies)
  • Backup Prometheus data if it’s critical

Common mistakes

  • Forgetting release: prometheus label → no scraping
  • Exposing metrics on wrong port name
  • Using ClusterIP services without matching selectors
  • No persistence → data loss on restart
  • Over-scraping → cluster slowdown

Real-world deployment pattern

Typical setup in production:

  • Prometheus Operator (this guide)
  • Grafana dashboards for teams
  • Alertmanager wired to Slack/PagerDuty
  • Long-term storage via Thanos
  • ServiceMonitors per microservice

Final takeaway

Prometheus on Kubernetes isn’t hard—but it’s strict about labels and discovery. Most failures come down to mismatched selectors or missing ServiceMonitors.

Get those right, and everything else is just tuning.


FAQs

How do I know if Prometheus is scraping my Kubernetes pods correctly?

Check the Prometheus UI under Status → Targets and confirm your pod endpoints are listed as UP. From CLI, validate ServiceMonitors and labels:

kubectl get servicemonitors -A
kubectl describe servicemonitor <name> -n monitoring

Most failures come from label mismatches or wrong port names.

What is the difference between ServiceMonitor and PodMonitor in Kubernetes?

ServiceMonitor scrapes metrics via a Kubernetes Service (stable, production-friendly). PodMonitor scrapes pods directly (useful for dynamic or headless workloads). In most real-world setups, stick with ServiceMonitor unless you have a specific need.
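For comparison, a PodMonitor for the same demo app might look like the sketch below. Note that it uses podMetricsEndpoints and assumes the container port is given the name http in the pod spec:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: demo-app-pods
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: demo-app          # matches pod labels, not Service labels
  namespaceSelector:
    matchNames:
      - default
  podMetricsEndpoints:
  - port: http               # container port name in the pod spec
    interval: 15s
```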

Why is Prometheus not discovering my application metrics?

Common causes:

  • Missing release: prometheus label
  • Service labels don’t match ServiceMonitor selectors
  • Metrics endpoint not exposed or wrong port name

Validate quickly:

kubectl get svc --show-labels
kubectl describe servicemonitor <name> -n monitoring

How much storage does Prometheus need in Kubernetes?

Depends on scrape interval and retention. A typical baseline:

  • 15s scrape interval
  • 7-day retention
  • ~10–30 GB per cluster

Tune retention and storage together:

retention: 7d
storage: 20Gi

Can Prometheus handle large Kubernetes clusters in production?

Not by itself at scale. For large clusters:

  • Use federation or remote_write
  • Add Thanos or Cortex for long-term storage and HA
  • Reduce scrape noise (don’t collect everything)

A single-instance Prometheus works fine for small-to-medium clusters, but scaling beyond that requires architecture changes.
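remote_write is configured on the Prometheus spec itself. A minimal values.yaml sketch with a hypothetical receiver endpoint; adjust the URL to wherever your Thanos, Cortex, or other remote-write receiver actually lives:

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
    - url: http://thanos-receive.example.com:19291/api/v1/receive   # hypothetical endpoint
```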


If you’re eager to kickstart your journey into cloud-native technologies, Kubernetes for the Absolute Beginners – Hands-on by Mumshad Mannambeth is the perfect course for you. Designed for complete beginners, this course breaks down complex concepts into easy-to-follow, hands-on lessons that will get you comfortable deploying, managing, and scaling applications on Kubernetes.

Whether you’re a developer, sysadmin, or IT enthusiast, this course provides the practical skills needed to confidently work with Kubernetes in real-world scenarios. By enrolling through the links in this post, you also support this website at no extra cost to you.

Disclaimer: Some of the links in this post are affiliate links. This means I may earn a small commission if you make a purchase through these links, at no additional cost to you.
