Kubernetes Monitoring: Setup Prometheus on K8s

Deploy Prometheus on Kubernetes the right way, fast. This command-first guide shows you how to use the Prometheus Operator (via the kube-prometheus-stack Helm chart) to get full visibility across nodes, pods, and core components. Most setups miss critical steps, so follow along, verify that everything works, and lock in production-ready observability. #centlinux #k8s #prometheus



What you’re actually building

  • Prometheus server running inside Kubernetes
  • Auto-discovery of targets (pods, nodes, services)
  • Metrics scraping via ServiceMonitors/PodMonitors
  • kube-state-metrics + node-exporter for core signals
  • Optional Grafana for visualization

Prerequisites

  • Working Kubernetes cluster (v1.24+ recommended)
  • kubectl configured and pointing to your cluster
  • helm v3 installed
  • Cluster admin privileges

Verify the prerequisites by running the following commands at the Linux CLI (recent kubectl releases removed the --short flag, so plain kubectl version is used here):

kubectl version
helm version
kubectl get nodes

Step 1: Create a dedicated Kubernetes monitoring namespace

Create a dedicated namespace for monitoring to keep Prometheus resources isolated from application workloads.

kubectl create namespace monitoring

This prevents naming conflicts, simplifies RBAC and access control, and makes upgrades or cleanup predictable. In production, separating monitoring is standard practice—it reduces blast radius and keeps your cluster organized when things break.
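As one illustration of the RBAC benefit, a namespace-scoped read-only Role can be granted to an on-call group without touching application namespaces. A minimal sketch; the group name oncall-viewers is hypothetical:

```yaml
# Hypothetical read-only access, scoped to the monitoring namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: monitoring-viewer
  namespace: monitoring
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "deployments", "statefulsets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: monitoring-viewer-binding
  namespace: monitoring
subjects:
- kind: Group
  name: oncall-viewers          # hypothetical group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: monitoring-viewer
  apiGroup: rbac.authorization.k8s.io
```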


Step 2: Add Helm repo and update

Add the official Prometheus community Helm repository so you’re pulling maintained, production-grade charts instead of outdated or custom ones.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Updating the repo ensures you install the latest chart versions with current CRDs, fixes, and defaults—this avoids subtle compatibility issues later when deploying the stack.


Step 3: Deploy kube-prometheus-stack

Deploy the kube-prometheus-stack via Helm to install a production-ready monitoring stack in one shot. This includes:

  • Prometheus
  • Alertmanager
  • node-exporter
  • kube-state-metrics
  • CRDs (ServiceMonitor, PodMonitor, etc.)

This step wires up automatic scraping of core cluster components out of the box, so you’re not manually defining targets. Once deployed, Kubernetes resources are immediately observable with minimal configuration.

helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring

Check the Kubernetes pods now:

kubectl get pods -n monitoring -w

You should see pods like:

  • prometheus-prometheus-kube-prometheus-prometheus-0
  • prometheus-kube-state-metrics
  • prometheus-node-exporter-*

Step 4: Expose Prometheus UI

Expose the Prometheus UI using a temporary port-forward for quick, secure access without modifying cluster networking. This keeps things simple during setup and avoids exposing the service externally.

kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring

Once the tunnel is active, you can access the Prometheus dashboard locally, validate targets, and run queries before deciding on a permanent exposure method like Ingress or LoadBalancer.

Open the following URL in a web browser to access the Prometheus web UI:

http://localhost:9090
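When you later want permanent access, an Ingress is the usual route. A hedged sketch only: the host prometheus.example.com and the nginx ingress class are assumptions, and you should add TLS and authentication before exposing this for real:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  ingressClassName: nginx          # assumption: an NGINX ingress controller is installed
  rules:
  - host: prometheus.example.com   # hypothetical hostname
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-prometheus
            port:
              number: 9090
```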

Step 5: Validate targets (critical)

This is the moment of truth. Prometheus is only useful if it’s actually scraping targets. Head to Status → Targets in the UI and confirm all core components (kube-apiserver, kubelet, node-exporter, kube-state-metrics) show UP. Anything DOWN means broken discovery, bad labels, or unreachable endpoints. Don’t move forward until this is clean—every downstream dashboard and alert depends on it.
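The same check can be run from the query page. The up metric is 1 for a healthy scrape and 0 for a failed one, so this query lists only broken targets:

```
up == 0
```

An empty result means every target is being scraped successfully.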

    Alternatively, execute the following kubectl command at the CLI to verify that the ServiceMonitors exist:

    kubectl get servicemonitors -n monitoring

    Step 6: Configure a custom application scrape

    At this stage, you wire Prometheus into your application so it can actually collect useful metrics. In Kubernetes, this is done using a ServiceMonitor, which tells Prometheus where to scrape and how often. The key is consistency: your app must expose a /metrics endpoint, your Service must have the correct labels, and the ServiceMonitor must match those labels exactly. If any of those don’t line up, Prometheus won’t discover your target—no errors, just silence.

    Example app deployment (with metrics endpoint)

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-app
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: demo-app
      template:
        metadata:
          labels:
            app: demo-app
        spec:
          containers:
          - name: demo-app
            image: prom/prometheus-example-app
            ports:
            - containerPort: 8080

    Save the manifest as demo-app.yaml, then apply it to the Kubernetes cluster:

    kubectl apply -f demo-app.yaml

    Expose the app

    apiVersion: v1
    kind: Service
    metadata:
      name: demo-app
      namespace: default
      labels:
        app: demo-app
    spec:
      selector:
        app: demo-app
      ports:
      - port: 80
        targetPort: 8080
        name: http

    Save this as demo-service.yaml, then apply:

    kubectl apply -f demo-service.yaml

    Create ServiceMonitor

    This is how Prometheus discovers targets in Kubernetes.

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: demo-app-monitor
      namespace: monitoring
      labels:
        release: prometheus
    spec:
      selector:
        matchLabels:
          app: demo-app
      namespaceSelector:
        matchNames:
          - default
      endpoints:
      - port: http
        interval: 15s

    Apply:

    kubectl apply -f demo-servicemonitor.yaml

    Step 7: Verify custom metrics

    Check targets again:

    kubectl get servicemonitor -n monitoring

    In the Prometheus UI, under Status → Targets, look for demo-app (new targets can take a minute to appear).

    Query:

    http_requests_total

    If metrics appear, you’re done.


    Step 8: Configure persistent storage (production requirement)

    By default, the chart keeps Prometheus data on an ephemeral emptyDir volume, which is lost whenever the pod is rescheduled. Fix that.

    Create a values.yaml file:

    prometheus:
      prometheusSpec:
        storageSpec:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 20Gi

    Upgrade release:

    helm upgrade prometheus \
      prometheus-community/kube-prometheus-stack \
      -n monitoring -f values.yaml

    Step 9: Enable retention tuning

    Prometheus retention controls how long metrics are stored before being deleted, and getting this wrong will either burn disk or wipe useful data too early. Set retention based on your storage capacity and troubleshooting needs—7 days is a practical baseline for most clusters. Shorter retention reduces disk pressure and improves performance, while longer retention is useful for trend analysis but requires more storage. Always pair retention tuning with persistent volumes, otherwise you’re just managing data that won’t survive a pod restart.

    Add:

    prometheus:
      prometheusSpec:
        retention: 7d

    Apply via Helm upgrade.
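If you also want a hard cap on disk usage, the operator exposes retentionSize next to retention; whichever limit is hit first causes the oldest TSDB blocks to be deleted. A sketch, sized to stay under the 20Gi volume from Step 8:

```yaml
prometheus:
  prometheusSpec:
    retention: 7d
    retentionSize: 18GB   # keep headroom below the 20Gi persistent volume
```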


    Step 10: Access Grafana (optional)

    Grafana is already included in the kube-prometheus-stack.

    Get admin password:

    kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
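To sanity-check the decoding step itself, here is a local sketch: the encoded string below is base64 for prom-operator, the chart's default Grafana admin password (assumption: you have not overridden the default in your values):

```shell
# Decode a base64 value the same way the secret command above does.
# 'cHJvbS1vcGVyYXRvcg==' encodes 'prom-operator' (the chart default).
echo 'cHJvbS1vcGVyYXRvcg==' | base64 --decode
# → prom-operator
```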

    Port-forward:

    kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring

    Open:

    http://localhost:3000

    Import dashboards:

    • Kubernetes cluster monitoring
    • Node exporter

    Verification checklist

    Run through this before calling it “done”:

    kubectl get pods -n monitoring
    kubectl get servicemonitors -A
    kubectl get podmonitors -A
    kubectl get pvc -n monitoring

    In Prometheus UI:

    • Targets all UP
    • Queries return data
    • No scrape errors

    Troubleshooting

    1. Targets missing

    Check label mismatch:

    kubectl get svc -n default --show-labels
    kubectl describe servicemonitor demo-app-monitor -n monitoring

    Mismatch = no discovery.


    2. ServiceMonitor not picked up

    Ensure label matches Helm release:

    labels:
      release: prometheus

    This is mandatory unless you override selector config.


    3. Metrics endpoint unreachable

    Test manually:

    kubectl port-forward svc/demo-app 8080:80
    curl http://localhost:8080/metrics

    If this fails, Prometheus won’t scrape it either.


    4. Prometheus pod crashing

    Check the logs (the pod runs several containers, so name the prometheus container explicitly):

    kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0 -c prometheus

    Common causes:

    • Bad storage config
    • Invalid flags
    • OOMKilled (increase memory)

    5. High memory usage

    Prometheus is not cheap.

    Mitigate:

    • Increase the scrape interval (scrape less often)
    • Limit retention
    • Drop unused metrics via relabeling
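On the last point, dropping metrics is configured per endpoint in the ServiceMonitor. A sketch that drops a hypothetical high-cardinality histogram before it is ever stored (the metric name is made up for illustration):

```yaml
endpoints:
- port: http
  interval: 30s
  metricRelabelings:
  - sourceLabels: [__name__]
    regex: demo_request_duration_seconds_bucket   # hypothetical metric to drop
    action: drop
```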

    Best practices (production)

    • Use persistent storage (non-negotiable)
    • Set resource limits:
    resources:
      requests:
        memory: 2Gi
      limits:
        memory: 4Gi
    • Keep scrape intervals sane (15s–30s)
    • Avoid scraping everything blindly
    • Use federation or Thanos for scaling
    • Secure endpoints (RBAC + NetworkPolicies)
    • Backup Prometheus data if it’s critical
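On the securing point, a NetworkPolicy can restrict which pods reach Prometheus inside the cluster. A minimal sketch: it assumes your CNI enforces NetworkPolicy and that Grafana is the only intended client (the operator's standard pod labels are used, but verify them in your cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prometheus-allow-grafana
  namespace: monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: grafana
    ports:
    - protocol: TCP
      port: 9090
```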

    Common mistakes

    • Forgetting release: prometheus label → no scraping
    • Exposing metrics on wrong port name
    • Using ClusterIP services without matching selectors
    • No persistence → data loss on restart
    • Over-scraping → cluster slowdown

    Real-world deployment pattern

    Typical setup in production:

    • Prometheus Operator (this guide)
    • Grafana dashboards for teams
    • Alertmanager wired to Slack/PagerDuty
    • Long-term storage via Thanos
    • ServiceMonitors per microservice

    Final takeaway

    Prometheus on Kubernetes isn’t hard—but it’s strict about labels and discovery. Most failures come down to mismatched selectors or missing ServiceMonitors.

    Get those right, and everything else is just tuning.


    FAQs

    How do I know if Prometheus is scraping my Kubernetes pods correctly?

    Check the Prometheus UI under Status → Targets and confirm your pod endpoints are listed as UP. From CLI, validate ServiceMonitors and labels:

    kubectl get servicemonitors -A
    kubectl describe servicemonitor <name> -n monitoring

    Most failures come from label mismatches or wrong port names.

    What is the difference between ServiceMonitor and PodMonitor in Kubernetes?

    ServiceMonitor scrapes metrics via a Kubernetes Service (stable, production-friendly). PodMonitor scrapes pods directly (useful for dynamic or headless workloads). In most real-world setups, stick with ServiceMonitor unless you have a specific need.
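For comparison, a PodMonitor targeting the same demo app would select pod labels directly instead of going through a Service. A sketch mirroring the ServiceMonitor from Step 6 (it assumes the container port is named http, which the Step 6 deployment would need added under its ports list):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: demo-app-podmonitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: demo-app
  namespaceSelector:
    matchNames:
      - default
  podMetricsEndpoints:
  - port: http        # assumes a named container port "http"
    interval: 15s
```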

    Why is Prometheus not discovering my application metrics?

    Common causes:

    • Missing release: prometheus label
    • Service labels don’t match ServiceMonitor selectors
    • Metrics endpoint not exposed or wrong port name

    Validate quickly:

    kubectl get svc --show-labels
    kubectl describe servicemonitor <name> -n monitoring

    How much storage does Prometheus need in Kubernetes?

    Depends on scrape interval and retention. A typical baseline:

    • 15s scrape interval
    • 7-day retention
    • ~10–30 GB per cluster

    Tune retention and storage together:

    retention: 7d
    storage: 20Gi
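A back-of-envelope way to sanity-check that baseline. This is a hedged rule of thumb, not a guarantee; the ingestion rate and bytes-per-sample figures below are assumptions (TSDB compression typically lands around 1–2 bytes per sample):

```shell
# Rough disk estimate: bytes ≈ samples/sec * bytes/sample * retention seconds
SAMPLES_PER_SEC=10000   # assumption: a mid-size cluster's ingestion rate
BYTES_PER_SAMPLE=2      # assumption: ~2 bytes/sample after compression
RETENTION_DAYS=7
BYTES=$(( SAMPLES_PER_SEC * BYTES_PER_SAMPLE * RETENTION_DAYS * 86400 ))
echo "$(( BYTES / 1024 / 1024 / 1024 )) GiB"
# → 11 GiB
```

That lands inside the 10–30 GB baseline above; substitute your own ingestion rate, which you can read from Prometheus itself via rate(prometheus_tsdb_head_samples_appended_total[5m]).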

    Can Prometheus handle large Kubernetes clusters in production?

    Not by itself at scale. A single-instance Prometheus works fine for small-to-medium clusters, but large clusters require architecture changes:

    • Use federation or remote_write
    • Add Thanos or Cortex for long-term storage and HA
    • Reduce scrape noise (don’t collect everything)


    If you’re eager to kickstart your journey into cloud-native technologies, Kubernetes for the Absolute Beginners – Hands-on by Mumshad Mannambeth is the perfect course for you. Designed for complete beginners, this course breaks down complex concepts into easy-to-follow, hands-on lessons that will get you comfortable deploying, managing, and scaling applications on Kubernetes.

    Whether you’re a developer, sysadmin, or IT enthusiast, this course provides the practical skills needed to confidently work with Kubernetes in real-world scenarios. By enrolling through the links in this post, you also support this website at no extra cost to you.

    Disclaimer: Some of the links in this post are affiliate links. This means I may earn a small commission if you make a purchase through these links, at no additional cost to you.

