For Kubernetes Monitoring: Deploy Prometheus on Kubernetes the right way—fast. This command-first guide shows you how to use the Prometheus Operator with Helm to get full visibility across nodes, pods, and core components. Stop flying blind and fix monitoring gaps before they cost you outages—most setups miss critical steps. Follow along, verify everything works, and lock in production-ready observability now. #centlinux #k8s #prometheus
What you’re actually building
- Prometheus server running inside Kubernetes
- Auto-discovery of targets (pods, nodes, services)
- Metrics scraping via ServiceMonitors/PodMonitors
- kube-state-metrics + node-exporter for core signals
- Optional Grafana for visualization
Prerequisites
- Working Kubernetes cluster (v1.24+ recommended)
- kubectl configured and pointing to your cluster
- helm v3 installed
- Cluster admin privileges
Verify prerequisites by running the following commands at the Linux CLI (note: the old --short flag for kubectl version was removed in recent releases, so plain kubectl version is used here):
kubectl version
helm version
kubectl get nodes

Step 1: Create a dedicated Kubernetes monitoring namespace
Create a dedicated namespace for monitoring to keep Prometheus resources isolated from application workloads.
kubectl create namespace monitoring
This prevents naming conflicts, simplifies RBAC and access control, and makes upgrades or cleanup predictable. In production, separating monitoring is standard practice—it reduces blast radius and keeps your cluster organized when things break.
Step 2: Add Helm repo and update
Add the official Prometheus community Helm repository so you’re pulling maintained, production-grade charts instead of outdated or custom ones.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Updating the repo ensures you install the latest chart versions with current CRDs, fixes, and defaults—this avoids subtle compatibility issues later when deploying the stack.
Step 3: Deploy kube-prometheus-stack
Deploy the kube-prometheus-stack via Helm to install a production-ready monitoring stack in one shot. This includes:
- Prometheus
- Alertmanager
- node-exporter
- kube-state-metrics
- CRDs (ServiceMonitor, PodMonitor, etc.)
This step wires up automatic scraping of core cluster components out of the box, so you’re not manually defining targets. Once deployed, Kubernetes resources are immediately observable with minimal configuration.
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring
Check the Kubernetes pods now:
kubectl get pods -n monitoring -w
You should see pods like:
- prometheus-prometheus-kube-prometheus-prometheus-0
- prometheus-kube-state-metrics
- prometheus-node-exporter-*
Step 4: Expose Prometheus UI
Expose the Prometheus UI using a temporary port-forward for quick, secure access without modifying cluster networking. This keeps things simple during setup and avoids exposing the service externally.
kubectl port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 -n monitoring
Once the tunnel is active, you can access the Prometheus dashboard locally, validate targets, and run queries before deciding on a permanent exposure method like Ingress or LoadBalancer.
Open the following URL in a web browser to access the Prometheus web UI.
http://localhost:9090
Step 5: Validate targets (critical)
This is the moment of truth. Prometheus is only useful if it’s actually scraping targets. Head to Status → Targets in the UI and confirm all core components (kube-apiserver, kubelet, node-exporter, kube-state-metrics) show UP. Anything DOWN means broken discovery, bad labels, or unreachable endpoints. Don’t move forward until this is clean—every downstream dashboard and alert depends on it.
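The same check can be scripted against the Prometheus HTTP API. A minimal sketch, assuming the Step 4 port-forward is active on localhost:9090 and jq is installed; the canned sample below only demonstrates the response shape and the jq filter, not real cluster data:

```shell
# Live check (requires the port-forward from Step 4):
#   curl -s http://localhost:9090/api/v1/targets \
#     | jq -r '.data.activeTargets[] | "\(.labels.job)\t\(.health)"' | sort -u
# The same jq filter applied to a canned response of the shape the API returns:
sample='{"data":{"activeTargets":[{"labels":{"job":"kubelet"},"health":"up"},{"labels":{"job":"node-exporter"},"health":"up"}]}}'
echo "$sample" | jq -r '.data.activeTargets[] | "\(.labels.job)\t\(.health)"'
```

Any job printing a health other than "up" needs fixing before you continue.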
Alternatively, execute the following kubectl command at the CLI to verify the ServiceMonitors:
kubectl get servicemonitors -n monitoring
Step 6: Configure a custom application scrape
At this stage, you wire Prometheus into your application so it can actually collect useful metrics. In Kubernetes, this is done using a ServiceMonitor, which tells Prometheus where to scrape and how often. The key is consistency: your app must expose a /metrics endpoint, your Service must have the correct labels, and the ServiceMonitor must match those labels exactly. If any of those don’t line up, Prometheus won’t discover your target—no errors, just silence.
Example app deployment (with metrics endpoint)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: demo-app
          image: prom/prometheus-example-app
          ports:
            - containerPort: 8080
Apply the manifest to the Kubernetes cluster:
kubectl apply -f demo-app.yaml
Expose the app
apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: default
  labels:
    app: demo-app
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 8080
      name: http
kubectl apply -f demo-service.yaml
Create ServiceMonitor
This is how Prometheus discovers targets in Kubernetes.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: demo-app-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: demo-app
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: http
      interval: 15s
Apply:
kubectl apply -f demo-servicemonitor.yaml
Step 7: Verify custom metrics
Check targets again:
kubectl get servicemonitor -n monitoring
In UI → Targets, look for demo-app.
Query:
http_requests_total
If metrics appear, you’re done.
Step 8: Configure persistent storage (production requirement)
By default, Prometheus may use ephemeral storage. Fix that.
Create values file:
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
Upgrade the release:
helm upgrade prometheus \
  prometheus-community/kube-prometheus-stack \
  -n monitoring -f values.yaml
Step 9: Enable retention tuning
Prometheus retention controls how long metrics are stored before being deleted, and getting this wrong will either burn disk or wipe useful data too early. Set retention based on your storage capacity and troubleshooting needs—7 days is a practical baseline for most clusters. Shorter retention reduces disk pressure and improves performance, while longer retention is useful for trend analysis but requires more storage. Always pair retention tuning with persistent volumes, otherwise you’re just managing data that won’t survive a pod restart.
Add:
prometheus:
  prometheusSpec:
    retention: 7d
Apply via Helm upgrade.
Step 10: Install Grafana (optional but recommended)
Already included in the stack.
Get admin password:
kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode
Port-forward:
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring
Open:
http://localhost:3000
Import dashboards:
- Kubernetes cluster monitoring
- Node exporter
Verification checklist
Run through this before calling it “done”:
kubectl get pods -n monitoring
kubectl get servicemonitors -A
kubectl get podmonitors -A
kubectl get pvc -n monitoring
In Prometheus UI:
- Targets all UP
- Queries return data
- No scrape errors
Troubleshooting
1. Targets missing
Check label mismatch:
kubectl get svc -n default --show-labels
kubectl describe servicemonitor demo-app-monitor -n monitoring
Mismatch = no discovery.
2. ServiceMonitor not picked up
Ensure label matches Helm release:
labels:
  release: prometheus
This is mandatory unless you override the selector configuration.
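If you would rather not tag every ServiceMonitor with the release label, the chart exposes a switch that makes Prometheus pick up all ServiceMonitors regardless of labels. A values-file sketch:

```yaml
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false   # select ServiceMonitors without the release label
```

Apply it with a helm upgrade as in Step 8. This widens the blast radius of a bad ServiceMonitor, so the explicit label is the safer default.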
3. Metrics endpoint unreachable
Test manually:
kubectl port-forward svc/demo-app 8080:80
curl http://localhost:8080/metrics
If this fails, Prometheus won’t scrape it either.
4. Prometheus pod crashing
Check logs:
kubectl logs -n monitoring prometheus-prometheus-kube-prometheus-prometheus-0
Common causes:
- Bad storage config
- Invalid flags
- OOMKilled (increase memory)
5. High memory usage
Prometheus is not cheap.
Mitigate:
- Reduce scrape interval
- Limit retention
- Drop unused metrics via relabeling
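For the last point, metricRelabelings on a ServiceMonitor endpoint drop series at scrape time, before they reach storage. A sketch extending the demo-app-monitor endpoint from Step 6; the go_gc_.* pattern is a placeholder for whatever metric family you actually want to drop:

```yaml
endpoints:
  - port: http
    interval: 15s
    metricRelabelings:
      - sourceLabels: [__name__]
        regex: "go_gc_.*"   # placeholder: match the metric family you want to drop
        action: drop
```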
Best practices (production)
- Use persistent storage (non-negotiable)
- Set resource limits:
resources:
  requests:
    memory: 2Gi
  limits:
    memory: 4Gi
- Keep scrape intervals sane (15s–30s)
- Avoid scraping everything blindly
- Use federation or Thanos for scaling
- Secure endpoints (RBAC + NetworkPolicies)
- Backup Prometheus data if it’s critical
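For the federation/Thanos point, the chart exposes remote_write under prometheusSpec, which ships samples to a long-term store as they are ingested. A minimal sketch; the service URL is hypothetical, though the port and path follow Thanos Receive defaults:

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: "http://thanos-receive.monitoring.svc:19291/api/v1/receive"  # hypothetical endpoint
```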
Common mistakes
- Forgetting the release: prometheus label → no scraping
- Exposing metrics on the wrong port name
- Using ClusterIP services without matching selectors
- No persistence → data loss on restart
- Over-scraping → cluster slowdown
Real-world deployment pattern
Typical setup in production:
- Prometheus Operator (this guide)
- Grafana dashboards for teams
- Alertmanager wired to Slack/PagerDuty
- Long-term storage via Thanos
- ServiceMonitors per microservice
Final takeaway
Prometheus on Kubernetes isn’t hard—but it’s strict about labels and discovery. Most failures come down to mismatched selectors or missing ServiceMonitors.
Get those right, and everything else is just tuning.
FAQs
How do I know if Prometheus is scraping my Kubernetes pods correctly?
Check the Prometheus UI under Status → Targets and confirm your pod endpoints are listed as UP. From CLI, validate ServiceMonitors and labels:
kubectl get servicemonitors -A
kubectl describe servicemonitor <name> -n monitoring
Most failures come from label mismatches or wrong port names.
What is the difference between ServiceMonitor and PodMonitor in Kubernetes?
ServiceMonitor scrapes metrics via a Kubernetes Service (stable, production-friendly). PodMonitor scrapes pods directly (useful for dynamic or headless workloads). In most real-world setups, stick with ServiceMonitor unless you have a specific need.
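For comparison, a PodMonitor equivalent of the Step 6 ServiceMonitor would select pod labels directly and use podMetricsEndpoints instead of endpoints. A sketch; note it assumes the container port is given a name (the Step 6 Deployment leaves port 8080 unnamed, so you would add name: metrics to its containerPort entry first):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: demo-app-pods
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: demo-app
  namespaceSelector:
    matchNames:
      - default
  podMetricsEndpoints:
    - port: metrics   # must match a *named* containerPort on the pod
      interval: 15s
```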
Why is Prometheus not discovering my application metrics?
Common causes:
- Missing release: prometheus label
- Service labels don’t match ServiceMonitor selectors
- Metrics endpoint not exposed or wrong port name
Validate quickly:
kubectl get svc --show-labels
kubectl describe servicemonitor <name> -n monitoring
How much storage does Prometheus need in Kubernetes?
Depends on scrape interval and retention. A typical baseline:
- 15s scrape interval
- 7-day retention
- ~10–30 GB per cluster
Tune retention and storage together:
retention: 7d
storage: 20Gi
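That ballpark can be sanity-checked with the usual back-of-envelope formula (ingestion rate x bytes per sample x retention window). The figures below are assumptions for illustration, not measurements from a real cluster:

```shell
samples_per_sec=10000   # assumption: a small-to-medium cluster
bytes_per_sample=2      # roughly 1-2 bytes/sample after TSDB compression
retention_days=7
bytes=$(( samples_per_sec * bytes_per_sample * retention_days * 86400 ))
echo "$(( bytes / 1024 / 1024 / 1024 )) GiB"   # prints "11 GiB" with these inputs
```

Measure your real ingestion rate with the PromQL query rate(prometheus_tsdb_head_samples_appended_total[5m]) and rerun the arithmetic before sizing the PVC.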
Can Prometheus handle large Kubernetes clusters in production?
Not by itself at scale. For large clusters:
- Use federation or remote_write
- Add Thanos or Cortex for long-term storage and HA
- Reduce scrape noise (don’t collect everything)
Single-instance Prometheus works fine for small-to-medium clusters, but scaling requires architecture changes.
Recommended Courses
If you’re eager to kickstart your journey into cloud-native technologies, “Kubernetes for the Absolute Beginners – Hands-on” by Mumshad Mannambeth is the perfect course for you. Designed for complete beginners, this course breaks down complex concepts into easy-to-follow, hands-on lessons that will get you comfortable deploying, managing, and scaling applications on Kubernetes.
Whether you’re a developer, sysadmin, or IT enthusiast, this course provides the practical skills needed to confidently work with Kubernetes in real-world scenarios. By enrolling through the links in this post, you also support this website at no extra cost to you.
Disclaimer: Some of the links in this post are affiliate links. This means I may earn a small commission if you make a purchase through these links, at no additional cost to you.
