Setting Up Monitoring with kube-prometheus Stack on Kubernetes
Monitoring your Kubernetes cluster is essential for maintaining the reliability, performance, and availability of your applications. In this guide, I'll walk you through setting up the kube-prometheus stack, which bundles Prometheus, Alertmanager, Grafana, and a set of exporters into a complete monitoring solution for your cluster.
What is kube-prometheus Stack?
The kube-prometheus stack is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, that provides easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus. It includes:
- Prometheus Operator: A Kubernetes operator for managing Prometheus instances
- Prometheus: The monitoring and alerting system
- Alertmanager: For handling alerts
- Grafana: Visualization and dashboarding
- Node-exporter: For collecting hardware and OS metrics
- kube-state-metrics: For cluster-level metrics
- Various service monitors and pod monitors
Prerequisites
Before we begin, ensure you have:
- A running Kubernetes cluster (1.16+). You can set one up by following my previous post on Kubernetes or by using my Ansible playbook for Kubernetes.
- Helm 3.x installed
- kubectl configured to communicate with your cluster
- Storage class configured for persistent volumes (optional but recommended). If you don’t have a storage class, you can go through my post on Longhorn.
Installation Steps
We’ll use Helm to deploy the kube-prometheus stack, which makes installation and upgrades much easier.
1. Add the Prometheus Community Helm Repository
First, let’s add the Prometheus community Helm repository:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
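To confirm the chart is now available (and see which version you'll get), you can search the repository:

helm search repo prometheus-community/kube-prometheus-stack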
2. Create a Namespace
Create a dedicated namespace for the monitoring stack:
kubectl create namespace monitoring
3. Create Values File
Create a kube-prometheus-values.yaml file to customize the installation. This file holds the configuration for Prometheus, Grafana, and Alertmanager; we'll use it to set the retention period, the storage requests, and the Grafana admin password.
namespaceOverride: "monitoring" # Override the namespace for all components
prometheus:
  prometheusSpec:
    retention: 14d # Retention period for Prometheus data.
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  adminPassword: "admin" # Set the Grafana admin password.
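If your cluster has more than one storage class, or if you want to pin the volumes to a specific one (for example Longhorn, mentioned in the prerequisites), you can add a storageClassName to the claim templates. A minimal sketch, assuming a storage class named longhorn exists in your cluster:

prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn # hypothetical name; use one from `kubectl get storageclass`
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi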
4. Deploy kube-prometheus Stack
Now, deploy the kube-prometheus stack using Helm:
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
-f kube-prometheus-values.yaml
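Later, when you tweak the values file or want to move to a newer chart version, the same pattern applies via helm upgrade:

helm repo update
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f kube-prometheus-values.yaml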
5. Verify the Installation
Check that all pods are running:
kubectl -n monitoring get pods
You should see pods for Prometheus, Alertmanager, Grafana, and various exporters.
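If you'd rather wait until everything is ready instead of polling manually, kubectl can block until the pods report Ready (adjust the timeout to your environment):

kubectl -n monitoring wait --for=condition=Ready pods --all --timeout=300s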
6. Access the Dashboards
By default, the services are deployed with ClusterIP type. To access them, you have several options:
Option 1: Port Forwarding
For quick access, use kubectl port-forward:
# Access Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Access Prometheus
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
# Access AlertManager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093
Option 2: Ingress (Recommended for Production)
For a more permanent solution, set up an Ingress. If you followed my previous post on the Traefik Ingress Controller, you can use Traefik as your Ingress controller, and it will provide trusted SSL certificates for your services. Here's a sample configuration using a Traefik IngressRoute that exposes Grafana, with optional routes for Prometheus and Alertmanager:
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: kube-prometheus-ingressroute
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik-external
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-grafana
          port: 80
    # Optionally you can add Prometheus and Alertmanager routes as well
    - match: Host(`prometheus.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-prometheus
          port: 9090
    - match: Host(`alertmanager.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-alertmanager
          port: 9093
  tls:
    certResolver: letsencrypt # Traefik's built-in ACME cert resolver; if you use cert-manager, reference the certificate secret via secretName instead
Apply this configuration with:
kubectl apply -f kube-prometheus-ingressroute.yaml
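Assuming the Traefik CRDs are installed, you can confirm the route was created with:

kubectl -n monitoring get ingressroute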
7. Login to Grafana
Access Grafana using your browser:
- URL: http://localhost:3000 (if using port forwarding) or https://grafana.k8s.plutolab.live (if using Ingress)
- Username: admin
- Password: admin (or whatever you set in the values.yaml file)
You'll be prompted to change the password on first login for security reasons.
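If you leave adminPassword out of the values file, the chart generates a random password and stores it in a Secret. Assuming the release name used above, you can read it back like this:

kubectl -n monitoring get secret kube-prometheus-stack-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d; echo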
You can access Prometheus and Alertmanager using similar URLs:
- Prometheus: http://localhost:9090 (or https://prometheus.k8s.plutolab.live)
- Alertmanager: http://localhost:9093 (or https://alertmanager.k8s.plutolab.live)
Exploring the Default Dashboards
The kube-prometheus stack comes with a wealth of pre-configured dashboards:
- Kubernetes / Compute Resources / Cluster: Overview of CPU and memory usage across the cluster
- Kubernetes / Compute Resources / Namespace (Pods): Resource usage by namespace
- Kubernetes / Compute Resources / Node (Pods): Resource usage by node
- Node Exporter / Nodes: Detailed node metrics
- Kubernetes / Networking / Cluster: Network usage metrics
- Kubernetes / Persistent Volumes: Storage usage metrics
To find these dashboards, click on the dashboard icon in the Grafana sidebar and browse the folders.
Adding Custom Prometheus Rules
You can add custom Prometheus alerting rules by creating a PrometheusRule resource:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # Must match the Helm release name so Prometheus picks up the rule
spec:
  groups:
    - name: custom.rules
      rules:
        - alert: HighMemoryUsage
          expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: High Memory Usage
            description: "Node {{ $labels.instance }} has high memory usage: {{ $value }}%"
Apply this configuration:
kubectl apply -f custom-rules.yaml
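To verify that Prometheus has picked the rule up, check that the object exists and then look under Status → Rules in the Prometheus UI:

kubectl -n monitoring get prometheusrules custom-rules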
Monitoring Your Applications
To monitor your applications, you can create ServiceMonitor
or PodMonitor
resources. Here’s an example ServiceMonitor for an application that exposes Prometheus metrics on /metrics
:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack # Important: must match the Helm release name so Prometheus discovers this monitor
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - my-app-namespace
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
Apply this configuration:
kubectl apply -f my-app-servicemonitor.yaml
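For the monitor above to discover anything, the target Service must carry the matching app: my-app label and expose a port named metrics. A minimal sketch of such a Service (all names are placeholders for your own application):

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace
  labels:
    app: my-app # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app # pods backing this service
  ports:
    - name: metrics # referenced by the ServiceMonitor's endpoint port
      port: 8080
      targetPort: 8080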
Conclusion
The kube-prometheus stack provides a comprehensive monitoring solution for Kubernetes clusters. With its pre-configured dashboards, alerting rules, and exporters, you can quickly gain visibility into your cluster’s health and performance.
Remember that proper monitoring is not just about installing tools but also about understanding the metrics and setting appropriate thresholds and alerts that align with your service level objectives (SLOs) and service level agreements (SLAs).
Next Steps
Now that you have your monitoring stack set up, consider:
- Creating custom dashboards specific to your applications
- Setting up alerting integrations with Slack, PagerDuty, or other notification systems (see the Slack sketch after this list)
- Implementing log aggregation with tools like Loki to complement your metrics
- Exploring service mesh integrations if you’re using Istio or Linkerd
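For the Slack integration in particular, the chart exposes the Alertmanager configuration under alertmanager.config in the values file. A minimal sketch, assuming you have created a Slack incoming webhook (the URL and channel below are placeholders):

alertmanager:
  config:
    route:
      receiver: slack-notifications
      group_by: ["alertname", "namespace"]
    receivers:
      - name: slack-notifications
        slack_configs:
          - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ # placeholder webhook URL
            channel: "#alerts"
            send_resolved: true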
Let me know in the comments if you have any questions or would like me to dive deeper into any particular aspect of Kubernetes monitoring!
Happy monitoring!