Setting Up Monitoring with kube-prometheus Stack on Kubernetes

Monitoring your Kubernetes cluster is essential for maintaining the reliability, performance, and availability of your applications. In this guide, I'll walk you through setting up the kube-prometheus stack, which bundles Prometheus, Alertmanager, Grafana, and a set of exporters into a complete monitoring solution for your Kubernetes cluster.

What is kube-prometheus Stack?

The kube-prometheus stack is a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, that provides easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus. It includes:

  • Prometheus Operator: A Kubernetes operator for managing Prometheus instances
  • Prometheus: The monitoring and alerting system
  • Alertmanager: For handling alerts
  • Grafana: Visualization and dashboarding
  • Node-exporter: For collecting hardware and OS metrics
  • kube-state-metrics: For cluster-level metrics
  • Various service monitors and pod monitors

Prerequisites

Before we begin, ensure you have:

  • A running Kubernetes cluster (1.16+). You can set one up by following my previous post on Kubernetes or by using my Ansible playbook for Kubernetes.
  • Helm 3.x installed
  • kubectl configured to communicate with your cluster
  • Storage class configured for persistent volumes (optional but recommended). If you don’t have a storage class, you can go through my post on Longhorn.

Installation Steps

We’ll use Helm to deploy the kube-prometheus stack, which makes installation and upgrades much easier.

1. Add the Prometheus Community Helm Repository

First, let’s add the Prometheus community Helm repository:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

2. Create a Namespace

Create a dedicated namespace for the monitoring stack:

kubectl create namespace monitoring

3. Create Values File

Create a kube-prometheus-values.yaml file to customize the installation. This file holds the configuration for Prometheus, Grafana, and Alertmanager; we'll use it to set the retention period, persistent storage sizes, and the Grafana admin password.

namespaceOverride: "monitoring"  # Override the namespace for all components
prometheus:
  prometheusSpec:
    retention: 14d                      # Retention period for Prometheus data.
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
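          # storageClassName: longhorn  # optional: pin a specific storage class ("longhorn" is just an example name)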
          resources:
            requests:
              storage: 50Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  adminPassword: "admin"        # Set the Grafana admin password (change this for anything beyond a lab).
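
The chart exposes many more options than the ones shown above. To see everything you can configure, dump the chart's default values and browse them:

helm show values prometheus-community/kube-prometheus-stack > default-values.yaml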

4. Deploy kube-prometheus Stack

Now, deploy the kube-prometheus stack using Helm:

helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f kube-prometheus-values.yaml
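
If you change the values file later, apply the changes with helm upgrade instead of reinstalling:

helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f kube-prometheus-values.yaml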

5. Verify the Installation

Check that all pods are running:

kubectl -n monitoring get pods

You should see pods for Prometheus, Alertmanager, Grafana, and various exporters.
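
To block until everything is ready (handy in scripts), you can also wait on the pods:

kubectl -n monitoring wait --for=condition=Ready pods --all --timeout=300s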

6. Access the Dashboards

By default, the services are deployed with the ClusterIP type, so they are only reachable from inside the cluster. To access them, you have several options:

Option 1: Port Forwarding

For quick access, use kubectl port-forward:

# Access Grafana
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Access Prometheus
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Access AlertManager
kubectl port-forward -n monitoring svc/kube-prometheus-stack-alertmanager 9093:9093

Option 2: Ingress

For a more permanent solution, set up Ingress. If you have followed my previous post on Traefik Ingress Controller, you can use Traefik as your Ingress controller, and it will provide trusted SSL certificates for your services. Here's a sample Ingress configuration for Grafana using a Traefik IngressRoute:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: kube-prometheus-ingressroute
  namespace: monitoring
  annotations: 
    kubernetes.io/ingress.class: traefik-external
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-grafana
          port: 80
    # Optionally you can add prometheus and alertmanager ingress routes
    - match: Host(`prometheus.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-prometheus
          port: 9090
    - match: Host(`alertmanager.k8s.plutolab.live`)
      kind: Rule
      services:
        - name: kube-prometheus-stack-alertmanager
          port: 9093
  tls:
    certResolver: letsencrypt  # Traefik's built-in ACME resolver; adjust the name to match your Traefik setup

Apply this configuration with:

kubectl apply -f kube-prometheus-ingressroute.yaml
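
Make sure the hostnames resolve to your Traefik entry point, then confirm the route was created:

kubectl -n monitoring get ingressroutes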

7. Login to Grafana

Access Grafana using your browser:

  • URL: http://localhost:3000 (if using port forwarding) or https://grafana.k8s.plutolab.live (if using Ingress)
  • Username: admin
  • Password: admin (or whatever you set in kube-prometheus-values.yaml)

If you kept the default admin password, Grafana will prompt you to change it on first login for security reasons.
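
If you ever need to recover the password, it is stored in a Kubernetes secret created by the Grafana sub-chart (the secret name below assumes the release name kube-prometheus-stack):

kubectl -n monitoring get secret kube-prometheus-stack-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo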

You can access Prometheus and Alertmanager using similar URLs:

  • Prometheus: http://localhost:9090 (or https://prometheus.k8s.plutolab.live)
  • Alertmanager: http://localhost:9093 (or https://alertmanager.k8s.plutolab.live)

Exploring the Default Dashboards

The kube-prometheus stack comes with a wealth of pre-configured dashboards:

  1. Kubernetes / Compute Resources / Cluster: Overview of CPU and memory usage across the cluster
  2. Kubernetes / Compute Resources / Namespace (Pods): Resource usage by namespace
  3. Kubernetes / Compute Resources / Node (Pods): Resource usage by node
  4. Node Exporter / Nodes: Detailed node metrics
  5. Kubernetes / Networking / Cluster: Network usage metrics
  6. Kubernetes / Persistent Volumes: Storage usage metrics

To find these dashboards, click on the dashboard icon in the Grafana sidebar and browse the folders.
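
You can also ship dashboards of your own declaratively. With the chart's defaults, a Grafana sidecar watches for ConfigMaps labeled grafana_dashboard and loads the JSON they contain; the dashboard below is a minimal, hypothetical placeholder that just demonstrates the mechanism:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # the label the dashboard sidecar watches for
data:
  my-dashboard.json: |
    {
      "title": "My Custom Dashboard",
      "uid": "my-custom-dashboard",
      "schemaVersion": 39,
      "panels": []
    }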

Adding Custom Prometheus Rules

You can add custom Prometheus alerting rules by creating a PrometheusRule resource:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # must match the Helm release name for Prometheus to pick up the rule
spec:
  groups:
  - name: custom.rules
    rules:
    - alert: HighMemoryUsage
      expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: High Memory Usage
        description: "Node {{ $labels.instance }} has high memory usage: {{ $value }}%"

Apply this configuration:

kubectl apply -f custom-rules.yaml
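
Confirm the rule was registered, then look for HighMemoryUsage on the Alerts page of the Prometheus UI:

kubectl -n monitoring get prometheusrules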

Monitoring Your Applications

To monitor your applications, you can create ServiceMonitor or PodMonitor resources. Here’s an example ServiceMonitor for an application that exposes Prometheus metrics on /metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # Important: must match the Helm release name so Prometheus discovers this monitor
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
    - my-app-namespace
  endpoints:
  - port: metrics        # name of the port in the Service, not the port number
    interval: 15s
    path: /metrics

Apply this configuration:

kubectl apply -f my-app-servicemonitor.yaml
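
For the monitor to scrape anything, the application's Service must carry the app: my-app label and expose a port named metrics. A minimal, hypothetical Service it would match looks like this:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace
  labels:
    app: my-app            # matched by the ServiceMonitor's selector
spec:
  selector:
    app: my-app
  ports:
  - name: metrics          # matched by the ServiceMonitor's endpoint port
    port: 8080
    targetPort: 8080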

Conclusion

The kube-prometheus stack provides a comprehensive monitoring solution for Kubernetes clusters. With its pre-configured dashboards, alerting rules, and exporters, you can quickly gain visibility into your cluster’s health and performance.

Remember that proper monitoring is not just about installing tools but also about understanding the metrics and setting appropriate thresholds and alerts that align with your service level objectives (SLOs) and service level agreements (SLAs).

Next Steps

Now that you have your monitoring stack set up, consider:

  1. Creating custom dashboards specific to your applications
  2. Setting up alerting integrations with Slack, PagerDuty, or other notification systems (see the sketch after this list)
  3. Implementing log aggregation with tools like Loki to complement your metrics
  4. Exploring service mesh integrations if you’re using Istio or Linkerd
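
As a starting point for the alerting integrations, here is a minimal sketch of a Slack receiver wired in through the chart's alertmanager.config value; the webhook URL and channel are placeholders you'd replace with your own, and note that overriding the route this way replaces the chart's default routing:

alertmanager:
  config:
    route:
      receiver: slack-notifications
      group_by: ["alertname", "namespace"]
    receivers:
    - name: slack-notifications
      slack_configs:
      - api_url: https://hooks.slack.com/services/XXX/YYY/ZZZ  # placeholder webhook URL
        channel: "#alerts"
        send_resolved: true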

Let me know in the comments if you have any questions or would like me to dive deeper into any particular aspect of Kubernetes monitoring!

Happy monitoring!
