Kubernetes Monitoring with Prometheus, Grafana, and the Kubernetes Metrics Server
A deep technical guide to implementing monitoring in Kubernetes using Prometheus, Grafana, and the Metrics Server
Monitoring Kubernetes clusters effectively is critical for maintaining application health and ensuring reliable operations. Leveraging Prometheus, Grafana, and the Kubernetes Metrics Server together provides a powerful monitoring stack for both infrastructure and application metrics.
This post dives into how these components interact, how to deploy them, and best practices for implementing a scalable monitoring solution in Kubernetes environments.
Understanding the Monitoring Components
Prometheus
Prometheus is a robust open-source monitoring and alerting toolkit designed for reliability and scalability in dynamic environments like Kubernetes. It scrapes metrics from various endpoints and stores them as time series data.
Grafana
Grafana complements Prometheus by providing a flexible and customizable dashboarding solution. It can visualize metrics from multiple sources, enabling teams to create actionable insights from raw data.
Kubernetes Metrics Server
The Kubernetes Metrics Server collects resource usage metrics (CPU, memory) from Kubelets and exposes them via the Kubernetes API. It is essential for autoscaling and lightweight monitoring scenarios.
Deploying the Monitoring Stack
Installing Kubernetes Metrics Server
The Metrics Server is often included by default in managed Kubernetes clusters but can be installed manually as follows:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Verify deployment with:
kubectl get deployment metrics-server -n kube-system
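Once the deployment is ready, the resource metrics it collects can be checked with kubectl top, which reads from the aggregated metrics.k8s.io API:

# Node and pod resource usage served by the Metrics Server
kubectl top nodes
kubectl top pods --all-namespaces

# The same data exposed through the aggregated API
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

On local clusters such as kind or minikube, the metrics-server container may additionally need the --kubelet-insecure-tls argument if kubelet serving certificates cannot be verified.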
Deploying Prometheus and Grafana
A common way to deploy Prometheus and Grafana is using the kube-prometheus-stack Helm chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
This bundle deploys:
- Prometheus with predefined scraping targets
- Alertmanager for alerting
- Grafana with default dashboards
- Exporters for node metrics and Kubernetes components
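To confirm the stack came up, list the pods in the monitoring namespace and port-forward to the Grafana service; the service name below assumes the release name monitoring used in the install command above:

# All Prometheus, Alertmanager, Grafana, and exporter pods should reach Running
kubectl get pods -n monitoring

# Expose the Grafana UI on localhost:3000 (admin credentials come from the chart's Grafana values)
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring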
Integrating Kubernetes Metrics Server with Prometheus
Prometheus gathers most of its metrics by scraping exporters. kube-state-metrics and node-exporter, both included in the kube-prometheus-stack bundle, work alongside the Metrics Server to provide rich cluster-level and node-level metrics.
- kube-state-metrics exposes cluster state metrics (deployments, pods, replicas).
- node-exporter exposes OS and hardware-level metrics.
The Metrics Server itself is lightweight and exists primarily to feed the Horizontal Pod Autoscaler (HPA) and kubectl top; it is not intended as a scrape target for Prometheus.
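As an illustration of that role, a minimal HorizontalPodAutoscaler sketch that scales a hypothetical Deployment named web on the CPU utilization reported by the Metrics Server could look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests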
Visualizing Metrics with Grafana
Grafana dashboards enable you to visualize key performance indicators such as:
- CPU and memory utilization per Pod or Node
- Pod restart rates and status
- API server request latencies
- Network I/O and disk usage
You can import community dashboards specifically designed for Kubernetes from the Grafana Labs Dashboards catalog (grafana.com/grafana/dashboards).
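As a sketch of what typically backs such panels, the PromQL below relies on the default cAdvisor, kube-state-metrics, and API server metric names scraped by the kube-prometheus-stack; exact label sets can vary between Kubernetes versions:

# CPU usage per pod, in cores (cAdvisor)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)

# Working-set memory per pod, in bytes (cAdvisor)
sum(container_memory_working_set_bytes{container!=""}) by (pod)

# Container restarts over the last hour (kube-state-metrics)
sum(increase(kube_pod_container_status_restarts_total[1h])) by (namespace, pod)

# 99th percentile API server request latency per verb
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, verb))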
Best Practices for Kubernetes Monitoring
- Label Consistency: Ensure your services and pods have consistent labels for accurate service discovery.
- Scrape Interval Tuning: Tune Prometheus scrape intervals to balance metric granularity against storage and CPU usage (see the example Helm values after this list).
- Persistent Storage: Configure persistent volumes for Prometheus to avoid data loss.
- Alerting Strategy: Define alerting rules to catch issues early without alert fatigue.
- Security: Secure Grafana dashboards and Prometheus endpoints using RBAC and authentication proxies.
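A minimal sketch of a custom values file that applies the scrape interval, retention, and persistent storage recommendations above; the key names follow the kube-prometheus-stack chart's prometheusSpec passthrough, and the fast-ssd storage class is a placeholder:

# custom-values.yaml, applied with:
# helm upgrade --install monitoring prometheus-community/kube-prometheus-stack -n monitoring -f custom-values.yaml
prometheus:
  prometheusSpec:
    scrapeInterval: 30s            # shorter intervals give finer granularity at higher cost
    retention: 15d                 # how long samples are kept before deletion
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: fast-ssd     # placeholder storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi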
Troubleshooting Common Issues
- Metrics Server shows no data: Check the metrics-server pod logs, confirm the v1beta1.metrics.k8s.io APIService reports Available, and verify RBAC roles.
- Prometheus missing targets: Verify that ServiceMonitor or PodMonitor resources exist for the workload and carry labels matched by the Prometheus operator's selectors (see the ServiceMonitor example below).
- Grafana dashboards show no data: Confirm data source configuration and query correctness.
- High resource consumption: Tune retention periods and scrape intervals.
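For the missing-targets case, here is a minimal ServiceMonitor sketch for a hypothetical Service labeled app: my-app; by default the chart's Prometheus only selects monitors that carry the Helm release label (release: monitoring for the install above):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: monitoring
  labels:
    release: monitoring        # must match the Prometheus operator's ServiceMonitor selector
spec:
  selector:
    matchLabels:
      app: my-app              # labels on the target Service (hypothetical)
  namespaceSelector:
    matchNames:
      - default
  endpoints:
    - port: metrics            # name of the Service port that exposes /metrics
      interval: 30s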
Conclusion
By combining Prometheus for comprehensive metrics scraping, Grafana for visualization, and the Kubernetes Metrics Server for lightweight resource metrics, teams can establish a powerful monitoring ecosystem. This setup supports autoscaling, proactive alerting, and in-depth analysis—critical for managing complex Kubernetes deployments in production.
Implementing this stack efficiently empowers Kubernetes operators to maintain high availability, optimize resource usage, and quickly diagnose issues, ensuring smooth cluster operations.