
Kubernetes Monitoring with Grafana Cloud


Push-Based Kubernetes Monitoring with Grafana Cloud

Introduction

This guide demonstrates how to create a scalable, centralized monitoring solution for your Kubernetes clusters using a push-based architecture implemented through Infrastructure as Code.
We'll use three key components:
  1. Grafana Cloud — an observability platform that stores and visualizes all our telemetry data.
  2. Grafana Alloy (formerly known as Grafana Agent) — a flexible telemetry collection agent designed to discover, collect, process, and forward observability data from various sources.
  3. k8s-monitoring Helm chart — a pre-configured setup for deploying monitoring components in Kubernetes clusters, simplifying the deployment and configuration of Grafana Alloy.
With this architecture, you can collect logs, metrics, traces, and more from multiple Kubernetes clusters and centralize everything in one place:
This diagram shows how metrics can be pulled from applications in a Kubernetes cluster and pushed to Grafana Cloud or its self-hosted alternative.

Setting Up Grafana Cloud

Let's start by configuring Grafana Cloud to receive our monitoring data.

Create Access Policies

First, you need to configure access policies in Grafana Cloud so that Terraform can manage resources there. Access policies define which actions can be performed on which resources in Grafana Cloud. Sign up at Grafana Cloud, then go to “My Account” → “Security” → “Access Policies” and create a terraform-access-policy that grants Terraform the permissions needed to create and manage resources in Grafana Cloud:
For production use, you should define more specific permissions.

Configure Terraform Provider

Next, we'll set up the Grafana Terraform provider to interact with Grafana Cloud:
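A minimal sketch of what this can look like, assuming a recent grafana/grafana provider (which accepts a cloud access policy token); the variable name is illustrative:

```hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 3.0"
    }
  }
}

# Token generated from the terraform-access-policy created above.
variable "grafana_cloud_access_policy_token" {
  type      = string
  sensitive = true
}

# Provider instance used to manage Grafana Cloud resources (stacks, access policies, tokens).
provider "grafana" {
  alias                     = "cloud"
  cloud_access_policy_token = var.grafana_cloud_access_policy_token
}
```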

Create a Grafana Stack

Now let's create a Grafana stack—this is a logical grouping of services like metrics, logs, traces, etc.:
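A sketch using the provider's grafana_cloud_stack resource; the stack name, slug, and region are examples:

```hcl
# A Grafana Cloud stack bundles the hosted Grafana, Prometheus, Loki, Tempo, etc. instances.
resource "grafana_cloud_stack" "main" {
  provider = grafana.cloud

  name        = "my-observability-stack"
  slug        = "myobservabilitystack" # lowercase alphanumeric, becomes <slug>.grafana.net
  region_slug = "eu"                   # pick the region closest to your clusters
}
```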

Create Access Policy for Grafana Alloy

Next, we need to allow Grafana Alloy to send data to our stack. This policy defines the specific permissions (scopes) that Alloy needs to push metrics, logs, and traces:
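A sketch using the grafana_cloud_access_policy and grafana_cloud_access_policy_token resources; the policy and token names are illustrative:

```hcl
# Access policy that allows pushing telemetry into the stack created above.
resource "grafana_cloud_access_policy" "alloy" {
  provider = grafana.cloud

  name   = "alloy-push"
  region = grafana_cloud_stack.main.region_slug

  scopes = [
    "metrics:write",
    "logs:write",
    "traces:write",
    "profiles:write", # include if you also push profiles to Pyroscope
  ]

  realm {
    type       = "stack"
    identifier = grafana_cloud_stack.main.id
  }
}

# Token that Grafana Alloy will use to authenticate against the push endpoints.
resource "grafana_cloud_access_policy_token" "alloy" {
  provider = grafana.cloud

  name             = "alloy-push-token"
  region           = grafana_cloud_access_policy.alloy.region
  access_policy_id = grafana_cloud_access_policy.alloy.policy_id
}
```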

Expose Connection Details

Now we have all the necessary tokens and URLs to push data from our Kubernetes clusters into the Grafana Cloud stack via Grafana Alloy. These outputs expose the connection details needed for the monitoring setup:
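A sketch of the outputs; the attribute names follow the grafana_cloud_stack resource schema, so verify them against your provider version:

```hcl
output "prometheus_remote_write_endpoint" {
  value = grafana_cloud_stack.main.prometheus_remote_write_endpoint
}

output "prometheus_user_id" {
  value = grafana_cloud_stack.main.prometheus_user_id
}

output "loki_url" {
  value = grafana_cloud_stack.main.logs_url
}

output "loki_user_id" {
  value = grafana_cloud_stack.main.logs_user_id
}

# The token Alloy uses as the basic-auth password for the endpoints above.
output "alloy_access_token" {
  value     = grafana_cloud_access_policy_token.alloy.token
  sensitive = true
}
```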

Deploy the k8s-monitoring Helm Chart

With Grafana Cloud configured, we can now set up monitoring in our Kubernetes clusters.
⚠️
To deploy Helm charts with Terraform, you'll need to configure the Helm provider. See the Terraform Helm provider documentation for setup instructions.
📌
While this guide shows deploying the Helm chart via Terraform for simplicity and to illustrate the concept, in production environments you might prefer using Kustomize or ArgoCD for managing Helm releases (these approaches offer better separation of concerns and more granular control over Kubernetes resources).

Understanding Grafana Alloy

Grafana Alloy uses a configuration language called River (similar to HCL syntax in Terraform) to define how telemetry data is collected and processed. While you could manually configure it as shown below, this quickly becomes complex for production environments:
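A simplified sketch of such a hand-written configuration (the remote write URL and credentials are placeholders), which already hints at how verbose a full production pipeline would get:

```river
// Discover pods through the Kubernetes API.
discovery.kubernetes "pods" {
  role = "pod"
}

// Scrape Prometheus metrics from the discovered pods.
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.grafana_cloud.receiver]
}

// Push the scraped metrics to the Grafana Cloud Prometheus endpoint.
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://<your-prometheus-endpoint>/api/prom/push"

    basic_auth {
      username = "<prometheus_user_id>"
      password = "<access_policy_token>"
    }
  }
}
```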
Here's how you would deploy such a configuration using Terraform:
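One way to do this is sketched below with the Helm provider and the grafana/alloy chart, loading the River configuration from a local file (the file name and namespace are illustrative):

```hcl
resource "helm_release" "alloy" {
  name             = "alloy"
  namespace        = "monitoring"
  create_namespace = true

  repository = "https://grafana.github.io/helm-charts"
  chart      = "alloy"

  values = [
    yamlencode({
      alloy = {
        configMap = {
          # Ship the hand-written River configuration shown above.
          create  = true
          content = file("${path.module}/alloy-config.river")
        }
      }
    })
  ]
}
```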
This approach can quickly become tedious and time-consuming to write and maintain. That's why, instead of writing this complex configuration by hand, we'll use the k8s-monitoring Helm chart.

What the k8s-monitoring Helm Chart Deploys

The k8s-monitoring chart deploys several components that together provide comprehensive monitoring of your Kubernetes cluster, including:
  • Metrics collection, pushed to Prometheus-compatible storage.
  • Log collection, pushed to Loki.
  • Distributed trace collection, pushed to Tempo.
  • Continuous profiling data, pushed to Pyroscope.
  • Core Kubernetes monitoring components such as:
    • kube-state-metrics for exposing the state of Kubernetes objects as metrics.
    • node-exporter for hardware and OS-level metrics.
    • kubelet for node and container runtime metrics.
    • cAdvisor for container resource usage metrics (CPU, memory, network, and disk).
and more (check out the chart's GitHub repo).
Example values for the k8s-monitoring Helm chart might look like this:
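A sketch of a values file following the chart's 1.x externalServices layout (key names differ in later major versions, so check the chart version you deploy; hosts and credentials are placeholders):

```yaml
cluster:
  name: my-cluster

externalServices:
  prometheus:
    host: https://<your-prometheus-endpoint>
    basicAuth:
      username: "<prometheus_user_id>"
      password: "<access_policy_token>"
  loki:
    host: https://<your-loki-endpoint>
    basicAuth:
      username: "<loki_user_id>"
      password: "<access_policy_token>"

metrics:
  enabled: true
logs:
  enabled: true
  pod_logs:
    enabled: true
traces:
  enabled: true
```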

Deploying with Terraform

Here's how to deploy the chart using Terraform:
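A sketch using the helm_release resource, wiring in the stack attributes and token created earlier (the release, namespace, and cluster names are examples):

```hcl
resource "helm_release" "k8s_monitoring" {
  name             = "grafana-k8s-monitoring"
  namespace        = "monitoring"
  create_namespace = true

  repository = "https://grafana.github.io/helm-charts"
  chart      = "k8s-monitoring"

  values = [
    yamlencode({
      cluster = {
        name = "my-cluster"
      }
      externalServices = {
        prometheus = {
          host = grafana_cloud_stack.main.prometheus_url
          basicAuth = {
            username = tostring(grafana_cloud_stack.main.prometheus_user_id)
            password = grafana_cloud_access_policy_token.alloy.token
          }
        }
        loki = {
          host = grafana_cloud_stack.main.logs_url
          basicAuth = {
            username = tostring(grafana_cloud_stack.main.logs_user_id)
            password = grafana_cloud_access_policy_token.alloy.token
          }
        }
      }
    })
  ]
}
```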
After deploying, you'll see several pods in your monitoring namespace:
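For example (pod names are illustrative and depend on the release name and chart version):

```console
$ kubectl get pods -n monitoring
NAME                                                    READY   STATUS    RESTARTS   AGE
grafana-k8s-monitoring-alloy-0                          2/2     Running   0          2m
grafana-k8s-monitoring-alloy-logs-x7kfq                 2/2     Running   0          2m
grafana-k8s-monitoring-kube-state-metrics-abc123        1/1     Running   0          2m
grafana-k8s-monitoring-prometheus-node-exporter-hjk2p   1/1     Running   0          2m
```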

Customizing Your Monitoring Pipeline

When implementing monitoring across different environments and clusters, you'll often need to customize how metrics are collected and labeled. The k8s-monitoring Helm chart allows you to apply these customizations without writing complex River configurations directly. You can implement them by adding extra relabeling rules to your values:
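A sketch using the chart's extraRelabelingRules hook from the 1.x values layout, which accepts River rule blocks as a string; the label names and values here are illustrative:

```yaml
metrics:
  # Appended to the relabeling rules applied to discovered scrape targets.
  extraRelabelingRules: |-
    rule {
      target_label = "environment"
      replacement  = "production"
    }
    rule {
      source_labels = ["__meta_kubernetes_pod_label_team"]
      target_label  = "team"
    }
```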
These rules follow the same syntax as River configuration but are applied within the managed Helm chart deployment. This gives you the flexibility of the River language while keeping deployment simple and maintainable.

Managing Costs

Monitoring in the cloud comes with costs, especially when collecting data from large clusters. Here are some strategies to keep expenses under control:
  1. Start small: begin by monitoring only essential namespaces, critical applications, and core infrastructure components.
  2. Monitor usage: regularly check the usage dashboards in Grafana Cloud to understand how much data you're ingesting and which metrics take up the most space.
  3. Filter data: drop the metrics or logs you don't need. Use the metricsTuning settings of the k8s-monitoring chart to limit which metrics are collected:
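A sketch assuming the 1.x metricsTuning values; the metric names are only examples:

```yaml
metrics:
  kube-state-metrics:
    metricsTuning:
      useDefaultAllowList: true
      includeMetrics:
        - kube_pod_container_status_restarts_total
      excludeMetrics:
        - kube_lease_.*
  node-exporter:
    metricsTuning:
      useDefaultAllowList: true
      excludeMetrics:
        - node_scrape_collector_.*
```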
      Limit namespaces from which you collect logs:
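A sketch assuming the 1.x pod_logs values; the namespaces are examples:

```yaml
logs:
  pod_logs:
    enabled: true
    namespaces:
      - production
      - kube-system
```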