Introduction
- This guide demonstrates how to create a scalable, centralized monitoring solution for your Kubernetes clusters using a push-based architecture implemented through Infrastructure as Code.
- We'll use three key components:
- Grafana Cloud — an observability platform that stores and visualizes all our telemetry data.
- Grafana Alloy (formerly known as Grafana Agent) — a flexible telemetry collection agent designed to discover, collect, process, and forward observability data from various sources:
- k8s-monitoring Helm chart — a pre-configured setup for deploying monitoring components in Kubernetes clusters, simplifying the deployment and configuration of Grafana Alloy.
- With this architecture, you can collect logs, metrics, traces, and more across multiple Kubernetes clusters and centralize everything in one place:
This diagram shows how metrics can be pulled from applications in a Kubernetes cluster and pushed to Grafana Cloud or its self-hosted alternative.
Setting Up Grafana Cloud
- Let's start by configuring Grafana Cloud to receive our monitoring data.
Create Access Policies
- First, you need to configure access policies in Grafana Cloud so that Terraform can manage resources there. Access policies define which actions can be performed on which resources in Grafana Cloud. Sign up at Grafana Cloud, then go to “My Account” → “Security” → “Access Policies” and create a `terraform-access-policy` that grants Terraform the permissions it needs to create and manage resources in Grafana Cloud. For production use, you should define more specific permissions.
Configure Terraform Provider
- Next, we'll set up the Grafana Terraform provider to interact with Grafana Cloud:
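A minimal sketch of the provider configuration; the version constraint, variable name, and provider alias are assumptions, and the token is the one issued for the `terraform-access-policy` created above:

```hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "~> 3.0" # assumed version constraint
    }
  }
}

# Token issued for the terraform-access-policy, passed in as a variable.
variable "grafana_cloud_access_policy_token" {
  type      = string
  sensitive = true
}

# Authenticate against the Grafana Cloud API.
provider "grafana" {
  alias                     = "cloud"
  cloud_access_policy_token = var.grafana_cloud_access_policy_token
}
```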
Create a Grafana Stack
- Now let's create a Grafana stack, a logical grouping of services like metrics, logs, traces, etc.:
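A sketch of the stack resource; the name, slug, and region are placeholders you should replace with your own values:

```hcl
# A Grafana Cloud stack groups the hosted services (Grafana, metrics, logs,
# traces, etc.) into one logical unit.
resource "grafana_cloud_stack" "this" {
  provider = grafana.cloud

  name        = "my-observability-stack" # placeholder
  slug        = "myobservabilitystack"   # placeholder, lowercase alphanumeric
  region_slug = "eu"                     # pick the region closest to your clusters
}
```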
Create Access Policy for Grafana Alloy
- Also, we need to allow Grafana Alloy to send data to our stack. This policy defines the specific permissions (scopes) that Alloy needs to push metrics, logs, and traces:
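A sketch of the write-only access policy and its token; scope names and the realm wiring should be checked against the current Grafana provider documentation:

```hcl
# Access policy that only allows pushing telemetry into this stack.
resource "grafana_cloud_access_policy" "alloy" {
  provider = grafana.cloud

  region       = "eu" # must match the stack's region
  name         = "alloy-push"
  display_name = "Grafana Alloy push policy"

  scopes = [
    "metrics:write",
    "logs:write",
    "traces:write",
  ]

  realm {
    type       = "stack"
    identifier = grafana_cloud_stack.this.id
  }
}

# Token that Alloy will use to authenticate when pushing data.
resource "grafana_cloud_access_policy_token" "alloy" {
  provider = grafana.cloud

  region           = "eu"
  access_policy_id = grafana_cloud_access_policy.alloy.policy_id
  name             = "alloy-push-token"
}
```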
Expose Connection Details
- Now we have all the necessary tokens and URLs to push data from your Kubernetes clusters into the Grafana Cloud stack via Grafana Alloy. These outputs provide the connection details needed for your monitoring setup:
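A sketch of the outputs; the exact attribute names on the stack resource may differ between provider versions, so verify them against the provider docs:

```hcl
# Connection details the k8s-monitoring chart will need.
output "prometheus_host" {
  value = grafana_cloud_stack.this.prometheus_url
}

output "prometheus_username" {
  value = grafana_cloud_stack.this.prometheus_user_id
}

output "loki_host" {
  value = grafana_cloud_stack.this.logs_url
}

output "loki_username" {
  value = grafana_cloud_stack.this.logs_user_id
}

output "alloy_access_token" {
  value     = grafana_cloud_access_policy_token.alloy.token
  sensitive = true
}
```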
Deploy the k8s-monitoring Helm Chart
- With Grafana Cloud configured, we can now set up monitoring in our Kubernetes clusters. To deploy Helm charts with Terraform, you'll need to configure the Helm provider; see the Terraform Helm provider documentation for setup instructions. While this guide deploys the Helm chart via Terraform for simplicity and to illustrate the concept, in production environments you might prefer Kustomize or ArgoCD for managing Helm releases, as these approaches offer better separation of concerns and more granular control over Kubernetes resources.
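A minimal sketch of the Helm provider setup, assuming kubeconfig-based authentication against the target cluster:

```hcl
# Helm provider pointed at the target cluster; adjust the auth method
# (kubeconfig, exec plugin, etc.) to your environment.
provider "helm" {
  kubernetes {
    config_path = "~/.kube/config"
  }
}
```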
Understanding Grafana Alloy
- Grafana Alloy uses a configuration language called River (similar to Terraform's HCL syntax) to define how telemetry data is collected and processed. While you could configure it manually, as shown below, this quickly becomes complex for production environments. Below is an example of such a configuration and how you would deploy it using Terraform:
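A sketch of a hand-written Alloy pipeline that discovers pods, scrapes them, and pushes metrics to Grafana Cloud; the endpoint and credential wiring via environment variables is illustrative:

```river
// Discover all pods in the cluster.
discovery.kubernetes "pods" {
  role = "pod"
}

// Scrape the discovered pods and forward samples to the remote-write component.
prometheus.scrape "pods" {
  targets    = discovery.kubernetes.pods.targets
  forward_to = [prometheus.remote_write.grafana_cloud.receiver]
}

// Push metrics to the Grafana Cloud stack.
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = env("PROMETHEUS_REMOTE_WRITE_URL")

    basic_auth {
      username = env("PROMETHEUS_USERNAME")
      password = env("GRAFANA_CLOUD_ACCESS_TOKEN")
    }
  }
}
```

Deploying such a configuration with Terraform could be done through a Helm release of the Alloy chart, passing the River file as the config map content (the `alloy.configMap.content` value path is an assumption based on the grafana/alloy chart):

```hcl
resource "helm_release" "alloy" {
  name             = "alloy"
  namespace        = "monitoring"
  create_namespace = true

  repository = "https://grafana.github.io/helm-charts"
  chart      = "alloy"

  # Hand the River configuration to the chart.
  values = [yamlencode({
    alloy = {
      configMap = {
        content = file("${path.module}/config.alloy")
      }
    }
  })]
}
```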
- This approach quickly becomes tedious and time-consuming to write and maintain. That's why, instead of writing this complex configuration by hand, we'll use the `k8s-monitoring` Helm chart.
What the k8s-monitoring Helm Chart Deploys
- The `k8s-monitoring` chart deploys several components to provide comprehensive monitoring of your Kubernetes cluster, such as:
- Prometheus for metrics collection.
- Loki for log collection.
- Tempo for distributed tracing.
- Pyroscope for continuous profiling.
- Core Kubernetes monitoring components, such as:
  - kube-state-metrics for exposing Kubernetes object state metrics.
  - node-exporter for hardware and OS-level metrics.
  - kubelet for exposing node and container runtime metrics.
  - cAdvisor for container resource usage metrics (CPU, memory, network, and disk).
- and even more (check out their GitHub repo).
- Example values for the `k8s-monitoring` Helm chart might look like this:
Deploying with Terraform
- Here's how to deploy the chart using Terraform:
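A sketch of the Helm release; the values file path is a placeholder, and the `set_sensitive` value paths assume the v1.x chart layout and the 2.x Helm provider block syntax:

```hcl
resource "helm_release" "k8s_monitoring" {
  name             = "k8s-monitoring"
  namespace        = "monitoring"
  create_namespace = true

  repository = "https://grafana.github.io/helm-charts"
  chart      = "k8s-monitoring"

  # Values file from the previous step.
  values = [file("${path.module}/values/k8s-monitoring.yaml")]

  # Inject the secret token at apply time instead of committing it to the values file.
  set_sensitive {
    name  = "externalServices.prometheus.basicAuth.password"
    value = grafana_cloud_access_policy_token.alloy.token
  }

  set_sensitive {
    name  = "externalServices.loki.basicAuth.password"
    value = grafana_cloud_access_policy_token.alloy.token
  }
}
```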
- After deploying, you'll see several pods in your monitoring namespace:
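You can verify the rollout with kubectl; typically you would see the Alloy collector pods alongside kube-state-metrics and node-exporter (exact names depend on the release name and chart version):

```sh
kubectl get pods -n monitoring
```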
Customizing Your Monitoring Pipeline
- When implementing monitoring across different environments and clusters, you'll often need to customize how metrics are collected and labeled. The `k8s-monitoring` Helm chart allows you to apply these customizations without writing complex River configuration directly: you can implement them by adding extra relabeling rules to your values, as shown below. These rules follow the same syntax as River configuration but are applied within the managed Helm chart deployment, which gives you the flexibility of the River language while keeping the deployment simple and maintainable.
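A sketch of how such rules can be passed through the chart's values; the `extraMetricRelabelingRules` key and the rule itself are illustrative and should be checked against the chart version you use:

```yaml
metrics:
  # Appended to the chart-managed relabeling pipeline (assumed key name).
  extraMetricRelabelingRules: |-
    // Derive an "environment" label from the namespace name (illustrative rule).
    rule {
      source_labels = ["namespace"]
      regex         = "(.+)-(dev|staging|prod)"
      target_label  = "environment"
      replacement   = "$2"
      action        = "replace"
    }
```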
Managing Costs
- Monitoring in the cloud comes with costs, especially when collecting data from large clusters. Here are some strategies to keep expenses under control:
- Start small: begin by monitoring only critical applications and infrastructure components, and enable only the namespaces and chart components you actually need.
- Monitor usage: regularly check the usage dashboards in Grafana Cloud to understand how much data you're ingesting and which metrics are taking up the most space.
- Filter metrics: drop logs or metrics you don't need.
  - Use the `metricsTuning` settings of the `k8s-monitoring` chart to limit which metrics are collected (see the sketch below).
  - Limit the namespaces from which you collect logs (also shown below).
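A sketch combining both ideas, assuming the v1.x chart's `metricsTuning` and `pod_logs` options; the metric names and namespaces are placeholders:

```yaml
metrics:
  kube-state-metrics:
    metricsTuning:
      useDefaultAllowList: true
      includeMetrics:
        - kube_pod_container_status_restarts_total   # placeholder extra metric
      excludeMetrics:
        - kube_lease_.*                              # placeholder noisy series
  node-exporter:
    metricsTuning:
      useDefaultAllowList: true
      excludeMetrics:
        - node_scrape_collector_.*                   # placeholder

logs:
  pod_logs:
    enabled: true
    # Only collect logs from these namespaces (placeholder list).
    namespaces:
      - payments
      - checkout
```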