Grafana Alloy: Zero to Hero Guide
Part 1: Foundation & Core Concepts
What is Grafana Alloy?
Alloy is a unified telemetry agent that collects, processes, and forwards:
- Logs (to Loki)
- Metrics (to Prometheus)
- Traces (to Tempo) - not covered in the video, but supported
Why use it?
- Replaces: Promtail, node_exporter, cadvisor, Loki Docker plugin
- Single configuration file for everything
- Process/filter data BEFORE storage
- Component-based architecture (Lego blocks for monitoring)
Part 2: Installation (5 Minutes)
Option A: Docker (Recommended for Testing)
# docker-compose.yml
version: '3.8'
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    hostname: your-server-name   # ← CRITICAL: set your actual hostname
    command:
      - run
      - --server.http.listen-addr=0.0.0.0:12345
      - --storage.path=/var/lib/alloy/data
      - /etc/alloy/config.alloy            # config path is positional
    ports:
      - "12345:12345"                      # Web UI
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/log:/var/log:ro               # For host logs
      - /var/run/docker.sock:/var/run/docker.sock:ro  # For Docker
      - /proc:/proc:ro                     # For metrics
      - /sys:/sys:ro                       # For metrics
    restart: unless-stopped

volumes:
  alloy-data:
Option B: Binary (Production)
# Download the latest release (binaries ship zipped)
wget https://github.com/grafana/alloy/releases/latest/download/alloy-linux-amd64.zip
unzip alloy-linux-amd64.zip
chmod +x alloy-linux-amd64
sudo mv alloy-linux-amd64 /usr/local/bin/alloy
# Create systemd service
sudo nano /etc/systemd/system/alloy.service
Service file:
[Unit]
Description=Grafana Alloy
After=network.target
[Service]
Type=simple
User=alloy
ExecStart=/usr/local/bin/alloy run --storage.path=/var/lib/alloy/data /etc/alloy/config.alloy
Restart=always
[Install]
WantedBy=multi-user.target
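The unit runs as a dedicated `alloy` user, which the bare-binary install does not create for you. A minimal provisioning sketch, assuming the paths used in the unit above:

```shell
# Create a system user and the directories the unit expects
sudo useradd --system --no-create-home --shell /usr/sbin/nologin alloy
sudo mkdir -p /etc/alloy /var/lib/alloy/data
sudo chown -R alloy:alloy /var/lib/alloy

# Register, enable, and start the service
sudo systemctl daemon-reload
sudo systemctl enable --now alloy
systemctl status alloy
```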
Part 3: Your First Configuration (Level 1)
Basic Structure - Understanding Components
// config.alloy
// 1. TARGETS (Where data goes)
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}
// 2. SOURCES (Where data comes from)
// We'll add these in next steps
Key Concept: Alloy connects sources → processors → targets
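As a sketch of that flow, here is a complete minimal pipeline wiring one source through one processor into the target defined above (the "demo" labels are arbitrary names chosen for this example):

```alloy
// SOURCE: tail one file
local.file_match "demo" {
  path_targets = [{__path__ = "/var/log/syslog"}]
}

loki.source.file "demo" {
  targets    = local.file_match.demo.targets
  forward_to = [loki.relabel.demo.receiver] // → processor
}

// PROCESSOR: stamp every entry with a label
loki.relabel "demo" {
  rule {
    target_label = "pipeline"
    replacement  = "demo"
  }
  forward_to = [loki.write.default.receiver] // → target
}
```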
Part 4: Collect Host Logs (Level 2)
Replace Promtail - Simple Log Collection
// Add to config.alloy after targets
// File discovery
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
  }]
}

// Log reader
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.write.default.receiver]
}
// Convert an existing Promtail config:
// alloy convert --source-format=promtail --output=config.alloy promtail.yaml
Test it:
docker-compose up -d
curl http://localhost:12345/-/healthy # Should report healthy
Part 5: Collect Host Metrics (Level 3)
Replace node_exporter - System Metrics
// Prometheus config needs this flag:
// --web.enable-remote-write-receiver
prometheus.exporter.unix "node_metrics" {
  // Automatically collects CPU, memory, disk, network
}

discovery.relabel "node_metrics" {
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    source_labels = ["__address__"]
    target_label  = "instance"
    replacement   = constants.hostname // Uses system hostname
  }
  rule {
    target_label = "job"
    replacement  = constants.hostname + "-metrics" // Dynamic job name
  }
}

prometheus.scrape "node_metrics" {
  targets    = discovery.relabel.node_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
View metrics in Grafana:
- Import dashboard ID 1860 (Node Exporter Full)
- Filter by job=your-hostname-metrics
Part 6: Add Processing (Level 4)
Relabeling - Add Custom Labels
// For logs
loki.relabel "add_os_label" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "os"
    replacement  = constants.os // auto-populates "linux"
  }
  rule {
    target_label = "environment"
    replacement  = "production"
  }
}

// Update the log source to use the relabeler
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.relabel.add_os_label.receiver] // Changed!
}
Filtering - Drop Unwanted Logs
loki.relabel rules only see labels, never the log line itself, so content filtering uses loki.process with stage.drop (the expression is matched against the line content):

loki.process "filter_logs" {
  forward_to = [loki.write.default.receiver]

  // Drop DEBUG lines
  stage.drop {
    expression = "(?i)debug"
  }

  // Drop other low-severity noise, effectively
  // keeping only ERROR/WARN/FAIL lines
  stage.drop {
    expression = "(?i)(info|trace)"
  }
}
Part 7: Docker Monitoring (Level 5)
Collect Container Logs & Metrics
// Docker logs (no Loki Docker plugin needed!)
// Discover running containers via the Docker socket
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Turn Docker metadata into labels
discovery.relabel "container_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container_name"
  }
}

loki.source.docker "container_logs" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.container_logs.output
  forward_to = [loki.write.default.receiver]
}

// Docker metrics (cAdvisor-style, built in)
prometheus.exporter.cadvisor "container_metrics" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.container_metrics.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker_metrics" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
No Docker Compose changes needed! All containers are automatically discovered.
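To verify, open Grafana Explore and run a couple of LogQL queries against the labels set above (container names here are examples):

```logql
{container_name="alloy"}              # all logs from one container
{container_name=~".+"} |= "error"     # error lines across all containers
```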
Part 8: Advanced Scenarios (Level 6)
Multiple Log Sources
// System journal
loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]
  // Required when Alloy runs in a container
  path = "/var/log/journal"
  labels = {
    job = constants.hostname + "-journal",
  }
}
// Application logs
local.file_match "app_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/app/*.log",
    app         = "myapp", // Custom label
  }]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}
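If those application logs are JSON, a loki.process component placed between the source and the writer can promote fields to labels. A sketch, assuming each line carries a `level` field:

```alloy
loki.process "app_logs" {
  forward_to = [loki.write.default.receiver]

  // Parse the JSON line and extract "level"
  stage.json {
    expressions = {
      level = "level",
    }
  }
  // Promote the extracted value to a Loki label
  stage.labels {
    values = {
      level = "",
    }
  }
}
```

Point the loki.source.file's forward_to at loki.process.app_logs.receiver to enable it.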
Multiple Output Destinations
// Development Loki
loki.write "dev" {
  endpoint {
    url = "http://loki-dev:3100/loki/api/v1/push"
  }
  external_labels = {
    environment = "development",
  }
}

// Production Loki
loki.write "prod" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"
  }
  external_labels = {
    environment = "production",
  }
}
// Route on the "environment" label: each branch keeps
// only its own entries and forwards to one writer
loki.relabel "route_prod" {
  rule {
    source_labels = ["environment"]
    regex         = "prod"
    action        = "keep"
  }
  forward_to = [loki.write.prod.receiver]
}

loki.relabel "route_dev" {
  rule {
    source_labels = ["environment"]
    regex         = "dev"
    action        = "keep"
  }
  forward_to = [loki.write.dev.receiver]
}

// Sources then forward to both branches:
// forward_to = [loki.relabel.route_prod.receiver, loki.relabel.route_dev.receiver]
Part 9: Best Practices
1. Configuration Organization
// 01-targets.alloy - Output destinations
loki.write "default" { /* ... */ }
prometheus.remote_write "default" { /* ... */ }
// 02-system-metrics.alloy - Host metrics
prometheus.exporter.unix "node" { /* ... */ }
// 03-system-logs.alloy - Host logs
local.file_match "logs" { /* ... */ }
// 04-docker.alloy - Container monitoring
loki.source.docker "containers" { /* ... */ }
// Load them all by pointing Alloy at the directory:
//   alloy run /etc/alloy/

// Or pull shared modules from Git
// (path must point at a single module file)
import.git "configs" {
  repository     = "https://github.com/your-org/alloy-configs"
  revision       = "main"
  path           = "modules/common.alloy"
  pull_frequency = "5m"
}
2. Label Strategy
// Consistent labeling template
discovery.relabel "standard_labels" {
  targets = [] // wire in your exporter's targets here

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "region"
    replacement  = "us-east-1"
  }
  rule {
    target_label = "team"
    replacement  = "platform"
  }
  rule {
    target_label = "job"
    replacement  = constants.hostname + "-component" // substitute per pipeline
  }
}
3. Buffering & Retry
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"

    // Back off and retry while Loki is down
    min_backoff_period  = "100ms"
    max_backoff_period  = "5s"
    max_backoff_retries = 10
  }
  external_labels = {
    agent = "alloy",
  }
}
Part 10: Complete Production Example
// === PRODUCTION CONFIG ===
// File: /etc/alloy/config.alloy
// 1. OUTPUTS
loki.write "production" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"

    // Batching
    batch_wait = "1s"
    batch_size = "1MiB"
  }
  external_labels = {
    cluster = "prod",
    agent   = "alloy",
  }
  max_streams = 10000
}

prometheus.remote_write "production" {
  endpoint {
    url = "http://prometheus-prod:9090/api/v1/write"

    // Queue config
    queue_config {
      capacity             = 2500
      max_shards           = 200
      min_shards           = 1
      max_samples_per_send = 500
    }
  }
  external_labels = {
    cluster = "prod",
  }
}
// 2. COMMON PROCESSING
loki.relabel "common_labels" {
  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "os"
    replacement  = constants.os
  }
  rule {
    target_label = "agent"
    replacement  = "alloy"
  }
  forward_to = [loki.write.production.receiver]
}
// 3. SYSTEM METRICS
prometheus.exporter.unix "default" { }

discovery.relabel "system_metrics" {
  targets = prometheus.exporter.unix.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system"
  }
  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }
}

prometheus.scrape "system" {
  targets    = discovery.relabel.system_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
// 4. SYSTEM LOGS
local.file_match "system_logs" {
  path_targets = [
    {__path__ = "/var/log/syslog"},
    {__path__ = "/var/log/auth.log"},
    {__path__ = "/var/log/kern.log"},
  ]
}

loki.source.file "system_logs" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.relabel.common_labels.receiver]
}
// 5. JOURNAL
loki.source.journal "journal" {
  path       = "/var/log/journal" // Required when Alloy runs in a container
  forward_to = [loki.relabel.common_labels.receiver]
  labels = {
    job = constants.hostname + "-journal",
  }
}
// 6. DOCKER
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.relabel.common_labels.receiver]
  labels = {
    job = constants.hostname + "-docker",
  }
}

prometheus.exporter.cadvisor "default" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
Part 11: Troubleshooting Cheat Sheet
Common Issues & Fixes:
- "No metrics in Prometheus"

  # Check Prometheus has remote write enabled
  ps aux | grep prometheus | grep enable-remote-write
  # Test the connection
  curl -X POST http://prometheus:9090/api/v1/write

- "No logs in Loki"

  # Check the Alloy web UI
  http://localhost:12345/graph
  # Check component health
  curl http://localhost:12345/-/healthy
  # View Alloy logs
  docker logs alloy

- "Journal not working in container"

  // Add the journal path to the source
  loki.source.journal "journal" {
    path = "/var/log/journal" // ← THIS LINE
  }

- "Hostname wrong in metrics"

  # In docker-compose.yml
  alloy:
    hostname: your-actual-hostname # ← Set explicitly

- Validate config:

  alloy fmt config.alloy   # parses the config and reports syntax errors
Debug Commands:
# View component graph
open http://localhost:12345/graph
# Check metrics Alloy generates about itself
curl http://localhost:12345/metrics
# Reload configuration without a restart
curl -X POST http://localhost:12345/-/reload
Part 12: Migration Checklist
From Old Stack → Alloy
| Old Component | Alloy Replacement | Action |
|---|---|---|
| Promtail | loki.source.file + local.file_match | Run alloy convert |
| node_exporter | prometheus.exporter.unix | Remove node_exporter service |
| cadvisor | prometheus.exporter.cadvisor | Remove cadvisor container |
| Loki Docker plugin | loki.source.docker | Remove the plugin, keep the default json-file logging driver |
| Multiple config files | Single config.alloy | Consolidate into sections |
Step-by-Step Migration:
- Stage 1: Deploy Alloy alongside existing tools
- Stage 2: Compare data between old/new in Grafana
- Stage 3: Route 10% traffic to Alloy
- Stage 4: Full cutover, decommission old tools
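Stage 1 can be as small as adding Alloy to the existing compose stack without touching the old agents, so both pipelines ship the same data in parallel (service names below are illustrative assumptions):

```yaml
# docker-compose.yml - Alloy alongside the old stack
services:
  promtail:          # existing, untouched for now
    image: grafana/promtail:latest
  node-exporter:     # existing, untouched for now
    image: prom/node-exporter:latest
  alloy:             # new - ships the same data in parallel
    image: grafana/alloy:latest
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
```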
Quick Reference Card
Essential Components:
- local.file_match - Find log files
- loki.source.file - Read log files
- loki.source.journal - Read systemd journal
- loki.source.docker - Read container logs
- prometheus.exporter.unix - System metrics
- prometheus.exporter.cadvisor - Container metrics
- loki.relabel - Process logs
- discovery.relabel - Process metrics
- prometheus.scrape - Scrape metrics
- loki.write - Send logs
- prometheus.remote_write - Send metrics
Magic Variables:
- constants.hostname - System hostname
- constants.os - Operating system
- constants.arch - CPU architecture
Web UI Endpoints:
- :12345/ - Homepage
- :12345/graph - Component visualization
- :12345/metrics - Self-metrics
- :12345/-/healthy - Health check
- :12345/-/reload - Reload configuration
Next Steps After Mastery
- Add tracing: otelcol.* components for OpenTelemetry
- Multi-cluster: Use import.http for centralized config
- Custom components: Build your own with Go
- Kubernetes: Use the Alloy Helm chart for dynamic discovery
- Alerting: Alert rules are evaluated in Prometheus/Mimir, not Alloy; use prometheus.relabel to shape the series you ship
Remember: Start simple, use the web UI (:12345/graph) to visualize your pipeline, and incrementally add complexity. Alloy's power is in its composability - build your monitoring like Lego!