# **Grafana Alloy: Zero to Hero Guide**

## **Part 1: Foundation & Core Concepts**

### **What is Grafana Alloy?**

Alloy is a **unified telemetry agent** that collects, processes, and forwards:

- **Logs** (to Loki)
- **Metrics** (to Prometheus)
- **Traces** (to Tempo) - not covered in the video, but supported

**Why use it?**

- Replaces: Promtail, node_exporter, cadvisor, the Loki Docker logging plugin
- Single configuration file for everything
- Process/filter data BEFORE storage
- Component-based architecture (Lego blocks for monitoring)

---

## **Part 2: Installation (5 Minutes)**

### **Option A: Docker (Recommended for Testing)**

```yaml
# docker-compose.yml
version: '3.8'
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    hostname: your-server-name  # ← CRITICAL: Set your actual hostname
    command:
      - "run"
      - "--server.http.listen-addr=0.0.0.0:12345"
      - "--storage.path=/var/lib/alloy/data"
      - "/etc/alloy/config.alloy"
    ports:
      - "12345:12345"  # Web UI
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/log:/var/log:ro                          # For host logs
      - /var/run/docker.sock:/var/run/docker.sock:ro  # For Docker
      - /proc:/proc:ro                                # For metrics
      - /sys:/sys:ro                                  # For metrics
    restart: unless-stopped

volumes:
  alloy-data:
```

### **Option B: Binary (Production)**

```bash
# Download the latest release
wget https://github.com/grafana/alloy/releases/latest/download/alloy-linux-amd64
chmod +x alloy-linux-amd64
sudo mv alloy-linux-amd64 /usr/local/bin/alloy

# Create a systemd service
sudo nano /etc/systemd/system/alloy.service
```

**Service file:**

```ini
[Unit]
Description=Grafana Alloy
After=network.target

[Service]
Type=simple
User=alloy
ExecStart=/usr/local/bin/alloy run /etc/alloy/config.alloy
Restart=always

[Install]
WantedBy=multi-user.target
```

---

## **Part 3: Your First Configuration (Level 1)**

### **Basic Structure - Understanding Components**

```alloy
// config.alloy

// 1. TARGETS (Where data goes)
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// 2. SOURCES (Where data comes from)
// We'll add these in the next steps
```

**Key Concept**: Alloy connects `sources` → `processors` → `targets`

---

## **Part 4: Collect Host Logs (Level 2)**

### **Replace Promtail - Simple Log Collection**

```alloy
// Add to config.alloy after the targets

// File discovery
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
  }]
}

// Log reader
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.write.default.receiver]
}

// Convert an existing Promtail config:
// alloy convert --source-format=promtail --output=config.alloy promtail.yaml
```

**Test it:**

```bash
docker-compose up -d
curl http://localhost:12345/-/healthy   # Should report healthy
```

---

## **Part 5: Collect Host Metrics (Level 3)**

### **Replace node_exporter - System Metrics**

```alloy
// Prometheus must be started with this flag:
// --web.enable-remote-write-receiver

prometheus.exporter.unix "node_metrics" {
  // Automatically collects CPU, memory, disk, network
}

discovery.relabel "node_metrics" {
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    source_labels = ["__address__"]
    target_label  = "instance"
    replacement   = constants.hostname // Uses the system hostname
  }

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-metrics" // Dynamic job name
  }
}

prometheus.scrape "node_metrics" {
  targets    = discovery.relabel.node_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**View metrics in Grafana:**

1. Import dashboard ID `1860` (Node Exporter Full)
2. Filter by `job=your-hostname-metrics`

---

## **Part 6: Add Processing (Level 4)**

### **Relabeling - Add Custom Labels**

```alloy
// For logs
loki.relabel "add_os_label" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "os"
    replacement  = constants.os // auto-populates "linux"
  }

  rule {
    target_label = "environment"
    replacement  = "production"
  }
}

// Update the log source to use the relabeler
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.relabel.add_os_label.receiver] // Changed!
}
```

### **Filtering - Drop Unwanted Logs**

`loki.relabel` only sees labels, not log lines. Content-based filtering uses `loki.process` with drop stages:

```alloy
loki.process "filter_logs" {
  forward_to = [loki.write.default.receiver]

  // Drop DEBUG lines (case-insensitive match on the log line)
  stage.drop {
    expression = "(?i)debug"
  }

  // To effectively keep only ERROR and WARN, drop the other levels too:
  stage.drop {
    expression = "(?i)(trace|info)"
  }
}
```

---

## **Part 7: Docker Monitoring (Level 5)**

### **Collect Container Logs & Metrics**

```alloy
// Docker logs (no logging plugin needed!)
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

discovery.relabel "container_logs" {
  targets = discovery.docker.containers.targets

  // Promote the container name ("/myapp" → "myapp") to a label
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container_name"
  }
}

loki.source.docker "container_logs" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.container_logs.output
  forward_to = [loki.write.default.receiver]
}

// Docker metrics (cAdvisor replacement)
prometheus.exporter.cadvisor "container_metrics" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.container_metrics.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker_metrics" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**No Docker Compose changes needed!** All containers are discovered automatically.
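
Container log lines arrive wrapped in Docker's `json-file` format. A `loki.process` stage between the source and the writer can unwrap them first. A minimal sketch, assuming the `loki.source.docker` component above is changed to forward here instead of directly to `loki.write.default`:

```alloy
loki.process "docker_logs" {
  forward_to = [loki.write.default.receiver]

  // Parse the {"log": "...", "stream": "stdout", "time": "..."} wrapper,
  // keeping only the inner log line as the entry body.
  stage.docker {}

  // Promote the extracted stream (stdout/stderr) to a queryable label.
  stage.labels {
    values = { stream = "" }
  }
}
```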
---

## **Part 8: Advanced Scenarios (Level 6)**

### **Multiple Log Sources**

```alloy
// System journal
loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]
  path       = "/var/log/journal" // Required in containers
  labels     = { job = constants.hostname + "-journal" }
}

// Application logs
local.file_match "app_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/app/*.log",
    app         = "myapp", // Custom label
  }]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}
```

### **Multiple Output Destinations**

```alloy
// Development Loki
loki.write "dev" {
  endpoint {
    url = "http://loki-dev:3100/loki/api/v1/push"
  }
  external_labels = { environment = "development" }
}

// Production Loki
loki.write "prod" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"
  }
  external_labels = { environment = "production" }
}

// Route based on a label: each relabeler keeps only its own
// environment's streams, so sources can forward to both.
loki.relabel "route_prod" {
  forward_to = [loki.write.prod.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "prod"
    action        = "keep"
  }
}

loki.relabel "route_dev" {
  forward_to = [loki.write.dev.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "dev"
    action        = "keep"
  }
}
```

---

## **Part 9: Best Practices**

### **1. Configuration Organization**

Split the configuration into files by concern, then point Alloy at the directory (`alloy run /etc/alloy/` loads every `*.alloy` file in it):

```alloy
// 01-targets.alloy - Output destinations
loki.write "default" { /* ... */ }
prometheus.remote_write "default" { /* ... */ }

// 02-system-metrics.alloy - Host metrics
prometheus.exporter.unix "node" { /* ... */ }

// 03-system-logs.alloy - Host logs
local.file_match "logs" { /* ... */ }

// 04-docker.alloy - Container monitoring
loki.source.docker "containers" { /* ... */ }
```

Shared modules can also be pulled from version control:

```alloy
import.git "configs" {
  repository     = "https://github.com/your-org/alloy-configs"
  path           = "modules/common.alloy"
  pull_frequency = "5m"
}
```

### **2. Label Strategy**

```alloy
// Consistent labeling template
discovery.relabel "standard_labels" {
  targets = prometheus.exporter.unix.node.targets // whichever targets you are labeling

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }

  rule {
    target_label = "region"
    replacement  = "us-east-1"
  }

  rule {
    target_label = "team"
    replacement  = "platform"
  }

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system" // one job name per pipeline
  }
}
```

### **3. Buffering & Retry**

```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"

    // Back off and retry while Loki is down
    min_backoff_period  = "100ms"
    max_backoff_period  = "5s"
    max_backoff_retries = 10
  }

  external_labels = { agent = "alloy" }
}
```

---

## **Part 10: Complete Production Example**

```alloy
// === PRODUCTION CONFIG ===
// File: /etc/alloy/config.alloy

// 1. OUTPUTS
loki.write "production" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"

    // Batching
    batch_wait = "1s"
    batch_size = "1MiB"
  }

  external_labels = { cluster = "prod", agent = "alloy" }
  max_streams     = 10000
}

prometheus.remote_write "production" {
  endpoint {
    url = "http://prometheus-prod:9090/api/v1/write"

    // Queue config
    queue_config {
      capacity             = 2500
      max_shards           = 200
      min_shards           = 1
      max_samples_per_send = 500
    }
  }

  external_labels = { cluster = "prod" }
}

// 2. COMMON PROCESSING
loki.relabel "common_labels" {
  forward_to = [loki.write.production.receiver]

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }

  rule {
    target_label = "os"
    replacement  = constants.os
  }

  rule {
    target_label = "agent"
    replacement  = "alloy"
  }
}

// 3. SYSTEM METRICS
prometheus.exporter.unix "default" { }

discovery.relabel "system_metrics" {
  targets = prometheus.exporter.unix.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system"
  }

  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }
}

prometheus.scrape "system" {
  targets    = discovery.relabel.system_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}

// 4. SYSTEM LOGS
local.file_match "system_logs" {
  path_targets = [
    {__path__ = "/var/log/syslog"},
    {__path__ = "/var/log/auth.log"},
    {__path__ = "/var/log/kern.log"},
  ]
}

loki.source.file "system_logs" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.relabel.common_labels.receiver]
}

// 5. JOURNAL
loki.source.journal "journal" {
  path       = "/var/log/journal" // Required in containers
  forward_to = [loki.relabel.common_labels.receiver]
  labels     = { job = constants.hostname + "-journal" }
}

// 6. DOCKER
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.relabel.common_labels.receiver]
  labels     = { job = constants.hostname + "-docker" }
}

prometheus.exporter.cadvisor "default" { }

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
```

---

## **Part 11: Troubleshooting Cheat Sheet**

### **Common Issues & Fixes:**

1. **"No metrics in Prometheus"**

   ```bash
   # Check that Prometheus was started with remote write enabled
   ps aux | grep prometheus | grep enable-remote-write

   # Test that the endpoint is reachable
   # (an empty POST returns an error response, not a timeout)
   curl -X POST http://prometheus:9090/api/v1/write
   ```

2. **"No logs in Loki"**

   ```bash
   # Check the Alloy web UI
   # http://localhost:12345/graph

   # Check component health
   curl http://localhost:12345/-/healthy

   # View Alloy's own logs
   docker logs alloy
   ```

3. **"Journal not working in a container"**

   ```alloy
   // Add the path to the journal source
   loki.source.journal "journal" {
     path       = "/var/log/journal" // ← THIS LINE
     forward_to = [loki.write.default.receiver]
   }
   ```

4. **"Hostname wrong in metrics"**

   ```yaml
   # In docker-compose.yml
   alloy:
     hostname: your-actual-hostname  # ← Set explicitly
   ```

5. **Validate the config:**

   ```bash
   # Parse check: fails with an error on invalid syntax
   alloy fmt config.alloy > /dev/null
   ```

### **Debug Commands:**

```bash
# View the component graph (open in a browser)
# http://localhost:12345/graph

# Check the metrics Alloy exposes about itself
curl http://localhost:12345/metrics

# Readiness check
curl http://localhost:12345/-/ready

# Reload the configuration without restarting
curl -X POST http://localhost:12345/-/reload
```

---

## **Part 12: Migration Checklist**

### **From the Old Stack → Alloy**

| Old Component | Alloy Replacement | Action |
|---------------|-------------------|--------|
| Promtail | `loki.source.file` + `local.file_match` | Run `alloy convert` |
| node_exporter | `prometheus.exporter.unix` | Remove the node_exporter service |
| cadvisor | `prometheus.exporter.cadvisor` | Remove the cadvisor container |
| Loki Docker plugin | `loki.source.docker` | Remove the plugin; keep the default `json-file` logging driver |
| Multiple config files | Single `config.alloy` | Consolidate into sections |

### **Step-by-Step Migration:**

1. **Stage 1**: Deploy Alloy alongside the existing tools
2. **Stage 2**: Compare old vs. new data in Grafana
3. **Stage 3**: Route 10% of traffic to Alloy
4. **Stage 4**: Full cutover; decommission the old tools

---

## **Quick Reference Card**

### **Essential Components:**

- `local.file_match` - Find log files
- `loki.source.file` - Read log files
- `loki.source.journal` - Read the systemd journal
- `loki.source.docker` - Read container logs
- `prometheus.exporter.unix` - System metrics
- `prometheus.exporter.cadvisor` - Container metrics
- `loki.relabel` - Process log labels
- `loki.process` - Parse and filter log lines
- `discovery.relabel` - Process metric targets
- `prometheus.scrape` - Collect metrics
- `loki.write` - Send logs
- `prometheus.remote_write` - Send metrics

### **Magic Variables:**

- `constants.hostname` - System hostname
- `constants.os` - Operating system
- `constants.arch` - CPU architecture

### **Web UI Endpoints:**

- `:12345/` - Homepage
- `:12345/graph` - Component visualization
- `:12345/metrics` - Self-metrics
- `:12345/-/healthy` - Health check
- `:12345/-/reload` - Reload the config

---

## **Next Steps After Mastery**

1. **Add tracing**: `otelcol.*` components for OpenTelemetry
2. **Multi-cluster**: Use `import.http` for centralized config
3. **Custom components**: Build your own with `declare` modules
4. **Kubernetes**: Use the Alloy Helm chart for dynamic discovery
5. **Alerting**: Alloy ships the data; define alert rules in Prometheus/Loki themselves

## **Resources**

- [Official Docs](https://grafana.com/docs/alloy/latest/)
- [Alloy Scenarios (Examples)](https://github.com/grafana/alloy-scenarios)
- [Configuration Reference](https://grafana.com/docs/alloy/latest/reference/components/)

---

**Remember**: Start simple, use the web UI (`:12345/graph`) to visualize your pipeline, and add complexity incrementally. Alloy's power is in its composability - build your monitoring like Lego!
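
As a parting example, the tracing next-step can be sketched in the same config language. A minimal, hedged sketch — it assumes a Tempo instance reachable at `tempo:4317` over plaintext gRPC; the hostnames and ports are illustrative:

```alloy
// Receive OTLP traces from instrumented applications...
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// ...and forward them to Tempo over OTLP/gRPC.
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"

    tls {
      insecure = true // assumes plaintext traffic inside the cluster
    }
  }
}
```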