Add tech_docs/grafana_alloy.md

This commit is contained in:
2026-02-07 23:19:27 +00:00
parent 06c4a74f64
commit aa932f0bde

# **Grafana Alloy: Zero to Hero Guide**
## **Part 1: Foundation & Core Concepts**
### **What is Grafana Alloy?**
Alloy is a **unified telemetry agent** that collects, processes, and forwards:
- **Logs** (to Loki)
- **Metrics** (to Prometheus)
- **Traces** (to Tempo) - not covered in the video, but supported
**Why use it?**
- Replaces: Promtail, node_exporter, cadvisor, Loki Docker plugin
- Single configuration file for everything
- Process/filter data BEFORE storage
- Component-based architecture (Lego blocks for monitoring)
---
## **Part 2: Installation (5 Minutes)**
### **Option A: Docker (Recommended for Testing)**
```yaml
# docker-compose.yml
version: '3.8'
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    hostname: your-server-name # ← CRITICAL: Set your actual hostname
    command:
      - "run"
      - "--server.http.listen-addr=0.0.0.0:12345"
      - "--storage.path=/var/lib/alloy/data"
      - "/etc/alloy/config.alloy" # config path is a positional argument to `run`
    ports:
      - "12345:12345" # Web UI
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/log:/var/log:ro # For host logs
      - /var/run/docker.sock:/var/run/docker.sock:ro # For Docker
      - /proc:/proc:ro # For metrics
      - /sys:/sys:ro # For metrics
    restart: unless-stopped

volumes:
  alloy-data:
```
### **Option B: Binary (Production)**
```bash
# Download latest
wget https://github.com/grafana/alloy/releases/latest/download/alloy-linux-amd64
chmod +x alloy-linux-amd64
sudo mv alloy-linux-amd64 /usr/local/bin/alloy
# Create systemd service
sudo nano /etc/systemd/system/alloy.service
```
**Service file:**
```ini
[Unit]
Description=Grafana Alloy
After=network.target
[Service]
Type=simple
User=alloy
ExecStart=/usr/local/bin/alloy run /etc/alloy/config.alloy
Restart=always
[Install]
WantedBy=multi-user.target
```
---
## **Part 3: Your First Configuration (Level 1)**
### **Basic Structure - Understanding Components**
```alloy
// config.alloy

// 1. TARGETS (Where data goes)
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// 2. SOURCES (Where data comes from)
// We'll add these in the next steps
```
**Key Concept**: Alloy wires `sources` → `processors` → `targets`; each component's exports feed the next stage via `forward_to`.
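Putting the concept into practice, the smallest complete pipeline chains a file source through a relabel processor into the Loki target defined above (the path and label values are illustrative):

```alloy
// SOURCE: tail one log file
loki.source.file "example" {
  targets    = [{__address__ = "localhost", __path__ = "/var/log/syslog"}]
  forward_to = [loki.relabel.example.receiver]
}

// PROCESSOR: attach a static label before shipping
loki.relabel "example" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "source"
    replacement  = "syslog"
  }
}

// TARGET: the loki.write "default" block shown above
```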
---
## **Part 4: Collect Host Logs (Level 2)**
### **Replace Promtail - Simple Log Collection**
```alloy
// Add to config.alloy after the targets

// File discovery
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
  }]
}

// Log reader
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.write.default.receiver]
}

// Convert an existing Promtail config instead of writing from scratch:
// alloy convert --source-format=promtail --output=config.alloy promtail.yaml
```
**Test it:**
```bash
docker-compose up -d
curl http://localhost:12345/-/healthy # Should report healthy
```
---
## **Part 5: Collect Host Metrics (Level 3)**
### **Replace node_exporter - System Metrics**
```alloy
// Prometheus must be started with this flag to accept remote writes:
// --web.enable-remote-write-receiver

prometheus.exporter.unix "node_metrics" {
  // Automatically collects CPU, memory, disk, network
}

discovery.relabel "node_metrics" {
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    source_labels = ["__address__"]
    target_label  = "instance"
    replacement   = constants.hostname // uses the system hostname
  }
  rule {
    target_label = "job"
    replacement  = constants.hostname + "-metrics" // dynamic job name
  }
}

prometheus.scrape "node_metrics" {
  targets    = discovery.relabel.node_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```
**View metrics in Grafana:**
1. Import dashboard ID `1860` (Node Exporter Full)
2. Filter by `job=your-hostname-metrics`
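To sanity-check the pipeline, a PromQL query in Grafana's Explore view should return series; the job value assumes the dynamic naming from the relabel rule above:

```promql
# Per-core CPU usage for this host over the last 5 minutes
rate(node_cpu_seconds_total{job="your-hostname-metrics"}[5m])
```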
---
## **Part 6: Add Processing (Level 4)**
### **Relabeling - Add Custom Labels**
```alloy
// For logs
loki.relabel "add_os_label" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "os"
    replacement  = constants.os // auto-populates e.g. "linux"
  }
  rule {
    target_label = "environment"
    replacement  = "production"
  }
}

// Update the log source to send through the relabeler
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.relabel.add_os_label.receiver] // Changed!
}
```
### **Filtering - Drop Unwanted Logs**
```alloy
loki.process "filter_logs" {
  forward_to = [loki.write.default.receiver]

  // Note: loki.relabel only sees labels, never the log line itself;
  // filtering on content is done with loki.process stages.

  // Drop DEBUG lines
  stage.drop {
    expression = "(?i)debug"
  }
  // Drop noisy health-check entries
  stage.drop {
    expression = "GET /health"
  }
}
```
---
## **Part 7: Docker Monitoring (Level 5)**
### **Collect Container Logs & Metrics**
```alloy
// Discover running containers over the Docker socket
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Turn Docker metadata into labels (strip the leading "/" from names)
discovery.relabel "container_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container_name"
  }
}

// Docker logs (no Loki Docker plugin needed!)
loki.source.docker "container_logs" {
  host          = "unix:///var/run/docker.sock"
  targets       = discovery.docker.containers.targets
  relabel_rules = discovery.relabel.container_logs.rules
  forward_to    = [loki.write.default.receiver]
}

// Docker metrics via the embedded cAdvisor exporter
prometheus.exporter.cadvisor "container_metrics" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.container_metrics.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker_metrics" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```
**No Docker Compose changes needed!** All containers are automatically discovered.
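Once container logs are flowing, they can be pulled up in Grafana Explore with LogQL, keyed on the labels attached above (`myapp` is a placeholder container name):

```logql
# Error lines from a single container
{container_name="myapp"} |= "error"
```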
---
## **Part 8: Advanced Scenarios (Level 6)**
### **Multiple Log Sources**
```alloy
// System journal
loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]
  // Required when Alloy runs in a container (mount /var/log/journal)
  path = "/var/log/journal"
  labels = {
    job = constants.hostname + "-journal",
  }
}

// Application logs
local.file_match "app_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/app/*.log",
    app         = "myapp", // Custom label
  }]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}
```
### **Multiple Output Destinations**
```alloy
// Development Loki
loki.write "dev" {
  endpoint {
    url = "http://loki-dev:3100/loki/api/v1/push"
  }
  external_labels = {
    environment = "development",
  }
}

// Production Loki
loki.write "prod" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"
  }
  external_labels = {
    environment = "production",
  }
}

// Route on a label: a component cannot pick a receiver per entry,
// so use one relabeler per destination with a "keep" rule.
loki.relabel "route_prod" {
  forward_to = [loki.write.prod.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "prod"
    action        = "keep"
  }
}

loki.relabel "route_dev" {
  forward_to = [loki.write.dev.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "dev"
    action        = "keep"
  }
}

// Sources forward to both routers; each keeps only its environment:
// forward_to = [loki.relabel.route_prod.receiver, loki.relabel.route_dev.receiver]
```
---
## **Part 9: Best Practices**
### **1. Configuration Organization**
```alloy
// Split the config across files, one concern per file:
//   01-targets.alloy        - output destinations
//   02-system-metrics.alloy - host metrics
//   03-system-logs.alloy    - host logs
//   04-docker.alloy         - container monitoring
//
// Load them all by pointing Alloy at the directory:
//   alloy run /etc/alloy/

// Shared modules can also be pulled from git
// (path is an example module file in that repo):
import.git "configs" {
  repository     = "https://github.com/your-org/alloy-configs"
  path           = "modules/main.alloy"
  pull_frequency = "5m"
}
```
### **2. Label Strategy**
```alloy
// Consistent labeling template
discovery.relabel "standard_labels" {
  targets = prometheus.exporter.unix.default.targets // wire in your own targets

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "region"
    replacement  = "us-east-1"
  }
  rule {
    target_label = "team"
    replacement  = "platform"
  }
  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system" // one suffix per pipeline
  }
}
```
### **3. Buffering & Retry**
```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"

    // Back off and retry while Loki is down
    min_backoff_period  = "100ms"
    max_backoff_period  = "5s"
    max_backoff_retries = 10
  }
  external_labels = {
    agent = "alloy",
  }
}
```
---
## **Part 10: Complete Production Example**
```alloy
// === PRODUCTION CONFIG ===
// File: /etc/alloy/config.alloy

// 1. OUTPUTS
loki.write "production" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"

    // Batching
    batch_wait = "1s"
    batch_size = "1MiB"
  }
  // Cap the number of active streams (0 = unlimited)
  max_streams = 10000
  external_labels = {
    cluster = "prod",
    agent   = "alloy",
  }
}

prometheus.remote_write "production" {
  endpoint {
    url = "http://prometheus-prod:9090/api/v1/write"

    // Queue config
    queue_config {
      capacity             = 2500
      max_shards           = 200
      min_shards           = 1
      max_samples_per_send = 500
    }
  }
  external_labels = {
    cluster = "prod",
  }
}

// 2. COMMON PROCESSING
loki.relabel "common_labels" {
  forward_to = [loki.write.production.receiver]

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "os"
    replacement  = constants.os
  }
  rule {
    target_label = "agent"
    replacement  = "alloy"
  }
}

// 3. SYSTEM METRICS
prometheus.exporter.unix "default" { }

discovery.relabel "system_metrics" {
  targets = prometheus.exporter.unix.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system"
  }
  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }
}

prometheus.scrape "system" {
  targets    = discovery.relabel.system_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}

// 4. SYSTEM LOGS
local.file_match "system_logs" {
  path_targets = [
    {__path__ = "/var/log/syslog"},
    {__path__ = "/var/log/auth.log"},
    {__path__ = "/var/log/kern.log"},
  ]
}

loki.source.file "system_logs" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.relabel.common_labels.receiver]
}

// 5. JOURNAL
loki.source.journal "journal" {
  path       = "/var/log/journal" // Required in containers
  forward_to = [loki.relabel.common_labels.receiver]
  labels = {
    job = constants.hostname + "-journal",
  }
}

// 6. DOCKER
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.relabel.common_labels.receiver]
  labels = {
    job = constants.hostname + "-docker",
  }
}

prometheus.exporter.cadvisor "default" { }

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
```
---
## **Part 11: Troubleshooting Cheat Sheet**
### **Common Issues & Fixes:**
1. **"No metrics in Prometheus"**
```bash
# Check Prometheus has remote write enabled
ps aux | grep prometheus | grep enable-remote-write
# Test connection
curl -X POST http://prometheus:9090/api/v1/write
```
2. **"No logs in Loki"**
```bash
# Check Alloy web UI
http://localhost:12345/graph
# Check component health
curl http://localhost:12345/-/healthy
# View Alloy logs
docker logs alloy
```
3. **"Journal not working in container"**
```alloy
// Add path to the journal source
loki.source.journal "journal" {
  path = "/var/log/journal" // ← THIS LINE
}
```
4. **"Hostname wrong in metrics"**
```yaml
# In docker-compose.yml
services:
  alloy:
    hostname: your-actual-hostname # ← Set explicitly
```
5. **Validate config:**
```bash
# alloy fmt parses the config and fails on syntax errors
alloy fmt config.alloy
```
### **Debug Commands:**
```bash
# View component graph in the web UI
open http://localhost:12345/graph
# Check metrics Alloy generates about itself
curl http://localhost:12345/metrics
# Health and readiness checks
curl http://localhost:12345/-/healthy
curl http://localhost:12345/-/ready
# Reload configuration without a restart
curl -X POST http://localhost:12345/-/reload
```
---
## **Part 12: Migration Checklist**
### **From Old Stack → Alloy**
| Old Component | Alloy Replacement | Action |
|---------------|-------------------|--------|
| Promtail | `loki.source.file` + `local.file_match` | Run `alloy convert` |
| node_exporter | `prometheus.exporter.unix` | Remove node_exporter service |
| cadvisor | `prometheus.exporter.cadvisor` | Remove cadvisor container |
| Loki Docker plugin | `loki.source.docker` | Remove plugin; use the default `json-file` log driver |
| Multiple config files | Single `config.alloy` | Consolidate into sections |
### **Step-by-Step Migration:**
1. **Stage 1**: Deploy Alloy alongside existing tools
2. **Stage 2**: Compare data between old/new in Grafana
3. **Stage 3**: Route 10% traffic to Alloy
4. **Stage 4**: Full cutover, decommission old tools
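For Stage 2, tagging everything Alloy sends makes the side-by-side comparison straightforward; a sketch, assuming the old stack writes to the same Loki without this label:

```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
  // Query {collector="alloy"} vs. unlabeled streams to compare old and new
  external_labels = {
    collector = "alloy",
  }
}
```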
---
## **Quick Reference Card**
### **Essential Components:**
- `local.file_match` - Find log files
- `loki.source.file` - Read log files
- `loki.source.journal` - Read systemd journal
- `loki.source.docker` - Read container logs
- `prometheus.exporter.unix` - System metrics
- `prometheus.exporter.cadvisor` - Container metrics
- `discovery.docker` - Discover containers
- `loki.relabel` - Process log labels
- `discovery.relabel` - Process metric targets
- `prometheus.scrape` - Scrape metrics from exporters
- `loki.write` - Send logs to Loki
- `prometheus.remote_write` - Send metrics to Prometheus
### **Magic Variables:**
- `constants.hostname` - System hostname
- `constants.os` - Operating system (e.g. "linux")
- `constants.arch` - CPU architecture
### **Web UI Endpoints:**
- `:12345/` - Homepage
- `:12345/graph` - Component visualization
- `:12345/metrics` - Self-metrics
- `:12345/-/healthy` - Health check
- `:12345/-/reload` - Reload config (POST)
---
## **Next Steps After Mastery**
1. **Add tracing**: `otelcol.*` components for OpenTelemetry
2. **Multi-cluster**: Use `import.http` for centralized config
3. **Custom components**: Build your own with Go
4. **Kubernetes**: Use Alloy Operator for dynamic discovery
5. **Alerting**: keep alert rules in Prometheus/Grafana; use `prometheus.relabel` to shape series before they are written
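As a taste of the tracing step, a minimal OTLP receive-and-forward pipeline might look like this sketch (`tempo:4317` is an assumed Tempo OTLP gRPC address):

```alloy
// Accept OTLP traces over gRPC/HTTP and forward them to Tempo
otelcol.receiver.otlp "default" {
  grpc { }
  http { }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
  }
}
```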
## **Resources**
- [Official Docs](https://grafana.com/docs/alloy/latest/)
- [Alloy Scenarios (Examples)](https://github.com/grafana/alloy-scenarios)
- [Configuration Reference](https://grafana.com/docs/alloy/latest/reference/components/)
---
**Remember**: Start simple, use the web UI (`:12345/graph`) to visualize your pipeline, and incrementally add complexity. Alloy's power is in its composability - build your monitoring like Lego!