Add tech_docs/grafana_alloy.md
# **Grafana Alloy: Zero to Hero Guide**

## **Part 1: Foundation & Core Concepts**

### **What is Grafana Alloy?**
Alloy is a **unified telemetry agent** that collects, processes, and forwards:
- **Logs** (to Loki)
- **Metrics** (to Prometheus)
- **Traces** (to Tempo) - not covered in video but supported

**Why use it?**
- Replaces: Promtail, node_exporter, cadvisor, Loki Docker plugin
- Single configuration file for everything
- Process/filter data BEFORE storage
- Component-based architecture (Lego blocks for monitoring)

---

## **Part 2: Installation (5 Minutes)**

### **Option A: Docker (Recommended for Testing)**
```yaml
# docker-compose.yml
version: '3.8'
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    hostname: your-server-name  # ← CRITICAL: Set your actual hostname
    command:
      - "run"
      - "--server.http.listen-addr=0.0.0.0:12345"
      - "--storage.path=/var/lib/alloy/data"
      - "/etc/alloy/config.alloy"  # Config path is a positional argument
    ports:
      - "12345:12345"  # Web UI
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/log:/var/log:ro  # For host logs
      - /var/run/docker.sock:/var/run/docker.sock:ro  # For Docker
      - /proc:/proc:ro  # For metrics
      - /sys:/sys:ro  # For metrics
    restart: unless-stopped

volumes:
  alloy-data:
```

### **Option B: Binary (Production)**
```bash
# Download latest (release assets are published as zip archives)
wget https://github.com/grafana/alloy/releases/latest/download/alloy-linux-amd64.zip
unzip alloy-linux-amd64.zip
chmod +x alloy-linux-amd64
sudo mv alloy-linux-amd64 /usr/local/bin/alloy

# Create systemd service
sudo nano /etc/systemd/system/alloy.service
```

**Service file:**
```ini
[Unit]
Description=Grafana Alloy
After=network.target

[Service]
Type=simple
# The alloy user must exist and be able to write to the storage path
User=alloy
ExecStart=/usr/local/bin/alloy run --storage.path=/var/lib/alloy/data /etc/alloy/config.alloy
Restart=always

[Install]
WantedBy=multi-user.target
```

---

## **Part 3: Your First Configuration (Level 1)**

### **Basic Structure - Understanding Components**
```alloy
// config.alloy
// 1. TARGETS (Where data goes)
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// 2. SOURCES (Where data comes from)
// We'll add these in next steps
```

**Key Concept**: Alloy connects `sources` → `processors` → `targets`

---

## **Part 4: Collect Host Logs (Level 2)**

### **Replace Promtail - Simple Log Collection**
```alloy
// Add to config.alloy after targets

// File discovery
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
  }]
}

// Log reader
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.write.default.receiver]
}

// Convert existing Promtail config:
// alloy convert --source-format=promtail --output=config.alloy promtail.yaml
```

**Test it:**
```bash
docker-compose up -d
curl http://localhost:12345/-/ready  # Returns HTTP 200 once Alloy is ready
```

---

## **Part 5: Collect Host Metrics (Level 3)**

### **Replace node_exporter - System Metrics**
```alloy
// Prometheus must be started with this flag to accept remote writes:
// --web.enable-remote-write-receiver

prometheus.exporter.unix "node_metrics" {
  // Automatically collects CPU, memory, disk, network
}

discovery.relabel "node_metrics" {
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    source_labels = ["__address__"]
    target_label  = "instance"
    replacement   = constants.hostname // Uses system hostname
  }

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-metrics" // Dynamic job name
  }
}

prometheus.scrape "node_metrics" {
  targets    = discovery.relabel.node_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**View metrics in Grafana:**
1. Import dashboard ID `1860` (Node Exporter Full)
2. Filter by `job=your-hostname-metrics`

---

## **Part 6: Add Processing (Level 4)**

### **Relabeling - Add Custom Labels**
```alloy
// For logs
loki.relabel "add_os_label" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "os"
    replacement  = constants.os // auto-populates "linux"
  }

  rule {
    target_label = "environment"
    replacement  = "production"
  }
}

// Update log source to use relabeler
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.relabel.add_os_label.receiver] // Changed!
}
```

### **Filtering - Drop Unwanted Logs**
```alloy
// loki.relabel only sees labels, so filtering on line content uses loki.process
loki.process "filter_logs" {
  forward_to = [loki.write.default.receiver]

  // Drop DEBUG logs
  stage.drop {
    expression = "(?i).*debug.*"
  }

  // Keep only ERROR and WARN: drop every line that matches none of these
  stage.match {
    selector = "{filename=~\".+\"} !~ \"(?i)(error|warn|fail)\""
    action   = "drop"
  }
}
```
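
The same idea applies to metrics: `prometheus.relabel` can sit between `prometheus.scrape` and `prometheus.remote_write` and drop series by name before they are stored. A minimal sketch, assuming the `default` remote-write target from Part 3; the `go_.*` pattern and the component label `drop_runtime_metrics` are illustrative choices, not part of the original setup:

```alloy
prometheus.relabel "drop_runtime_metrics" {
  forward_to = [prometheus.remote_write.default.receiver]

  // Drop every series whose metric name starts with go_
  rule {
    source_labels = ["__name__"]
    regex         = "go_.*"
    action        = "drop"
  }
}

// To use it, point a scrape component at the relabeler instead of the writer:
//   forward_to = [prometheus.relabel.drop_runtime_metrics.receiver]
```

Dropping by metric name here filters samples after the scrape, whereas `discovery.relabel` only rewrites target labels before the scrape happens.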

---

## **Part 7: Docker Monitoring (Level 5)**

### **Collect Container Logs & Metrics**
```alloy
// Discover running containers through the Docker socket
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Turn discovery metadata into a container_name label
discovery.relabel "container_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container_name"
  }
}

// Docker logs (no plugin needed!)
loki.source.docker "container_logs" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.container_logs.output
  forward_to = [loki.write.default.receiver]
}

// Docker metrics (embedded cAdvisor)
prometheus.exporter.cadvisor "container_metrics" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.container_metrics.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker_metrics" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**No Docker Compose changes needed!** All containers are automatically discovered.
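
If collecting from every container is too broad, discovery can be narrowed at the source. A sketch, assuming containers opt in with a Docker label `logging=alloy` (a hypothetical convention) and the `default` Loki target from Part 3; the component label `opted_in` is also illustrative:

```alloy
discovery.docker "opted_in" {
  host = "unix:///var/run/docker.sock"

  // Passed to the Docker API as a container list filter
  filter {
    name   = "label"
    values = ["logging=alloy"]
  }
}

loki.source.docker "opted_in" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.opted_in.targets
  forward_to = [loki.write.default.receiver]
}
```

A container would then opt in through a `labels:` entry in its own Compose service definition.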

---

## **Part 8: Advanced Scenarios (Level 6)**

### **Multiple Log Sources**
```alloy
// System journal
loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]

  // Required in containers
  path = "/var/log/journal"

  labels = {
    job = constants.hostname + "-journal",
  }
}

// Application logs
local.file_match "app_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/app/*.log",
    app         = "myapp", // Custom label
  }]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}
```

### **Multiple Output Destinations**
```alloy
// Development Loki
loki.write "dev" {
  endpoint {
    url = "http://loki-dev:3100/loki/api/v1/push"
  }

  external_labels = {
    environment = "development",
  }
}

// Production Loki
loki.write "prod" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"
  }

  external_labels = {
    environment = "production",
  }
}

// Route based on label: forward_to fans out to every listed receiver,
// so each route keeps only the entries meant for its destination
loki.relabel "route_prod" {
  forward_to = [loki.write.prod.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "prod"
    action        = "keep"
  }
}

loki.relabel "route_dev" {
  forward_to = [loki.write.dev.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "dev"
    action        = "keep"
  }
}

// Log sources then send to both routes:
//   forward_to = [loki.relabel.route_prod.receiver, loki.relabel.route_dev.receiver]
```

---

## **Part 9: Best Practices**

### **1. Configuration Organization**
```alloy
// 01-targets.alloy - Output destinations
loki.write "default" { /* ... */ }
prometheus.remote_write "default" { /* ... */ }

// 02-system-metrics.alloy - Host metrics
prometheus.exporter.unix "node" { /* ... */ }

// 03-system-logs.alloy - Host logs
local.file_match "logs" { /* ... */ }

// 04-docker.alloy - Container monitoring
loki.source.docker "containers" { /* ... */ }

// Load all: point `alloy run` at the directory (e.g. `alloy run /etc/alloy/`)
// and every *.alloy file in it is loaded as one configuration

// Import reusable modules (files made of `declare` blocks) from Git
import.git "configs" {
  repository     = "https://github.com/your-org/alloy-configs"
  path           = "modules/common.alloy" // hypothetical file; globs are not supported
  pull_frequency = "5m"
}
```

### **2. Label Strategy**
```alloy
// Consistent labeling template
discovery.relabel "standard_labels" {
  // targets is required; shown here with the exporter from Part 5
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "region"
    replacement  = "us-east-1"
  }
  rule {
    target_label = "team"
    replacement  = "platform"
  }
  rule {
    target_label = "job"
    replacement  = constants.hostname + "-node" // <hostname>-<component>
  }
}
```
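
To reuse one labeling template across several exporters without copying the rules, it can be wrapped in a custom component with a `declare` block. The following is a sketch under stated assumptions: the argument names `targets` and `component` are hypothetical, and the exporter reference assumes the `node_metrics` exporter from Part 5.

```alloy
// Reusable labeling template, defined once as a custom component
declare "standard_labels" {
  // Scrape targets to label
  argument "targets" { }

  // Suffix for the job label, e.g. "node" or "docker" (hypothetical parameter)
  argument "component" { }

  discovery.relabel "this" {
    targets = argument.targets.value

    rule {
      target_label = "host"
      replacement  = constants.hostname
    }

    rule {
      target_label = "job"
      replacement  = constants.hostname + "-" + argument.component.value
    }
  }

  // Expose the relabeled targets to the caller
  export "output" {
    value = discovery.relabel.this.output
  }
}

// Instantiate the template for the node exporter
standard_labels "node" {
  targets   = prometheus.exporter.unix.node_metrics.targets
  component = "node"
}

// Downstream scrape components then use:
//   targets = standard_labels.node.output
```

Each additional exporter needs only another short `standard_labels` instance, so the label set stays identical everywhere.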

### **3. Buffering & Retry**
```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"

    // Retry with exponential backoff while Loki is unreachable
    min_backoff_period  = "100ms"
    max_backoff_period  = "5s"
    max_backoff_retries = 10
  }

  external_labels = {
    agent = "alloy",
  }
}
```

---

## **Part 10: Complete Production Example**

```alloy
// === PRODUCTION CONFIG ===
// File: /etc/alloy/config.alloy

// 1. OUTPUTS
loki.write "production" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"

    // Batching
    batch_wait = "1s"
    batch_size = "1MiB"
  }

  external_labels = {cluster = "prod", agent = "alloy"}
  max_streams     = 10000
}

prometheus.remote_write "production" {
  endpoint {
    url = "http://prometheus-prod:9090/api/v1/write"

    // Queue config
    queue_config {
      capacity             = 2500
      max_shards           = 200
      min_shards           = 1
      max_samples_per_send = 500
    }
  }

  external_labels = {cluster = "prod"}
}

// 2. COMMON PROCESSING
loki.relabel "common_labels" {
  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "os"
    replacement  = constants.os
  }
  rule {
    target_label = "agent"
    replacement  = "alloy"
  }
  forward_to = [loki.write.production.receiver]
}

// 3. SYSTEM METRICS
prometheus.exporter.unix "default" {}

discovery.relabel "system_metrics" {
  targets = prometheus.exporter.unix.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system"
  }
  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }
}

prometheus.scrape "system" {
  targets    = discovery.relabel.system_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}

// 4. SYSTEM LOGS
local.file_match "system_logs" {
  path_targets = [
    {__path__ = "/var/log/syslog"},
    {__path__ = "/var/log/auth.log"},
    {__path__ = "/var/log/kern.log"},
  ]
}

loki.source.file "system_logs" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.relabel.common_labels.receiver]
}

// 5. JOURNAL
loki.source.journal "journal" {
  path       = "/var/log/journal" // Required in containers
  forward_to = [loki.relabel.common_labels.receiver]

  labels = {
    job = constants.hostname + "-journal",
  }
}

// 6. DOCKER
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.relabel.common_labels.receiver]

  labels = {
    job = constants.hostname + "-docker",
  }
}

// Container metrics come from the embedded cAdvisor exporter
prometheus.exporter.cadvisor "default" {}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
```

---

## **Part 11: Troubleshooting Cheat Sheet**

### **Common Issues & Fixes:**

1. **"No metrics in Prometheus"**
```bash
# Check Prometheus has remote write enabled
ps aux | grep prometheus | grep enable-remote-write

# Test connection (a 404 means the remote write receiver is not enabled)
curl -X POST http://prometheus:9090/api/v1/write
```

2. **"No logs in Loki"**
```bash
# Check Alloy web UI: open http://localhost:12345/graph in a browser

# Check component health
curl http://localhost:12345/-/healthy

# View Alloy logs
docker logs alloy
```

3. **"Journal not working in container"**
```alloy
// Add path to journal source
loki.source.journal "journal" {
  path       = "/var/log/journal" // ← THIS LINE
  forward_to = [loki.write.default.receiver]
}
```

4. **"Hostname wrong in metrics"**
```yaml
# In docker-compose.yml
alloy:
  hostname: your-actual-hostname  # ← Set explicitly
```

5. **Validate config:**
```bash
# Parse the file and report syntax errors
alloy fmt config.alloy
# Recent releases also ship an experimental validate command
alloy validate config.alloy
```

### **Debug Commands:**
```bash
# View component graph
open http://localhost:12345/graph

# Check metrics Alloy generates about itself
curl http://localhost:12345/metrics

# Live tail Alloy's own logs
docker logs -f alloy

# List loaded components and their health (the API behind the web UI)
curl http://localhost:12345/api/v0/web/components
```

---

## **Part 12: Migration Checklist**

### **From Old Stack → Alloy**

| Old Component | Alloy Replacement | Action |
|---------------|-------------------|--------|
| Promtail | `loki.source.file` + `local.file_match` | Run `alloy convert` |
| node_exporter | `prometheus.exporter.unix` | Remove node_exporter service |
| cadvisor | `prometheus.exporter.cadvisor` | Remove cadvisor container |
| Loki Docker plugin | `loki.source.docker` | Remove plugin, keep the `json-file` logging driver |
| Multiple config files | Single `config.alloy` | Consolidate into sections |

### **Step-by-Step Migration:**
1. **Stage 1**: Deploy Alloy alongside existing tools
2. **Stage 2**: Compare data between old/new in Grafana
3. **Stage 3**: Route 10% traffic to Alloy
4. **Stage 4**: Full cutover, decommission old tools

---

## **Quick Reference Card**

### **Essential Components:**
- `local.file_match` - Find log files
- `loki.source.file` - Read log files
- `loki.source.journal` - Read systemd journal
- `loki.source.docker` - Read container logs
- `prometheus.exporter.unix` - System metrics
- `prometheus.exporter.cadvisor` - Container metrics
- `loki.relabel` - Relabel logs
- `discovery.relabel` - Relabel scrape targets
- `prometheus.scrape` - Scrape metrics from targets
- `loki.write` - Send logs
- `prometheus.remote_write` - Send metrics

### **Magic Variables:**
- `constants.hostname` - System hostname
- `constants.os` - Operating system
- `constants.arch` - CPU arch

### **Web UI Endpoints:**
- `:12345/` - Homepage
- `:12345/graph` - Component visualization
- `:12345/metrics` - Self-metrics
- `:12345/-/healthy` - Health check
- `:12345/-/ready` - Readiness check

---

## **Next Steps After Mastery**

1. **Add tracing**: `otelcol.*` components for OpenTelemetry
2. **Multi-cluster**: Use `import.http` for centralized config
3. **Custom components**: Build your own with `declare` blocks and share them as modules
4. **Kubernetes**: Use `discovery.kubernetes` for dynamic discovery
5. **Alerting**: Sync alert rules with `mimir.rules.kubernetes` or `loki.rules.kubernetes` (`prometheus.relabel` only rewrites metric labels)
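
For the tracing step, a trace pipeline follows the same source-to-target wiring as logs and metrics. A minimal sketch, assuming a Tempo instance that accepts OTLP over gRPC at `tempo:4317` without TLS (the hostname, port, and component labels are assumptions, not taken from this guide):

```alloy
// Receive OTLP traces from applications over gRPC and HTTP
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }

  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

// Forward traces to Tempo over OTLP/gRPC
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"

    tls {
      insecure = true // plain-text connection; assumes a trusted network
    }
  }
}
```

When Alloy runs in Docker, ports 4317 and 4318 would also need to be published so applications can reach the receiver.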

## **Resources**
- [Official Docs](https://grafana.com/docs/alloy/latest/)
- [Alloy Scenarios (Examples)](https://github.com/grafana/alloy-scenarios)
- [Configuration Reference](https://grafana.com/docs/alloy/latest/reference/components/)

---

**Remember**: Start simple, use the web UI (`:12345/graph`) to visualize your pipeline, and incrementally add complexity. Alloy's power is in its composability - build your monitoring like Lego!