# **Grafana Alloy: Zero to Hero Guide**

## **Part 1: Foundation & Core Concepts**

### **What is Grafana Alloy?**
Alloy is a **unified telemetry agent** that collects, processes, and forwards:
- **Logs** (to Loki)
- **Metrics** (to Prometheus)
- **Traces** (to Tempo) - not covered in video but supported

**Why use it?**
- Replaces: Promtail, node_exporter, cadvisor, Loki Docker plugin
- Single configuration file for everything
- Process/filter data BEFORE storage
- Component-based architecture (Lego blocks for monitoring)

---

## **Part 2: Installation (5 Minutes)**

### **Option A: Docker (Recommended for Testing)**
```yaml
# docker-compose.yml
version: '3.8'
services:
  alloy:
    image: grafana/alloy:latest
    container_name: alloy
    hostname: your-server-name  # ← CRITICAL: Set your actual hostname
    command:
      - "run"  # the subcommand must come first when overriding the default command
      - "--server.http.listen-addr=0.0.0.0:12345"
      - "--storage.path=/var/lib/alloy/data"
      - "/etc/alloy/config.alloy"  # the config path is a positional argument
    ports:
      - "12345:12345"  # Web UI
    volumes:
      - ./config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/log:/var/log:ro  # For host logs
      - /var/run/docker.sock:/var/run/docker.sock:ro  # For Docker
      - /proc:/proc:ro  # For metrics
      - /sys:/sys:ro  # For metrics
    restart: unless-stopped

volumes:
  alloy-data:
```

### **Option B: Binary (Production)**
```bash
# Download the latest release (published as a zip archive)
wget https://github.com/grafana/alloy/releases/latest/download/alloy-linux-amd64.zip
unzip alloy-linux-amd64.zip
chmod +x alloy-linux-amd64
sudo mv alloy-linux-amd64 /usr/local/bin/alloy

# Create systemd service
sudo nano /etc/systemd/system/alloy.service
```

**Service file:**
```ini
[Unit]
Description=Grafana Alloy
After=network.target

[Service]
Type=simple
User=alloy
ExecStart=/usr/local/bin/alloy run /etc/alloy/config.alloy
Restart=always

[Install]
WantedBy=multi-user.target
```

---

## **Part 3: Your First Configuration (Level 1)**

### **Basic Structure - Understanding Components**
```alloy
// config.alloy
// 1. TARGETS (Where data goes)
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}

// 2. SOURCES (Where data comes from)
// We'll add these in next steps
```

**Key Concept**: Alloy connects `sources` → `processors` → `targets`. Each component exports values (such as a `receiver`) that other components reference by name.

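As a rough sketch of that chain, the fragment below wires one source through one processor into one target. The component labels (`example`, `tag`), the `pipeline` label, and the path `/tmp/example.log` are placeholders chosen for illustration; the Loki URL assumes the `loki` hostname used above.

```alloy
// Target: where log entries end up (URL assumed from the Loki endpoint above).
loki.write "example" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}

// Processor: adds a static label, then hands entries to the target.
loki.relabel "tag" {
  forward_to = [loki.write.example.receiver]

  rule {
    target_label = "pipeline"
    replacement  = "example"
  }
}

// Source: tails one file and forwards each line to the processor.
loki.source.file "example" {
  targets    = [{__path__ = "/tmp/example.log"}]
  forward_to = [loki.relabel.tag.receiver]
}
```

References point downstream: the source names the processor's `receiver`, and the processor names the target's, so the order of blocks in the file does not matter.
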
---

## **Part 4: Collect Host Logs (Level 2)**

### **Replace Promtail - Simple Log Collection**
```alloy
// Add to config.alloy after targets

// File discovery
local.file_match "syslog" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/var/log/syslog",
  }]
}

// Log reader
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.write.default.receiver]
}

// Convert existing Promtail config:
// alloy convert --source-format=promtail --output=config.alloy promtail.yaml
```

**Test it:**
```bash
docker-compose up -d
curl http://localhost:12345/-/ready  # Should report that Alloy is ready
```

---

## **Part 5: Collect Host Metrics (Level 3)**

### **Replace node_exporter - System Metrics**
```alloy
// Prometheus must be started with this flag to accept remote writes:
// --web.enable-remote-write-receiver

prometheus.exporter.unix "node_metrics" {
  // Automatically collects CPU, memory, disk, network
}

discovery.relabel "node_metrics" {
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    source_labels = ["__address__"]
    target_label  = "instance"
    replacement   = constants.hostname  // Uses system hostname
  }

  rule {
    target_label = "job"
    // Alloy has no "${...}" string interpolation; concatenate with + instead
    replacement  = constants.hostname + "-metrics"  // Dynamic job name
  }
}

prometheus.scrape "node_metrics" {
  targets    = discovery.relabel.node_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**View metrics in Grafana:**
1. Import dashboard ID `1860` (Node Exporter Full)
2. Filter by `job=your-hostname-metrics`

---

## **Part 6: Add Processing (Level 4)**

### **Relabeling - Add Custom Labels**
```alloy
// For logs
loki.relabel "add_os_label" {
  forward_to = [loki.write.default.receiver]

  rule {
    target_label = "os"
    replacement  = constants.os  // auto-populates "linux"
  }

  rule {
    target_label = "environment"
    replacement  = "production"
  }
}

// Update log source to use relabeler
loki.source.file "syslog" {
  targets    = local.file_match.syslog.targets
  forward_to = [loki.relabel.add_os_label.receiver]  // Changed!
}
```

### **Filtering - Drop Unwanted Logs**
```alloy
// loki.relabel only sees labels, so filtering on line content uses loki.process
loki.process "filter_logs" {
  forward_to = [loki.write.default.receiver]

  // Drop DEBUG logs
  stage.drop {
    expression = "(?i)debug"
  }

  // Keep only ERROR, WARN, and FAIL lines by dropping everything else.
  // The selector assumes streams carry the filename label set by loki.source.file.
  stage.match {
    selector = "{filename=~\".+\"} !~ \"(?i)(error|warn|fail)\""
    action   = "drop"
  }
}
```

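Metrics can be filtered before storage in a similar way. The sketch below assumes the `prometheus.remote_write "default"` component from Part 3; the component label `drop_noisy` and the choice of series to drop (the exporter's `node_scrape_collector_*` self-monitoring series) are only illustrative.

```alloy
// Hypothetical filter stage: drops the exporter's self-monitoring series.
prometheus.relabel "drop_noisy" {
  forward_to = [prometheus.remote_write.default.receiver]

  rule {
    source_labels = ["__name__"]
    regex         = "node_scrape_collector_.*"
    action        = "drop"
  }
}

// A scrape component then sends samples through the filter
// instead of straight to storage:
// forward_to = [prometheus.relabel.drop_noisy.receiver]
```

Unlike `discovery.relabel`, which rewrites targets before a scrape, `prometheus.relabel` acts on the scraped samples themselves, so it can match on the metric name.
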
---

## **Part 7: Docker Monitoring (Level 5)**

### **Collect Container Logs & Metrics**
```alloy
// Docker logs (no plugin needed!)
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

// Turn Docker metadata into a readable label.
// Docker reports names with a leading slash, so the regex strips it.
// (Docker discovery exposes no image-name meta label, so image_name is omitted.)
discovery.relabel "container_logs" {
  targets = discovery.docker.containers.targets

  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container_name"
  }
}

loki.source.docker "container_logs" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.container_logs.output
  forward_to = [loki.write.default.receiver]
}

// Docker metrics (Alloy embeds cAdvisor)
prometheus.exporter.cadvisor "container_metrics" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.container_metrics.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker_metrics" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

**No Docker Compose changes needed!** All containers are discovered automatically through the Docker socket mounted in Part 2.

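If collecting from every container is too broad, discovery results can be narrowed before they reach the log source. The following is a sketch under assumptions: the container label `logging=enabled` is a hypothetical opt-in convention, and the component label `opt_in` is a placeholder. It forwards to the `loki.write "default"` component from Part 3.

```alloy
// Discover all containers through the Docker socket.
discovery.docker "opt_in" {
  host = "unix:///var/run/docker.sock"
}

// Keep only containers started with the (hypothetical) label logging=enabled.
discovery.relabel "opt_in" {
  targets = discovery.docker.opt_in.targets

  rule {
    source_labels = ["__meta_docker_container_label_logging"]
    regex         = "enabled"
    action        = "keep"
  }
}

// Read logs from the remaining containers only.
loki.source.docker "opt_in" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.opt_in.output
  forward_to = [loki.write.default.receiver]
}
```
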
---

## **Part 8: Advanced Scenarios (Level 6)**

### **Multiple Log Sources**
```alloy
// System journal
loki.source.journal "journal" {
  forward_to = [loki.write.default.receiver]

  // Required in containers
  path = "/var/log/journal"

  labels = {
    job = constants.hostname + "-journal",
  }
}

// Application logs
local.file_match "app_logs" {
  path_targets = [{
    __address__ = "localhost",
    __path__    = "/app/*.log",
    app         = "myapp",  // Custom label
  }]
}

loki.source.file "app_logs" {
  targets    = local.file_match.app_logs.targets
  forward_to = [loki.write.default.receiver]
}
```

### **Multiple Output Destinations**
```alloy
// Development Loki
loki.write "dev" {
  endpoint {
    url = "http://loki-dev:3100/loki/api/v1/push"
  }

  external_labels = {
    environment = "development",
  }
}

// Production Loki
loki.write "prod" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"
  }

  external_labels = {
    environment = "production",
  }
}

// Route based on label: one relabel component per destination,
// each keeping only the entries meant for it.
loki.relabel "route_prod" {
  forward_to = [loki.write.prod.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "prod"
    action        = "keep"
  }
}

loki.relabel "route_dev" {
  forward_to = [loki.write.dev.receiver]

  rule {
    source_labels = ["environment"]
    regex         = "dev"
    action        = "keep"
  }
}

// Log sources then forward to both routers:
// forward_to = [loki.relabel.route_prod.receiver, loki.relabel.route_dev.receiver]
```

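When every entry should reach both destinations rather than being routed to one, `forward_to` accepts more than one receiver. A minimal sketch, reusing the `loki.write "prod"` and `loki.write "dev"` components above; the component label `mirrored` and the choice of `/var/log/syslog` are only for illustration.

```alloy
// Fan-out: each log line is delivered to both writers.
loki.source.file "mirrored" {
  targets    = [{__path__ = "/var/log/syslog"}]
  forward_to = [
    loki.write.prod.receiver,
    loki.write.dev.receiver,
  ]
}
```
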
---

## **Part 9: Best Practices**

### **1. Configuration Organization**
```alloy
// 01-targets.alloy - Output destinations
loki.write "default" { /* ... */ }
prometheus.remote_write "default" { /* ... */ }

// 02-system-metrics.alloy - Host metrics
prometheus.exporter.unix "node" { /* ... */ }

// 03-system-logs.alloy - Host logs
local.file_match "logs" { /* ... */ }

// 04-docker.alloy - Container monitoring
loki.source.docker "containers" { /* ... */ }

// Load all: point `alloy run` at the directory instead of a single file,
// and every *.alloy file in it is loaded as one configuration:
//   alloy run /etc/alloy/

// Import shared modules (files containing `declare` blocks) from Git
import.git "configs" {
  repository     = "https://github.com/your-org/alloy-configs"
  path           = "modules"  // hypothetical directory inside the repository
  pull_frequency = "5m"
}
```

### **2. Label Strategy**
```alloy
// Consistent labeling template
discovery.relabel "standard_labels" {
  // Required: the target list to relabel (here, the exporter from Part 5)
  targets = prometheus.exporter.unix.node_metrics.targets

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "region"
    replacement  = "us-east-1"
  }
  rule {
    target_label = "team"
    replacement  = "platform"
  }
  rule {
    target_label = "job"
    // Hostname plus a per-pipeline suffix; "-system" is an example
    replacement  = constants.hostname + "-system"
  }
}
```

### **3. Buffering & Retry**
```alloy
loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"

    // Retry with backoff when Loki is down
    min_backoff_period  = "100ms"
    max_backoff_period  = "5s"
    max_backoff_retries = 10
  }

  external_labels = {
    agent = "alloy",
  }
}
```

---

## **Part 10: Complete Production Example**

```alloy
// === PRODUCTION CONFIG ===
// File: /etc/alloy/config.alloy

// 1. OUTPUTS
loki.write "production" {
  endpoint {
    url = "http://loki-prod:3100/loki/api/v1/push"

    // Batching
    batch_wait = "1s"
    batch_size = "1MiB"
  }

  external_labels = {cluster = "prod", agent = "alloy"}
  max_streams     = 10000
}

prometheus.remote_write "production" {
  endpoint {
    url = "http://prometheus-prod:9090/api/v1/write"

    // Queue config
    queue_config {
      capacity             = 2500
      max_shards           = 200
      min_shards           = 1
      max_samples_per_send = 500
    }
  }

  external_labels = {cluster = "prod"}
}

// 2. COMMON PROCESSING
loki.relabel "common_labels" {
  forward_to = [loki.write.production.receiver]

  rule {
    target_label = "host"
    replacement  = constants.hostname
  }
  rule {
    target_label = "os"
    replacement  = constants.os
  }
  rule {
    target_label = "agent"
    replacement  = "alloy"
  }
}

// 3. SYSTEM METRICS
prometheus.exporter.unix "default" { }

discovery.relabel "system_metrics" {
  targets = prometheus.exporter.unix.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-system"
  }
  rule {
    target_label = "instance"
    replacement  = constants.hostname
  }
}

prometheus.scrape "system" {
  targets    = discovery.relabel.system_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}

// 4. SYSTEM LOGS
local.file_match "system_logs" {
  path_targets = [
    {__path__ = "/var/log/syslog"},
    {__path__ = "/var/log/auth.log"},
    {__path__ = "/var/log/kern.log"},
  ]
}

loki.source.file "system_logs" {
  targets    = local.file_match.system_logs.targets
  forward_to = [loki.relabel.common_labels.receiver]
}

// 5. JOURNAL
loki.source.journal "journal" {
  path       = "/var/log/journal"  // Required in containers
  forward_to = [loki.relabel.common_labels.receiver]

  labels = {
    job = constants.hostname + "-journal",
  }
}

// 6. DOCKER
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.relabel.common_labels.receiver]

  labels = {
    job = constants.hostname + "-docker",
  }
}

prometheus.exporter.cadvisor "default" {
  docker_host = "unix:///var/run/docker.sock"
}

discovery.relabel "docker_metrics" {
  targets = prometheus.exporter.cadvisor.default.targets

  rule {
    target_label = "job"
    replacement  = constants.hostname + "-docker"
  }
}

prometheus.scrape "docker" {
  targets    = discovery.relabel.docker_metrics.output
  forward_to = [prometheus.remote_write.production.receiver]
}
```

---

## **Part 11: Troubleshooting Cheat Sheet**

### **Common Issues & Fixes:**

1. **"No metrics in Prometheus"**
```bash
# Check Prometheus has remote write enabled
ps aux | grep prometheus | grep enable-remote-write

# Test connection: any HTTP response means the endpoint is reachable;
# a 404 suggests the remote-write receiver is not enabled
curl -X POST http://prometheus:9090/api/v1/write
```

2. **"No logs in Loki"**
```bash
# Check Alloy web UI (open in a browser)
# http://localhost:12345/graph

# Check component health
curl http://localhost:12345/-/healthy

# View Alloy logs
docker logs alloy
```

3. **"Journal not working in container"**
```alloy
// Add path to journal source
loki.source.journal "journal" {
  path       = "/var/log/journal"  // ← THIS LINE
  forward_to = [loki.write.default.receiver]
}
```

4. **"Hostname wrong in metrics"**
```yaml
# In docker-compose.yml
alloy:
  hostname: your-actual-hostname  # ← Set explicitly
```

5. **Validate config:**
```bash
# Syntax check: alloy fmt exits with an error if the file does not parse
alloy fmt config.alloy
# Deeper validation, if your Alloy release includes the validate command
alloy validate config.alloy
```

### **Debug Commands:**
```bash
# View component graph
open http://localhost:12345/graph

# Check metrics Alloy generates about itself
curl http://localhost:12345/metrics

# Live tail Alloy's own logs
docker logs -f alloy

# List loaded components and their health as JSON (the API behind the web UI)
curl http://localhost:12345/api/v0/web/components
```

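When the commands above do not reveal enough, Alloy's own log verbosity can be raised in the configuration. A minimal sketch; the values shown are a debugging choice rather than a recommended production setting, and a configuration may contain only one `logging` block.

```alloy
// Raise Alloy's own log verbosity while debugging.
logging {
  level  = "debug"   // one of "error", "warn", "info", "debug"
  format = "logfmt"  // or "json"
}
```
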
---

## **Part 12: Migration Checklist**

### **From Old Stack → Alloy**

| Old Component | Alloy Replacement | Action |
|---------------|-------------------|--------|
| Promtail | `loki.source.file` + `local.file_match` | Run `alloy convert` |
| node_exporter | `prometheus.exporter.unix` | Remove node_exporter service |
| cadvisor | `prometheus.exporter.cadvisor` | Remove cadvisor container |
| Loki Docker plugin | `loki.source.docker` | Remove plugin, keep the `json-file` logging driver |
| Multiple config files | Single `config.alloy` | Consolidate into sections |

### **Step-by-Step Migration:**
1. **Stage 1**: Deploy Alloy alongside existing tools
2. **Stage 2**: Compare data between old/new in Grafana
3. **Stage 3**: Route 10% traffic to Alloy
4. **Stage 4**: Full cutover, decommission old tools

---

## **Quick Reference Card**

### **Essential Components:**
- `local.file_match` - Find log files
- `loki.source.file` - Read log files
- `loki.source.journal` - Read systemd journal
- `loki.source.docker` - Read container logs
- `prometheus.exporter.unix` - System metrics
- `prometheus.exporter.cadvisor` - Container metrics
- `loki.relabel` - Relabel logs
- `loki.process` - Filter and transform log lines
- `discovery.relabel` - Relabel scrape targets
- `prometheus.scrape` - Scrape metrics
- `loki.write` - Send logs
- `prometheus.remote_write` - Send metrics

### **Magic Variables:**
- `constants.hostname` - System hostname
- `constants.os` - Operating system
- `constants.arch` - CPU arch

### **Web UI Endpoints:**
- `:12345/` - Homepage
- `:12345/graph` - Component visualization
- `:12345/metrics` - Self-metrics
- `:12345/-/healthy` - Health check
- `:12345/-/ready` - Readiness check
- `:12345/-/reload` - Reload config (POST)

---

## **Next Steps After Mastery**

1. **Add tracing**: `otelcol.*` components for OpenTelemetry
2. **Multi-cluster**: Use `import.http` for centralized config
3. **Custom components**: Package reusable pipelines with `declare` blocks
4. **Kubernetes**: Use Alloy Operator for dynamic discovery
5. **Alerting**: Sync alert rules with `mimir.rules.kubernetes` or `loki.rules.kubernetes`

## **Resources**
- [Official Docs](https://grafana.com/docs/alloy/latest/)
- [Alloy Scenarios (Examples)](https://github.com/grafana/alloy-scenarios)
- [Configuration Reference](https://grafana.com/docs/alloy/latest/reference/components/)

---

**Remember**: Start simple, use the web UI (`:12345/graph`) to visualize your pipeline, and add complexity incrementally. Alloy's power is in its composability - build your monitoring like Lego!