Update tech_docs/prometheus.md

This commit is contained in:
2024-05-29 19:14:01 +00:00
parent 17391cb9c1
commit aa0500e728

@@ -1,3 +1,294 @@
To add Grafana to your setup, we need to extend the `docker-compose.yml` file and configure Grafana to use Prometheus as a data source. Here are the steps:
### Step 1: Extend docker-compose.yml to Include Grafana
Add the Grafana service to your `docker-compose.yml` file:
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert.rules:/etc/prometheus/alert.rules
    networks:
      - monitoring

  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring
    volumes:
      - grafana-storage:/var/lib/grafana

networks:
  monitoring:
    driver: bridge

volumes:
  grafana-storage:
```
### Step 2: Restart Docker Services
Restart your Docker services to include Grafana:
```bash
docker-compose down
docker-compose up -d
```
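`docker-compose up -d` returns as soon as the containers are started, which can be before Prometheus and Grafana accept requests. The following is a minimal sketch of a retry helper for scripts that need to wait; `wait_until` is a hypothetical name, and the URLs in the usage comment assume the port mappings from the `docker-compose.yml` above.

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it succeeds or ATTEMPTS runs have failed.
# (Hypothetical helper for illustration; not part of Docker or Prometheus.)
wait_until() {
    local attempts="$1"
    local delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Pause only when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage: probe the readiness/health endpoints of both services.
#   wait_until 30 2 curl -fsS http://localhost:9090/-/ready
#   wait_until 30 2 curl -fsS http://localhost:3000/api/health
```

The helper takes the probe as an ordinary command, so the same function works for either service or for any other check.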
### Step 3: Configure Grafana
1. **Access Grafana**:
Open your web browser and go to `http://localhost:3000`. Log in with the default credentials (`admin` / `admin`); Grafana prompts you to set a new password on first login.
2. **Add Prometheus Data Source**:
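- Go to **Configuration > Data Sources > Add data source** (recent Grafana versions list this under **Connections > Data sources**).
- Select **Prometheus**.
- Set the URL to `http://prometheus:9090` and click **Save & test**. The hostname `prometheus` resolves because both containers share the `monitoring` network.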
3. **Create a Dashboard**:
- Create a new dashboard.
- Add panels to visualize Node Exporter metrics such as CPU, memory, and disk usage.
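As an alternative to clicking through the UI, the data source can be declared in a provisioning file that Grafana reads at startup. The sketch below writes such a file; the function name, the output file name `prometheus.yml`, and the host directory used in the usage note are assumptions made for illustration.

```bash
# write_datasource_provisioning DIR
# Writes a Grafana data source provisioning file into DIR.
# The function name and the file name are illustrative choices.
write_datasource_provisioning() {
    local dir="$1"
    mkdir -p "$dir"
    cat > "$dir/prometheus.yml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
EOF
}

# Usage (hypothetical host path):
#   write_datasource_provisioning ./grafana/provisioning/datasources
```

For Grafana to pick the file up, the directory would need to be mounted into the container, for example by adding `./grafana/provisioning/datasources:/etc/grafana/provisioning/datasources` to the `grafana` service's `volumes`.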
### Example PromQL Queries for Grafana Panels
- **CPU Usage**:
```promql
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
- **Memory Usage**:
```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```
- **Disk Usage**:
```promql
100 - ((node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"}) * 100)
```
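The memory and disk queries share one shape: the used share is `(total - available) / total * 100`, which is the same as `100 - available / total * 100`. The sketch below reproduces that arithmetic outside Prometheus so the expected panel value can be checked by hand; `usage_percent` is a hypothetical helper and the numbers are made-up sample values.

```bash
# usage_percent TOTAL AVAILABLE
# Mirrors the arithmetic of the memory and disk queries:
#   (total - available) / total * 100
# The values passed below are made-up samples, not real measurements.
usage_percent() {
    awk -v total="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", (total - avail) / total * 100 }'
}

usage_percent 8589934592 6442450944   # 8 GiB total, 6 GiB available: prints 25.0
```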
### Step 4: Verify the Setup
1. **Check Grafana Dashboard**:
Open Grafana at `http://localhost:3000` and verify that you can see the metrics from your Linux systems.
2. **Check Prometheus Targets**:
Open Prometheus at `http://localhost:9090/targets` and confirm that every target is listed with state **UP**.
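The same target check can be scripted against the Prometheus HTTP API, which exposes the target list at `/api/v1/targets`. This is a minimal sketch that assumes `jq` is installed; `summarize_targets` is a hypothetical helper name.

```bash
# summarize_targets
# Reads the JSON returned by /api/v1/targets on stdin and prints one
# "job scrapeUrl health" line per active target. Requires jq.
summarize_targets() {
    jq -r '.data.activeTargets[] | "\(.labels.job) \(.scrapeUrl) \(.health)"'
}

# Usage (assumes Prometheus is published on localhost:9090):
#   curl -fsS http://localhost:9090/api/v1/targets | summarize_targets
```

A target whose last column is anything other than `up` is not being scraped successfully.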
### Summary
By adding Grafana to your Docker Compose setup and configuring it to use Prometheus as a data source, you can build dashboards that visualize metrics from your Linux systems.
---
### Key Metrics and KPIs to Monitor
1. **CPU Usage**
2. **Memory Usage**
3. **Disk Usage**
4. **Network Traffic**
5. **System Load**
6. **Uptime**
7. **Temperature (if available)**
### Step-by-Step Guide to Create a Grafana Dashboard
#### Step 1: Access Grafana
Open your web browser and go to `http://localhost:3000`. Log in with the default credentials (`admin` / `admin`); Grafana prompts you to set a new password on first login.
#### Step 2: Add Prometheus Data Source
1. **Configuration > Data Sources > Add data source** (listed under **Connections > Data sources** in recent Grafana versions)
2. **Select Prometheus**
3. **Set the URL to `http://prometheus:9090` and save**
#### Step 3: Create a New Dashboard
1. **Dashboard > New Dashboard > Add a New Panel**
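A dashboard can also be created without the UI by posting a JSON model to Grafana's `/api/dashboards/db` endpoint. The sketch below only builds a minimal request body with a single panel; the function name is hypothetical, the panel layout values are arbitrary, and the panel relies on Prometheus being the default data source.

```bash
# dashboard_payload TITLE
# Prints a minimal request body for POST /api/dashboards/db: one dashboard
# with a single time series panel plotting node_load1.
# TITLE must not contain double quotes or backslashes (no JSON escaping is done).
dashboard_payload() {
    cat <<EOF
{
  "dashboard": {
    "id": null,
    "uid": null,
    "title": "$1",
    "panels": [
      {
        "type": "timeseries",
        "title": "System Load (1m)",
        "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
        "targets": [ { "refId": "A", "expr": "node_load1" } ]
      }
    ]
  },
  "overwrite": false
}
EOF
}

# Usage (assumes the default admin/admin login is still in place):
#   dashboard_payload "Linux Hosts" |
#     curl -fsS -u admin:admin -H 'Content-Type: application/json' \
#          --data-binary @- http://localhost:3000/api/dashboards/db
```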
#### Step 4: Add Panels with PromQL Queries
Here are the important metrics and their corresponding PromQL queries:
1. **CPU Usage**
- **Panel Title:** CPU Usage (%)
- **PromQL Query:**
```promql
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
2. **Memory Usage**
- **Panel Title:** Memory Usage (%)
- **PromQL Query:**
```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```
3. **Disk Usage**
- **Panel Title:** Disk Usage (%)
- **PromQL Query:**
```promql
100 - ((node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"}) * 100)
```
4. **Network Traffic**
- **Panel Title:** Network Inbound Traffic (Bytes/s)
- **PromQL Query:**
```promql
rate(node_network_receive_bytes_total[5m])
```
- **Panel Title:** Network Outbound Traffic (Bytes/s)
- **PromQL Query:**
```promql
rate(node_network_transmit_bytes_total[5m])
```
5. **System Load**
- **Panel Title:** System Load (1m)
- **PromQL Query:**
```promql
node_load1
```
6. **Uptime**
- **Panel Title:** System Uptime (seconds)
- **PromQL Query:**
```promql
node_time_seconds - node_boot_time_seconds
```
7. **Temperature (if available)**
- **Panel Title:** CPU Temperature (°C)
- **PromQL Query:**
```promql
node_hwmon_temp_celsius
```
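Before pasting a query into a panel, it can help to confirm that it returns data. The sketch below sends an instant query to the Prometheus HTTP API at `/api/v1/query`; `promql_query` and the `PROM_URL` variable are hypothetical names, and the default URL assumes the port mapping from `docker-compose.yml`.

```bash
# promql_query EXPRESSION
# Sends an instant query to /api/v1/query and prints the raw JSON response.
# PROM_URL is a hypothetical override; it defaults to the published port.
promql_query() {
    curl -fsS -G --data-urlencode "query=$1" \
        "${PROM_URL:-http://localhost:9090}/api/v1/query"
}

# Usage:
#   promql_query 'node_load1'
#   promql_query 'node_time_seconds - node_boot_time_seconds'
```

An empty `result` array in the response usually means the metric is not being scraped, which is common for `node_hwmon_temp_celsius` on hosts without exposed temperature sensors.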
### Example Panel Configurations
#### CPU Usage Panel
1. **Add a new panel**.
2. **Set the title to "CPU Usage (%)"**.
3. **Enter the PromQL query**:
```promql
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to percentage)**.
#### Memory Usage Panel
1. **Add a new panel**.
2. **Set the title to "Memory Usage (%)"**.
3. **Enter the PromQL query**:
```promql
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to percentage)**.
#### Disk Usage Panel
1. **Add a new panel**.
2. **Set the title to "Disk Usage (%)"**.
3. **Enter the PromQL query**:
```promql
100 - ((node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs|rootfs"}) * 100)
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to percentage)**.
#### Network Traffic Panels
**Inbound Traffic**
1. **Add a new panel**.
2. **Set the title to "Network Inbound Traffic (Bytes/s)"**.
3. **Enter the PromQL query**:
```promql
rate(node_network_receive_bytes_total[5m])
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to bytes/sec)**.
**Outbound Traffic**
1. **Add a new panel**.
2. **Set the title to "Network Outbound Traffic (Bytes/s)"**.
3. **Enter the PromQL query**:
```promql
rate(node_network_transmit_bytes_total[5m])
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to bytes/sec)**.
#### System Load Panel
1. **Add a new panel**.
2. **Set the title to "System Load (1m)"**.
3. **Enter the PromQL query**:
```promql
node_load1
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to none)**.
#### Uptime Panel
1. **Add a new panel**.
2. **Set the title to "System Uptime (seconds)"**.
3. **Enter the PromQL query**:
```promql
node_time_seconds - node_boot_time_seconds
```
4. **Set visualization type to "Stat"**.
5. **Customize the visualization settings (e.g., set the unit to seconds)**.
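The uptime query returns a raw number of seconds, which is hard to read for hosts that have been up for days. The sketch below shows the conversion into days, hours, and minutes; `format_uptime` is a hypothetical helper and the input is a made-up sample value.

```bash
# format_uptime SECONDS
# Converts an uptime in seconds into days, hours, and minutes.
# Any fractional part (e.g. "93784.5") is dropped before the conversion.
format_uptime() {
    local total="${1%.*}"
    local days=$((total / 86400))
    local hours=$((total % 86400 / 3600))
    local minutes=$((total % 3600 / 60))
    printf '%dd %dh %dm\n' "$days" "$hours" "$minutes"
}

format_uptime 93784   # prints "1d 2h 3m"
```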
#### Temperature Panel (if available)
1. **Add a new panel**.
2. **Set the title to "CPU Temperature (°C)"**.
3. **Enter the PromQL query**:
```promql
node_hwmon_temp_celsius
```
4. **Set visualization type to "Time series"** (named "Graph" in older Grafana versions).
5. **Customize the visualization settings (e.g., set y-axis unit to degrees Celsius)**.
### Summary
By setting up these panels in Grafana, you'll have a comprehensive dashboard displaying key metrics and KPIs for your Linux systems. This will provide valuable insights into the performance and health of your infrastructure.
---
### Directory Structure
We'll organize the directories under `/volume1/docker/prometheus` as follows: