Add tech_docs/networking/linux_networking.md

Here’s a **focused Linux deep dive** covering **process management, networking, and resource triage**: the 20% that delivers 80% of the value for senior roles.

---

# **Linux for Senior Engineers: Processes, Networking & Triage**
**Goal:** Master the commands and concepts needed to **analyze, optimize, and troubleshoot** Linux systems at scale.

---

## **1. Process Management**
### **1.1 Viewing & Controlling Processes**
| **Command**               | **Purpose**                                 | **Key Flags**                     |
|---------------------------|---------------------------------------------|-----------------------------------|
| `ps`                      | Snapshot of running processes               | `-ef` (all processes, full format), `aux` (BSD style, adds %CPU and %MEM) |
| `top` / `htop`            | Real-time process monitor                   | `-p PID` (filter by PID)          |
| `kill`                    | Send a signal to a process                  | `-15` (SIGTERM, the default), `-9` (SIGKILL) |
| `pkill` / `killall`       | Signal processes by name                    | `pkill -f` (match the full command line) |
| `nice` / `renice`         | Adjust process priority (niceness)          | `-n 19` (lowest priority)         |

**Pro Tip:**
- Use `strace -p PID` to **debug hanging processes** by tracing their system calls (typically requires root or ownership of the target process).
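
The `kill` flags in the table are usually combined into an escalation: send SIGTERM, wait, and fall back to SIGKILL only if the process is still alive. A minimal sketch of that pattern follows; `stop_gracefully` and its 5-second default grace period are illustrative assumptions, not something from the original notes.

```bash
#!/usr/bin/env bash
# Sketch: SIGTERM first, SIGKILL only after a grace period.
# stop_gracefully is a hypothetical helper name.

stop_gracefully() {
  local pid=$1 grace=${2:-5} waited=0
  # If the signal cannot be delivered (no such process, or no permission),
  # there is nothing more this helper can do.
  kill -TERM "$pid" 2>/dev/null || return 0
  while kill -0 "$pid" 2>/dev/null; do   # kill -0 only tests that the PID exists
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL "$pid" 2>/dev/null      # last resort: cannot be trapped
      return 1                           # signal to the caller that we escalated
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage:
#   sleep 300 &
#   stop_gracefully "$!" 3
```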

---

### **1.2 Process Resource Usage**
| **Command**       | **What It Shows**                          |
|-------------------|--------------------------------------------|
| `pidstat -u -r -d -p PID 1` | CPU (`-u`), memory (`-r`), and disk I/O (`-d`) stats for a process, sampled every second |
| `vmstat 1`        | System-wide CPU, memory, and I/O trends    |
| `iotop`           | Disk I/O by process (requires root)        |

**Critical Metrics:**
- **%CPU**: >80% for prolonged periods → Check for runaway processes. By default `top` reports 100% as one full core, so multi-threaded processes can exceed it.
- **%MEM**: Watch for steady growth over time, which suggests a leak (e.g., Java apps).
- **IOwait** (`wa` column in `vmstat`): High values mean CPUs sit idle waiting on I/O, usually a disk bottleneck.
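
The IOwait figure that `vmstat` prints is derived from counters in `/proc/stat`. As a rough illustration of where the number comes from, the sketch below samples the aggregate `cpu` line twice and reports the iowait share of the elapsed ticks. The function names are made up for this example, and the field order is assumed to match proc(5).

```bash
#!/usr/bin/env bash
# Sketch: estimate system-wide iowait % from two samples of /proc/stat.
# Aggregate "cpu" line fields: user nice system idle iowait irq softirq steal ...

read_cpu_ticks() {
  # Print "<total> <iowait>" tick counts from the aggregate cpu line.
  awk '/^cpu / { total = 0; for (i = 2; i <= 9; i++) total += $i; print total, $6 }' /proc/stat
}

iowait_pct() {
  local interval=${1:-1} t1 w1 t2 w2
  read -r t1 w1 < <(read_cpu_ticks)
  sleep "$interval"
  read -r t2 w2 < <(read_cpu_ticks)
  # Share of elapsed ticks spent in iowait between the two samples.
  awk -v dt="$((t2 - t1))" -v dw="$((w2 - w1))" \
    'BEGIN { if (dt > 0) printf "%.1f\n", 100 * dw / dt; else print "0.0" }'
}

# Usage: iowait_pct 1
```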

---

## **2. Linux Networking**
### **2.1 Key Networking Commands**
| **Command**               | **Purpose**                                 | **Example**                      |
|---------------------------|---------------------------------------------|----------------------------------|
| `ip addr` / `ifconfig`    | View interfaces and IPs (`ifconfig` is deprecated; prefer `ip`) | `ip addr show eth0` |
| `ss` / `netstat`          | Socket statistics (`netstat` is deprecated; prefer `ss`) | `ss -tulnp` (listening TCP/UDP sockets with owning process) |
| `tcpdump`                 | Packet capture                              | `tcpdump -i eth0 port 80`        |
| `dig` / `nslookup`        | DNS troubleshooting                         | `dig +short example.com`         |
| `traceroute` / `mtr`      | Network path analysis                       | `mtr google.com`                 |

**Pro Tip:**
- Use `nc` (netcat) for **quick port checks**:
  ```bash
  nc -zv example.com 443  # Test whether TCP port 443 accepts connections
  ```
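
On hosts where `nc` is not installed, bash's `/dev/tcp` redirection can serve the same purpose. The sketch below assumes bash was built with network redirections (the default on most distributions) and that coreutils `timeout` is available; `port_open` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: TCP port check without nc, using bash's /dev/tcp pseudo-device.

port_open() {
  local host=$1 port=$2 wait_s=${3:-3}
  # Open the connection in a child bash so the socket closes when it exits;
  # timeout bounds the wait when packets are silently dropped.
  timeout "$wait_s" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   port_open example.com 443 && echo "open" || echo "closed or filtered"
```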

---

### **2.2 Advanced Networking**
- **Network Namespaces**: Isolate network stacks (used by Docker/Kubernetes).
  ```bash
  ip netns list  # List named namespaces (Docker's are not listed by default)
  ```
- **eBPF**: Trace network traffic at kernel level (e.g., `bpftrace`).
- **IPTables/nftables**: Firewall rules. `iptables` is the legacy interface; `nftables` (`nft`) is its modern replacement.
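
To make the namespace idea concrete, here is a sketch that wires a new namespace to the host with a veth pair, roughly what container runtimes do under the hood. The names (`demo`, `veth-host`, `veth-demo`) and the `10.200.0.0/24` subnet are arbitrary assumptions. It needs root and `iproute2`; setting `DRY_RUN=1` only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: connect a fresh network namespace to the host via a veth pair.
# All names and addresses are illustrative. There is no error handling: if a
# step fails midway, clean up manually with `ip netns del demo`.

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

netns_demo() {
  run ip netns add demo
  run ip link add veth-host type veth peer name veth-demo
  run ip link set veth-demo netns demo              # move one end into the namespace
  run ip addr add 10.200.0.1/24 dev veth-host
  run ip link set veth-host up
  run ip netns exec demo ip addr add 10.200.0.2/24 dev veth-demo
  run ip netns exec demo ip link set veth-demo up
  run ip netns exec demo ip link set lo up
  run ping -c 1 -W 2 10.200.0.2                     # host -> namespace
  run ip netns del demo                             # also removes the veth pair
}

# Usage: DRY_RUN=1 netns_demo
```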

---

## **3. Resource Triage (Troubleshooting)**
### **3.1 The "Big Four" Resources**
1. **CPU**
   - **Tool:** `mpstat -P ALL 1` (per-core usage).
   - **Red Flag:** `%idle` < 10% sustained → CPU is likely the bottleneck.

2. **Memory**
   - **Tools:** `free -h`, `cat /proc/meminfo`.
   - **Key Metrics:**
     - **Available** (not "free") memory.
     - **Swap activity**: sustained non-zero `si`/`so` in `vmstat 1` → Memory pressure. Swap merely being in use is not a problem by itself.

3. **Disk I/O**
   - **Tools:** `iostat -xz 1`, `df -h`.
   - **Red Flags:**
     - `%util` > 70% (disk nearing saturation; less meaningful for SSD/NVMe devices that serve requests in parallel).
     - `r_await` / `w_await` > 10ms (slow I/O; older `sysstat` versions report a single `await`).

4. **Network**
   - **Tools:** `sar -n DEV 1` (throughput), `sar -n EDEV 1` or `ip -s link` (errors and drops), `ethtool -S eth0` (NIC counters).
   - **Red Flags:**
     - `RX/TX drops` increasing → Congestion or NIC issues.
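
Two of the red flags above can be read straight from the kernel when `free`, `sar`, or `ethtool` are unavailable. The sketch below pulls available memory and swap in use from `/proc/meminfo`, and per-interface drop counters from `/sys/class/net`. The function names are hypothetical, and `MemAvailable` assumes kernel 3.14 or newer.

```bash
#!/usr/bin/env bash
# Sketch: memory and NIC-drop snapshot from /proc and /sys.

mem_snapshot() {
  # /proc/meminfo values are in kB; report MiB.
  awk '
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { stotal = $2 }
    /^SwapFree:/     { sfree = $2 }
    END { printf "available_mib=%d swap_used_mib=%d\n", avail / 1024, (stotal - sfree) / 1024 }
  ' /proc/meminfo
}

nic_drops() {
  # Counters are cumulative, so compare two readings to see whether they grow.
  local dev
  for dev in /sys/class/net/*; do
    [ -r "$dev/statistics/rx_dropped" ] || continue
    printf '%s rx_dropped=%s tx_dropped=%s\n' "${dev##*/}" \
      "$(cat "$dev/statistics/rx_dropped")" "$(cat "$dev/statistics/tx_dropped")"
  done
}

# Usage:
#   mem_snapshot
#   nic_drops
```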

---

### **3.2 Triage Workflow**
1. **Identify the bottleneck**:
   ```bash
   uptime                  # Load averages (1m, 5m, 15m)
   dmesg -T | tail         # Kernel logs (OOMs, hardware errors)
   ```
2. **Drill down**:
   - High CPU? → `top`, then `pidstat -p PID 1`.
   - High RAM? → `grep VmRSS /proc/PID/status`.
3. **Check dependencies**:
   - Database slow? → `ss -tn 'sport = :5432'` on the database host (Postgres connections); filter on `dport` from the client side.
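
The first two steps of the workflow can be scripted for a quick first look. The sketch below normalizes the 1-minute load average by core count (on Linux the load also counts tasks in uninterruptible I/O sleep, so a sustained value above 1.0 per core suggests CPU or I/O saturation) and reads a process's resident memory. Both helper names are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: load per core and per-process RSS from /proc.

load_per_core() {
  # 1-minute load average divided by the number of available CPUs.
  local load cores
  read -r load _ < /proc/loadavg
  cores=$(nproc)
  awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

rss_kib() {
  # Resident set size of a PID in KiB (VmRSS line of /proc/PID/status).
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Usage:
#   load_per_core
#   rss_kib 1234
```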

---

## **4. Performance Optimization**
### **4.1 CPU**
- **Pin processes to cores**: `taskset -c 0,1 ./script.sh`.
- **Limit CPU usage**: `cpulimit -p PID -l 50` (max 50% of one core; `cpulimit` is a separate package).

### **4.2 Memory**
- **Clear caches** (in emergencies; this only frees clean cache and slows things down until the cache warms up again):
  ```bash
  sync                                          # Flush dirty pages first so more cache can be dropped
  echo 3 | sudo tee /proc/sys/vm/drop_caches    # Drops pagecache, dentries, inodes
  ```
- **OOM Killer Tuning**: Adjust `/proc/PID/oom_score_adj` (-1000 exempts a process, 1000 makes it the preferred victim).
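
Before changing `oom_score_adj`, it helps to see what the kernel currently thinks of a process. A small sketch, with `oom_info` as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: show the OOM killer's view of a process.

oom_info() {
  # oom_score is the kernel's current badness score;
  # oom_score_adj is the user-set bias in the range -1000..1000.
  local pid=${1:-$$}
  printf 'pid=%s oom_score=%s oom_score_adj=%s\n' "$pid" \
    "$(cat "/proc/$pid/oom_score")" "$(cat "/proc/$pid/oom_score_adj")"
}

# Usage:
#   oom_info 1234
#   echo 500   | sudo tee /proc/1234/oom_score_adj   # more likely to be killed
#   echo -1000 | sudo tee /proc/1234/oom_score_adj   # exempt from the OOM killer
```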
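
### **4.3 Disk**
- **I/O Scheduler**: Switch to `mq-deadline` for databases (named `deadline` on older single-queue kernels, which were removed in 5.0; NVMe devices often default to `none`):
  ```bash
  cat /sys/block/sda/queue/scheduler                          # Active scheduler is shown in [brackets]
  echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
  ```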

---

## **5. Must-Know Files & Directories**
| **Path**                  | **Purpose**                                 |
|---------------------------|---------------------------------------------|
| `/proc/PID/`              | Process details (limits, stats, FD usage)   |
| `/etc/sysctl.conf`, `/etc/sysctl.d/` | Persistent kernel tuning (e.g., `vm.swappiness`) |
| `/var/log/`               | System logs (auth, kernel, apps)            |
| `/sys/class/net/`         | Network interface attributes and statistics |
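
As one example of what `/proc/PID/` offers, the sketch below compares a process's open file descriptors against its soft limit, a common check when chasing "too many open files" errors. `fd_usage` is a hypothetical helper; reading another user's process generally requires root.

```bash
#!/usr/bin/env bash
# Sketch: open file descriptors vs. the soft limit for a PID.

fd_usage() {
  local pid=${1:-$$} open soft
  open=$(ls "/proc/$pid/fd" | wc -l)                                  # one entry per open FD
  soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")    # 4th column = soft limit
  echo "open=$open soft_limit=$soft"
}

# Usage: fd_usage 1234
```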

---

## **6. Interview Questions**
1. **How do you find which process is listening on port 8080?**
   ```bash
   sudo ss -tulnp 'sport = :8080'   # Exact port match; a plain grep 8080 would also hit 18080
   ```
2. **What’s the difference between `kill -9` and `kill -15`?**
   - `-15` (SIGTERM): Graceful shutdown. The process can catch it and clean up.
   - `-9` (SIGKILL): Forceful termination (last resort). It cannot be caught or ignored.
3. **How do you debug high CPU usage?**
   - `top` → `pidstat -p PID 1` → `strace -p PID`.
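
The SIGTERM/SIGKILL distinction is easy to demonstrate with a worker that installs a trap. In the sketch below (the `worker` function and log-file approach are assumptions for illustration), the cleanup line is written only when the worker receives SIGTERM.

```bash
#!/usr/bin/env bash
# Sketch: a worker that cleans up on SIGTERM. SIGKILL bypasses the trap.

worker() {
  local log=$1
  # Double quotes on purpose: expand $log now, while it is in scope.
  trap "echo 'cleanup done' >>'$log'; exit 0" TERM
  # Short sleeps keep the loop responsive: bash runs the trap only after
  # the current foreground command returns.
  while :; do sleep 0.2; done
}

# Usage:
#   log=$(mktemp); worker "$log" & pid=$!
#   sleep 1                      # give the worker time to install its trap
#   kill -15 "$pid"; wait "$pid"
#   cat "$log"                   # with kill -9 instead, the log stays empty
```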

---

## **7. Lab Exercises**
1. **Simulate a CPU hog**:
   ```bash
   while true; do :; done &  # Background infinite loop
   ```
   Then use `pidstat` to monitor, and stop it with `kill %1` (or `kill $!`) when done.
2. **Trigger OOM Killer** (only in a disposable VM or a memory-limited container):
   ```bash
   tail /dev/zero  # Buffers endless input with no newline until RAM runs out
   ```
   Check `dmesg` for OOM events.
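
For the CPU-hog lab, a self-cleaning variant avoids leaving the loop running by accident. This sketch starts the hog, measures how much CPU time it accumulated from `/proc/PID/stat`, and then kills it. It assumes the process name contains no spaces, which holds for a bash subshell, and ticks are usually 100 per second (`getconf CLK_TCK`).

```bash
#!/usr/bin/env bash
# Sketch: run a CPU hog for a few seconds, measure it, and clean up.

cpu_ticks() {
  # utime + stime are fields 14 and 15 of /proc/PID/stat, in clock ticks.
  awk '{ print $14 + $15 }' "/proc/$1/stat"
}

hog_demo() {
  local secs=${1:-2} pid ticks
  ( while :; do :; done ) &          # CPU hog in a background subshell
  pid=$!
  sleep "$secs"
  ticks=$(cpu_ticks "$pid")
  kill "$pid"                        # always stop the hog
  wait "$pid" 2>/dev/null || true    # reap it; ignore the "terminated" status
  echo "pid=$pid cpu_ticks=$ticks"
}

# Usage: hog_demo 2
```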

---

## **Cheat Sheet**
```bash
# Quick system overview
dmesg | tail        # Kernel errors
vmstat 1            # CPU/memory/IO
ss -tulnp           # Listening ports
df -h               # Disk space
```

**Next Steps:**
- Learn **eBPF** for advanced tracing (`bpftrace`).
- Practice **container networking** (Docker, Kubernetes).
- Master **log aggregation** (ELK, Loki).