# **Linux for Senior Engineers: Processes, Networking & Triage**
**Goal:** Master the commands and concepts needed to **analyze, optimize, and troubleshoot** Linux systems at scale.

---

## **1. Process Management**
### **1.1 Viewing & Controlling Processes**
| **Command** | **Purpose** | **Key Flags** |
|---------------------------|---------------------------------------------|-----------------------------------|
| `ps` | Snapshot of running processes | `-ef` (all, full format), `aux` (BSD style, with %CPU/%MEM) |
| `top` / `htop` | Real-time process monitor | `-p PID` (filter by PID) |
| `kill` | Send a signal to a process (SIGTERM by default) | `-9` (SIGKILL), `-15` (SIGTERM) |
| `pkill` / `killall` | Signal processes by name | `-f` (`pkill` only: match the full command line) |
| `nice` / `renice` | Set or adjust process priority (niceness) | `-n 19` (lowest priority) |

**Pro Tip:**
- Use `strace -p PID` to **debug hanging processes**. It traces system calls, so you can see which call the process is blocked in. This usually requires root.

---

### **1.2 Process Resource Usage**
| **Command** | **What It Shows** |
|-------------------|--------------------------------------------|
| `pidstat -u -r -d -p PID 1` | CPU (`-u`), memory (`-r`), and disk I/O (`-d`) stats for one process, every second |
| `vmstat 1` | System-wide CPU, memory, and I/O trends |
| `iotop` | Disk I/O by process (requires root) |

**Critical Metrics:**
- **%CPU**: Sustained >80% → Check for runaway processes. In `top`, 100% equals one full core, so a multithreaded process can exceed 100%.
- **%MEM**: Steady growth over time suggests a leak (e.g., in Java apps).
- **I/O wait** (`wa` column in `vmstat`): High values mean CPUs are idle while waiting on disk, which points to a disk bottleneck.

---

## **2. Linux Networking**
### **2.1 Key Networking Commands**
| **Command** | **Purpose** | **Example** |
|---------------------------|---------------------------------------------|----------------------------------|
| `ip addr` / `ifconfig` | View interfaces and IPs (`ifconfig` is legacy) | `ip addr show eth0` |
| `ss` / `netstat` | Socket statistics (`netstat` is legacy) | `ss -tulnp` (listening TCP/UDP ports and owning processes) |
| `tcpdump` | Packet capture (requires root) | `tcpdump -i eth0 port 80` |
| `dig` / `nslookup` | DNS troubleshooting | `dig +short example.com` |
| `traceroute` / `mtr` | Network path analysis | `mtr google.com` |

**Pro Tip:**
- Use `nc` (netcat) for **quick port checks**:
  ```bash
  nc -zv -w 3 example.com 443  # Does TCP 443 accept connections? (-z: no data sent, -w 3: 3 s timeout)
  ```

---

### **2.2 Advanced Networking**
- **Network Namespaces**: Isolate network stacks. This is the mechanism Docker and Kubernetes use for container networking.
  ```bash
  ip netns list  # Lists named namespaces (in /var/run/netns); container runtimes' namespaces usually don't appear here
  ```
- **eBPF**: Trace network traffic and other kernel events at kernel level with low overhead (e.g., `bpftrace`).
- **iptables/nftables**: Firewall and packet-filtering rules. iptables is the legacy interface, and nftables is its modern replacement.

---

## **3. Resource Triage (Troubleshooting)**
### **3.1 The "Big Four" Resources**
1. **CPU**
   - **Tool:** `mpstat -P ALL 1` (per-core usage).
   - **Red Flag:** Sustained `%idle` < 10% → CPU bottleneck. A single core stuck near 0% idle can also indicate a single-threaded hot spot.

2. **Memory**
   - **Tools:** `free -h`, `cat /proc/meminfo`.
   - **Key Metrics:**
     - **Available** (not "free") memory. Linux fills spare RAM with page cache, so "free" is normally low.
     - **Swap activity** (non-zero `si`/`so` in `vmstat 1`) → Memory pressure. A static non-zero swap "used" value is not conclusive on its own.

3. **Disk I/O**
   - **Tools:** `iostat -xz 1`, `df -h`.
   - **Red Flags** (rules of thumb):
     - `%util` > 70% → Disk approaching saturation. This is less meaningful on SSD/NVMe devices, which serve many requests in parallel.
     - `await` > 10 ms → Slow I/O. Newer `iostat` versions split this metric into `r_await` and `w_await`.

4. **Network**
   - **Tools:** `sar -n DEV,EDEV 1` (`EDEV` reports errors and drops), `ethtool eth0` (link speed and duplex).
   - **Red Flags:**
     - Steadily increasing RX/TX drops → Congestion or NIC issues.

---

### **3.2 Triage Workflow**
1. **Identify the bottleneck**:
   ```bash
   uptime           # Load averages (1m, 5m, 15m); compare them with the CPU count (nproc)
   dmesg -T | tail  # Recent kernel messages (OOM kills, hardware errors)
   ```
2. **Drill down**:
   - High CPU? → `top` (find the PID) → `pidstat -p PID 1`.
   - High RAM? → `grep VmRSS /proc/PID/status` (resident memory).
3. **Check dependencies**:
   - Database slow? → On the database host, run `ss -t src :5432` to list Postgres connections. From the client side, use `dst :5432`.

---

## **4. Performance Optimization**
### **4.1 CPU**
- **Pin processes to cores**: `taskset -c 0,1 ./script.sh`. For a running process, use `taskset -cp 0,1 PID`.
- **Limit CPU usage**: `cpulimit -p PID -l 50` caps the process at 50% of one core. `cpulimit` is a separate package.

### **4.2 Memory**
- **Clear caches** (in emergencies only; expect a temporary slowdown while caches refill):
  ```bash
  sync                               # Flush dirty pages first so they can be dropped
  echo 3 > /proc/sys/vm/drop_caches  # As root: drops pagecache, dentries, and inodes
  ```
- **OOM Killer Tuning**: Adjust `/proc/PID/oom_score_adj`. The range is -1000 to 1000. Higher values make the process a likelier victim, and -1000 exempts it.

### **4.3 Disk**
- **I/O Scheduler**: Switch to `deadline` for databases. On multi-queue (blk-mq) kernels, which are the only option since 5.0, the scheduler is named `mq-deadline`:
  ```bash
  cat /sys/block/sda/queue/scheduler                 # Available schedulers; the active one is in [brackets]
  echo mq-deadline > /sys/block/sda/queue/scheduler  # As root; does not persist across reboots
  ```

---

## **5. Must-Know Files & Directories**
| **Path** | **Purpose** |
|---------------------------|---------------------------------------------|
| `/proc/PID/` | Process details (`limits`, `status`, open FDs under `fd/`) |
| `/etc/sysctl.conf` | Persistent kernel tuning (e.g., `vm.swappiness`); also `/etc/sysctl.d/` |
| `/var/log/` | System logs (auth, kernel, apps) |
| `/sys/class/net/` | Per-interface attributes and counters (e.g., `eth0/statistics/rx_dropped`) |

---

## **6. Interview Questions**
1. **How do you find which process is listening on port 8080?**
   ```bash
   ss -tulnp | grep ':8080'  # Run as root so -p can show processes owned by other users
   ```
2. **What's the difference between `kill -9` and `kill -15`?**
   - `-15` (SIGTERM, the default): Asks the process to exit. The process can catch the signal and shut down gracefully, or ignore it.
   - `-9` (SIGKILL): Cannot be caught or ignored. The kernel terminates the process immediately with no cleanup, so use it only as a last resort.
3. **How do you debug high CPU usage?**
   - `top` (find the PID) → `pidstat -p PID 1` (user vs. system time) → `strace -p PID` (inspect system calls).

---

## **7. Lab Exercises**
1. **Simulate a CPU hog**:
   ```bash
   while true; do :; done &   # Background infinite loop
   ```
   Then monitor it with `pidstat -p $! 1` (`$!` holds the loop's PID) and stop it with `kill $!`.
2. **Trigger the OOM Killer** (only in a disposable VM or container, because other processes may be killed):
   ```bash
   tail /dev/zero  # /dev/zero has no newlines, so tail buffers input until memory runs out
   ```
   Check `dmesg -T | grep -i 'killed process'` for the OOM event.

---

## **Cheat Sheet**
```bash
# Quick system overview
dmesg -T | tail  # Recent kernel errors
vmstat 1 5       # CPU/memory/IO: five 1-second samples
ss -tulnp        # Listening ports
df -h            # Disk space
```

**Next Steps:**
- Learn **eBPF** for advanced tracing (`bpftrace`).
- Practice **container networking** (Docker, Kubernetes).
- Master **log aggregation** (ELK, Loki).
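The first steps of the triage workflow in section 3.2 are load average, then memory headroom. These are easy to script so every host is checked the same way. The script below is a minimal sketch, not a monitoring tool. `mem_available_pct` and `load_per_core` are hypothetical helper names. The script assumes the standard Linux `/proc/meminfo` and `/proc/loadavg` formats; `MemAvailable` exists on kernels 3.14 and later. It also assumes `nproc` from GNU coreutils is installed.

```bash
#!/usr/bin/env bash
# Minimal triage snapshot (sketch). Assumes Linux /proc formats;
# the function names are hypothetical helpers, not standard tools.

# Print MemAvailable as an integer percentage of MemTotal.
# Usage: mem_available_pct [meminfo_file]   (default: /proc/meminfo)
# Returns non-zero if MemTotal or MemAvailable is missing (e.g., kernels < 3.14).
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         if (total > 0 && avail != "")
           printf "%d\n", avail * 100 / total
         else
           exit 1
       }' "${1:-/proc/meminfo}"
}

# Print the 1-minute load average divided by the CPU count.
# Usage: load_per_core [loadavg_file] [cpu_count]   (defaults: /proc/loadavg, nproc)
# Returns non-zero if the CPU count is not positive.
load_per_core() {
  local cpus="${2:-$(nproc)}"
  awk -v cpus="$cpus" '{
         if (cpus > 0)
           printf "%.2f\n", $1 / cpus
         else
           exit 1
       }' "${1:-/proc/loadavg}"
}

# One-shot summary of the live system.
echo "load per core (1m): $(load_per_core)"
echo "memory available:   $(mem_available_pct)%"
```

Each helper accepts an optional file path, so you can check the parsing against a saved copy of `/proc/meminfo` or `/proc/loadavg` from another machine. As a rule of thumb, load per core persistently above about 1.0 means tasks are queuing. Linux load averages also count tasks blocked in uninterruptible I/O (D state), so confirm with `mpstat` and `vmstat` before blaming the CPU.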