Add tech_docs/networking/linux_networking.md

2025-06-19 05:37:55 +00:00
parent 692c82bc4d
commit 78661c6f41
1 changed files with 177 additions and 0 deletions
--- a/tech_docs/networking/linux_networking.md
+++ b/tech_docs/networking/linux_networking.md
@@ -0,0 +1,177 @@
+A focused Linux deep dive covering **process management, networking, and resource triage**: the 20% that delivers 80% of the value for senior roles.
+
+---
+
+# **Linux for Senior Engineers: Processes, Networking & Triage**  
+**Goal:** Master the commands and concepts needed to **analyze, optimize, and troubleshoot** Linux systems at scale.  
+
+---
+
+## **1. Process Management**  
+### **1.1 Viewing & Controlling Processes**  
+| **Command**               | **Purpose**                                  | **Key Flags**                     |  
+|---------------------------|---------------------------------------------|-----------------------------------|  
+| `ps`                      | Snapshot of running processes               | `-ef` (all), `aux` (detailed)     |  
+| `top` / `htop`            | Real-time process monitor                   | `-p PID` (filter by PID)          |  
+| `kill`                    | Terminate processes                         | `-9` (SIGKILL), `-15` (SIGTERM)   |  
+| `pkill` / `killall`       | Signal processes by name                    | `pkill -f` (match full command line) |  
+| `nice` / `renice`         | Adjust process priority (niceness)          | `-n 19` (lowest priority)         |  
+
+**Pro Tip:**  
+- Use `strace -p PID` to **debug hanging processes** by tracing their system calls.  
+
+---
+
+### **1.2 Process Resource Usage**  
+| **Command**       | **What It Shows**                          |  
+|-------------------|--------------------------------------------|  
+| `pidstat -u -r -d -p PID 1` | CPU (`-u`), memory (`-r`), and I/O (`-d`) stats for a process; plain `pidstat -p PID` reports CPU only |  
+| `vmstat 1`        | System-wide CPU, memory, and I/O trends    |  
+| `iotop`           | Disk I/O by process (requires root)        |  
+
+**Critical Metrics:**  
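+- **%CPU**: Sustained >80% → Check for runaway processes. In `top` and `pidstat`, 100% means one full core, so multithreaded processes can exceed 100%.  
+- **%MEM**: Resident memory that keeps growing without leveling off suggests a leak (e.g., Java apps).  
+- **IOwait** (`wa` column in `vmstat`): Sustained high values mean CPUs sit idle waiting on I/O, usually a disk bottleneck.  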
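+
+To make the CPU and iowait percentages concrete, here is a minimal sketch of how they can be derived from two samples of the aggregate `cpu` line in `/proc/stat`, the same counters that tools such as `vmstat` read. The helper name `cpu_delta` is hypothetical, and the sketch assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).
+
+```bash
+# cpu_delta BEFORE AFTER: print "busy% iowait%" between two "cpu ..." lines.
+# Hypothetical helper, not part of any package.
+cpu_delta() {
+    printf '%s\n%s\n' "$1" "$2" | awk '
+        { total[NR] = 0
+          for (i = 2; i <= 9; i++) total[NR] += $i   # user through steal
+          idle[NR] = $5; iow[NR] = $6 }
+        END {
+          dt = total[2] - total[1]
+          if (dt <= 0) { print "0.0 0.0"; exit }
+          di = idle[2] - idle[1]; dw = iow[2] - iow[1]
+          printf "%.1f %.1f\n", 100 * (dt - di - dw) / dt, 100 * dw / dt
+        }'
+}
+
+before=$(grep '^cpu ' /proc/stat)
+sleep 1
+after=$(grep '^cpu ' /proc/stat)
+cpu_delta "$before" "$after"
+```
+
+The one-second interval is an arbitrary choice; a longer interval smooths out short bursts.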
+
+---
+
+## **2. Linux Networking**  
+### **2.1 Key Networking Commands**  
+| **Command**               | **Purpose**                                  | **Example**                      |  
+|---------------------------|---------------------------------------------|----------------------------------|  
+| `ip addr` / `ifconfig`    | View interfaces and IPs                     | `ip addr show eth0`              |  
+| `ss` / `netstat`          | Socket statistics                           | `ss -tulnp` (listening TCP/UDP sockets with owning process) |  
+| `tcpdump`                 | Packet capture                              | `tcpdump -i eth0 port 80`        |  
+| `dig` / `nslookup`        | DNS troubleshooting                         | `dig +short example.com`         |  
+| `traceroute` / `mtr`      | Network path analysis                       | `mtr google.com`                 |  
+
+**Pro Tip:**  
+- Use `nc` (netcat) for **quick port checks**:  
+  ```bash
+  nc -zv example.com 443  # Test if port 443 is open
+  ```
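+
+On minimal hosts where `nc` is not installed, a similar check can be sketched with bash's built-in `/dev/tcp` redirection. The function name `port_open` is hypothetical, and the sketch assumes `bash` (built with network redirections) and coreutils `timeout` are available.
+
+```bash
+# port_open HOST PORT [TIMEOUT_SECONDS]: exit status 0 if a TCP connect succeeds.
+# Hypothetical helper; inside bash -c, $0 is the host and $1 is the port.
+port_open() {
+    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
+}
+
+if port_open example.com 443; then
+    echo "443 reachable"
+else
+    echo "443 closed, filtered, or timed out"
+fi
+```
+
+Like `nc -z`, this only proves that a TCP handshake completes; it says nothing about TLS or the application behind the port.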
+
+---
+
+### **2.2 Advanced Networking**  
+- **Network Namespaces**: Isolate network stacks (used by Docker/Kubernetes).  
+  ```bash
+  ip netns list  # List namespaces
+  ```
+- **eBPF**: Trace network traffic at kernel level (e.g., `bpftrace`).  
+- **iptables/nftables**: Firewall rules (`iptables` is the legacy interface, `nftables` its modern replacement).  
+
+---
+
+## **3. Resource Triage (Troubleshooting)**  
+### **3.1 The "Big Four" Resources**  
+1. **CPU**  
+   - **Tool:** `mpstat -P ALL 1` (per-core usage).  
+   - **Red Flag:** `%idle` sustained below 10% across cores → CPU-bound; a single pegged core points to a single-threaded bottleneck.  
+
+2. **Memory**  
+   - **Tools:** `free -h`, `cat /proc/meminfo`.  
+   - **Key Metrics:**  
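+     - **Available** (not "free") memory: the kernel's estimate of what can be allocated without swapping.  
+     - **Swap activity**: nonzero `si`/`so` in `vmstat 1` → active memory pressure. Static swap usage alone is not necessarily a problem.  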
+
+3. **Disk I/O**  
+   - **Tools:** `iostat -xz 1`, `df -h`.  
+   - **Red Flags:**  
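+     - `%util` sustained > 70% (device busy most of the time; SSD/NVMe devices serve requests in parallel, so also check queue depth, `aqu-sz` or `avgqu-sz` in older sysstat).  
+     - `await` > 10ms (slow I/O; a rule of thumb, since healthy SSDs usually stay well below it).  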
+
+4. **Network**  
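+   - **Tools:** `sar -n DEV 1` (throughput), `sar -n EDEV 1` (errors and drops), `ip -s link show eth0`, `ethtool eth0` (link speed/duplex), `ethtool -S eth0` (NIC counters).  
+   - **Red Flags:**  
+     - RX/TX drop counters that keep **increasing** → Congestion, undersized ring buffers, or NIC issues.  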
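+
+The memory and network red flags above can also be read straight from `/proc`. The following is a small sketch, not a replacement for `free` or `sar`; the helper names `mem_available_pct` and `iface_drops` are hypothetical, and each takes an optional file argument so it can be tried against a saved copy.
+
+```bash
+# mem_available_pct [FILE]: MemAvailable as a percentage of MemTotal.
+mem_available_pct() {
+    awk '$1 == "MemTotal:"     { total = $2 }
+         $1 == "MemAvailable:" { avail = $2 }
+         END { if (total > 0) printf "%.1f\n", 100 * avail / total }' "${1:-/proc/meminfo}"
+}
+
+# iface_drops [FILE]: print "iface rx_drop tx_drop" for each interface.
+# The first two lines of /proc/net/dev are column headers.
+iface_drops() {
+    awk 'NR > 2 { sub(/:/, " "); print $1, $5, $13 }' "${1:-/proc/net/dev}"
+}
+
+mem_available_pct
+iface_drops
+```
+
+Drop counters are cumulative since boot, so compare two readings a few seconds apart rather than judging a single value.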
+
+---
+
+### **3.2 Triage Workflow**  
+1. **Identify the bottleneck**:  
+   ```bash
+   uptime              # Load averages (1m, 5m, 15m)
+   dmesg -T | tail     # Kernel logs (OOMs, hardware errors)
+   ```
+2. **Drill down**:  
+   - High CPU? → `top`, then `pidstat -p PID 1`.  
+   - High RAM? → `grep VmRSS /proc/PID/status`.  
+3. **Check dependencies**:  
+   - Database slow? → `ss -tn src :5432` on the database host (Postgres connections); use `dst :5432` from a client.  
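+
+Load averages from `uptime` only mean something relative to the number of cores. As a rough sketch, the hypothetical helper `load_per_core` below divides the 1-minute load by the core count; it assumes `nproc` (coreutils) is available when no core count is passed.
+
+```bash
+# load_per_core [LOADAVG_FILE] [CORES]: 1-minute load average per core.
+# Hypothetical helper; defaults to /proc/loadavg and the output of nproc.
+load_per_core() {
+    cores=${2:-$(nproc)}
+    awk -v c="$cores" '{ printf "%.2f\n", $1 / c }' "${1:-/proc/loadavg}"
+}
+
+load_per_core
+```
+
+A value persistently above 1.00 means more tasks are runnable or blocked than there are cores. On Linux the load average also counts tasks in uninterruptible sleep (usually waiting on I/O), so a high value does not always mean a CPU problem.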
+
+---
+
+## **4. Performance Optimization**  
+### **4.1 CPU**  
+- **Pin processes to cores**: `taskset -c 0,1 ./script.sh`.  
+- **Limit CPU usage**: `cpulimit -p PID -l 50` (max 50% CPU).  
+
+### **4.2 Memory**  
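+- **Clear caches** (emergencies or benchmarking only; dropped data must be re-read from disk afterwards):  
+  ```bash
+  sync                               # Flush dirty pages first; drop_caches only frees clean ones
+  echo 3 > /proc/sys/vm/drop_caches  # As root: drops page cache, dentries, and inodes
+  ```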
+- **OOM Killer Tuning**: Adjust `/proc/PID/oom_score_adj` (range -1000 to 1000; -1000 exempts the process, higher values make it a more likely victim).  
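+
+As an illustration of OOM tuning, the sketch below starts a command with a chosen `oom_score_adj`. The wrapper name `with_oom_adj` is hypothetical. It relies on the value being kept across `exec`; raising the value works for an unprivileged user, while lowering it generally requires root.
+
+```bash
+# with_oom_adj ADJ CMD...: run CMD with oom_score_adj set to ADJ.
+# Hypothetical wrapper; inside sh -c, $0 is ADJ and "$@" is the command.
+with_oom_adj() {
+    adj=$1; shift
+    sh -c 'echo "$0" > /proc/$$/oom_score_adj && exec "$@"' "$adj" "$@"
+}
+
+# Mark a throwaway command as a preferred OOM victim and read the value back.
+with_oom_adj 500 cat /proc/self/oom_score_adj || echo "could not set oom_score_adj" >&2
+```
+
+Child processes inherit the value, so wrapping a batch job this way also covers everything it spawns.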
+
+### **4.3 Disk**  
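+- **I/O Scheduler**: Switch to a deadline scheduler for databases. On kernels that use the multi-queue block layer (the only option since 5.0) it is named `mq-deadline`; older kernels call it `deadline`. The change lasts until reboot:  
+  ```bash
+  cat /sys/block/sda/queue/scheduler                 # Available schedulers; the active one is in [brackets]
+  echo mq-deadline > /sys/block/sda/queue/scheduler  # As root
+  ```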
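+
+Before changing anything, it helps to see which scheduler each device currently uses. A minimal sketch, assuming the sysfs layout shown above; the helper name `active_scheduler` is hypothetical:
+
+```bash
+# active_scheduler: read a scheduler line on stdin, print the [bracketed] name.
+active_scheduler() {
+    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
+}
+
+# Report the active scheduler for every block device that exposes one.
+for f in /sys/block/*/queue/scheduler; do
+    [ -r "$f" ] || continue
+    dev=${f#/sys/block/}; dev=${dev%%/*}
+    printf '%s: %s\n' "$dev" "$(active_scheduler < "$f")"
+done
+```
+
+Devices whose file contains no bracketed entry are printed with an empty value.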
+
+---
+
+## **5. Must-Know Files & Directories**  
+| **Path**                  | **Purpose**                                  |  
+|---------------------------|---------------------------------------------|  
+| `/proc/PID/`              | Process details (limits, stats, FD usage)   |  
+| `/etc/sysctl.conf`        | Persistent kernel tuning (e.g., `vm.swappiness`); see also `/etc/sysctl.d/` |  
+| `/var/log/`               | System logs (auth, kernel, apps)            |  
+| `/sys/class/net/`         | Per-interface state and statistics (e.g., `operstate`, `statistics/rx_dropped`) |  
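+
+As an example of what `/proc/PID/` offers, the sketch below compares a process's open file descriptors with its soft limit, a common cause of "Too many open files" errors. The helper name `fd_usage` is hypothetical; reading another user's process this way requires root.
+
+```bash
+# fd_usage PID: print "open_fds soft_limit" for a process.
+# Hypothetical helper built on /proc/PID/fd and /proc/PID/limits.
+fd_usage() {
+    open=$(ls "/proc/$1/fd" 2>/dev/null | wc -l)
+    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$1/limits")
+    echo "$open $limit"
+}
+
+fd_usage $$   # inspect the current shell
+```
+
+The fourth field of the `Max open files` line is the soft limit; the fifth is the hard limit.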
+
+---
+
+## **6. Interview Questions**  
+1. **How do you find which process is listening on port 8080?**  
+   ```bash
+   ss -tulnp 'sport = :8080'   # Exact port match; run as root to see other users' processes
+   ```  
+2. **What’s the difference between `kill -9` and `kill -15`?**  
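+   - `-15` (SIGTERM): Graceful shutdown; the process can catch it and clean up.  
+   - `-9` (SIGKILL): Forceful termination; cannot be caught or ignored, so no cleanup runs (last resort).  
+3. **How do you debug high CPU usage?**  
+   - `top`, then `pidstat -p PID 1`, then `strace -p PID` (or `perf top -p PID` when the time is spent in user space rather than in system calls).  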
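+
+The SIGTERM/SIGKILL difference from question 2 is easy to demonstrate. In this sketch a hypothetical worker (`start_worker`, invented for illustration) installs a SIGTERM handler that writes a marker file; the handler runs under `kill -15` but never under `kill -9`.
+
+```bash
+marker=$(mktemp)
+
+# Hypothetical worker: loops forever and writes a marker on SIGTERM.
+start_worker() {
+    sh -c 'trap "echo cleaned-up > \"$0\"; exit 0" TERM
+           while :; do sleep 1; done' "$1" &
+    worker_pid=$!
+}
+
+start_worker "$marker"; sleep 1      # give the worker time to install its trap
+kill -15 "$worker_pid"; wait "$worker_pid" || true
+term_result=$(cat "$marker")         # the handler wrote the marker
+
+: > "$marker"                        # reset the marker
+start_worker "$marker"; sleep 1
+kill -9 "$worker_pid"; wait "$worker_pid" || true
+kill_result=$(cat "$marker")         # stays empty: the handler never ran
+
+echo "SIGTERM: '$term_result'  SIGKILL: '$kill_result'"
+rm -f "$marker"
+```
+
+The same reasoning applies to real services: lock files, buffered writes, and child processes are only cleaned up when the process gets a chance to handle the signal.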
+
+---
+
+## **7. Lab Exercises**  
+1. **Simulate a CPU hog**:  
+   ```bash
+   while true; do :; done &  # Background infinite loop
+   ```  
+   Then monitor it with `pidstat -u 1` and stop it with `kill %1`.  
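+2. **Trigger OOM Killer** (only in a disposable VM or memory-limited container):  
+   ```bash
+   tail /dev/zero  # tail buffers the endless newline-free stream until RAM runs out
+   ```  
+   Check `dmesg -T | grep -iE 'oom|out of memory'` for OOM events.  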
+
+---
+
+## **Cheat Sheet**  
+```bash
+# Quick system overview
+dmesg | tail          # Kernel errors
+vmstat 1              # CPU/memory/IO
+ss -tulnp             # Listening ports
+df -h                 # Disk space
+```  
+
+**Next Steps:**  
+- Learn **eBPF** for advanced tracing (`bpftrace`).  
+- Practice **container networking** (Docker, Kubernetes).  
+- Master **log aggregation** (ELK, Loki).  
+