DS-920+ 20 GB RAM + SSD cache: **Production-grade Docker Swarm Lab Design**
────────────────────────────────────────────────────────────────────────────

High-level goal
• 6 VMs (5 Swarm nodes + 1 gateway)
• All-in-one on the NAS today, but architected so that **any** node can be migrated to bare metal later.
• Observability stack is **first-class** (Prometheus, Loki, Grafana).
• After the bake-in period we **down-size RAM** per VM and rely on the SSD cache + ballooning.

------------------------------------------------
1. Physical resource envelope
------------------------------------------------
CPU     : 4 cores / 8 threads (J4125)
RAM     : 20 GB (16 GB upgrade + 4 GB stock)
Storage : 4×HDD in SHR-1 + 2×NVMe read/write SSD cache (RAID-1)
Network : 1×1 GbE (bond later if you add a USB-NIC)

Hard limits
• Max 4 vCPUs per VM to leave headroom for DSM.
• Plan for **≤ 18 GB VM RAM total** so DSM + containers never swap to HDD.

------------------------------------------------
2. VM map (initial “generous” sizing)
------------------------------------------------
| VM        | vCPU   | RAM (GB) | Disk  | Role / Notes                |
|-----------|--------|----------|-------|-----------------------------|
| d12-gw    | 1      | 1        | 8 GB  | Router, DNS, DHCP, jump box |
| d12-m1    | 2      | 4        | 16 GB | Swarm mgr-1 + Prometheus    |
| d12-m2    | 2      | 4        | 16 GB | Swarm mgr-2 + Loki          |
| d12-m3    | 2      | 4        | 16 GB | Swarm mgr-3 + Grafana       |
| d12-w1    | 2      | 3        | 32 GB | Swarm worker-1              |
| d12-w2    | 2      | 3        | 32 GB | Swarm worker-2              |
| **TOTAL** | **11** | **19**   | ≈ 120 GB thin-provisioned | |

Disk layout on NAS
• All VMs on **SSD-cache-backed** volume (QoS = high).
• Enable **“SSD cache advisor”** → pin VM disks (random I/O) into cache.

------------------------------------------------
3. Network topology
------------------------------------------------
Virtual switches
• **vs-lan**   : 192.168.1.0/24 (eth0 on every VM) – upstream & mgmt.
• **vs-swarm** : 10.10.10.0/24 (eth1 on every VM) – overlay & control-plane.

Firewall rules on d12-gw
• Forward/NAT only if you need Internet from vs-swarm.
• SSH jump via d12-gw (port 2222 → internal 10.10.10.x:22).
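
If vs-swarm does need outbound Internet, a minimal NAT sketch for d12-gw could look like the following (eth0 = vs-lan, eth1 = vs-swarm per the switch mapping above; an illustration, not a hardened firewall):

```bash
# On d12-gw only: forward and masquerade 10.10.10.0/24 out of the LAN interface
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -s 10.10.10.0/24 -o eth0 -j MASQUERADE
sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
sudo iptables -A FORWARD -i eth0 -o eth1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
```
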

MTU tuning
• vs-swarm MTU 1550 → Docker overlay MTU 1450 (leaves 100 bytes for VXLAN).
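
The overlay MTU is set per network at creation time via the `com.docker.network.driver.mtu` option; a minimal sketch (the network name `lab-net` and the `--attachable` flag are illustrative, not part of the plan above):

```bash
# Create the application overlay with an explicit 1450-byte MTU
docker network create \
  --driver overlay \
  --attachable \
  --opt com.docker.network.driver.mtu=1450 \
  lab-net
```
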

------------------------------------------------
4. Storage layer
------------------------------------------------
Inside Swarm
• Local Docker volumes on each worker (backed by the fast SSD cache) for stateless services.
• NFS share on d12-gw (`/srv/nfs`) exported to 10.10.10.0/24 for shared volumes (logs, Prometheus TSDB cold tier).
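
The export plus a matching NFS-backed named volume might look like this sketch (the gateway's vs-swarm address `10.10.10.1` and the volume name `nfs-shared` are assumptions):

```bash
# On d12-gw (assumes an NFS server such as nfs-kernel-server is installed)
echo '/srv/nfs 10.10.10.0/24(rw,sync,no_subtree_check)' | sudo tee -a /etc/exports
sudo exportfs -ra

# On any swarm node: a named volume backed by that export
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=10.10.10.1,rw \
  --opt device=:/srv/nfs \
  nfs-shared
```
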

Backup policy
• DSM Snapshot Replication on VM folders nightly.
• Off-site push via Hyper Backup to a cloud bucket.

------------------------------------------------
5. Observability stack (first-class)
------------------------------------------------
Namespace: `monitoring`

| Service       | Placement      | Resource limits  | Notes                           |
|---------------|----------------|------------------|---------------------------------|
| Prometheus    | mgr-1          | 1 vCPU, 2 GB RAM | 15-day retention, 10 GiB volume |
| Loki          | mgr-2          | 1 vCPU, 1 GB RAM | 7-day retention, 5 GiB volume   |
| Grafana       | mgr-3          | 1 vCPU, 1 GB RAM | persistent `grafana.db` on NFS  |
| node-exporter | global (all 5) | 0.1 vCPU, 64 MB  | host metrics                    |
| cadvisor      | global (all 5) | 0.2 vCPU, 128 MB | container metrics               |
| promtail      | global (all 5) | 0.1 vCPU, 64 MB  | forwards to Loki                |

Deploy via a single compose file (`observability-stack.yml`) using `docker stack deploy`.
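
A skeleton of that file, showing the two placement patterns from the table (pinned to a manager vs. global); image tags are illustrative, and the real file would also define Loki, Grafana, cadvisor, promtail, volumes and networks:

```yaml
version: "3.9"
services:
  prometheus:
    image: prom/prometheus:latest        # tag is illustrative
    deploy:
      placement:
        constraints:
          - node.hostname == d12-m1      # pinned to mgr-1, as in the table
      resources:
        limits:
          cpus: "1.0"
          memory: 2G
  node-exporter:
    image: prom/node-exporter:latest     # tag is illustrative
    deploy:
      mode: global                       # one task on every node (all 5)
      resources:
        limits:
          cpus: "0.10"
          memory: 64M
```

It would then be deployed with `docker stack deploy -c observability-stack.yml monitoring`.
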

------------------------------------------------
6. Swarm service placement rules
------------------------------------------------
Managers should only run control-plane (and monitoring) containers. Note that a fully drained node runs **no** service tasks at all; since Prometheus/Loki/Grafana live on the managers (section 5), steer workloads with the role labels below rather than draining them.

```bash
# Optional: take a manager out of scheduling entirely (one node per invocation)
docker node update --availability drain d12-m1
```

Service-level constraints in the stack files:

```yaml
deploy:
  placement:
    constraints:
      - node.labels.role == worker     # user workloads
      # - node.labels.role == monitor  # monitoring services (mgr-*)
```

Label nodes:
```bash
# `docker node update` takes one node per invocation
for n in d12-w1 d12-w2;        do docker node update --label-add role=worker  "$n"; done
for n in d12-m1 d12-m2 d12-m3; do docker node update --label-add role=monitor "$n"; done
```
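
With the labels in place, an ad-hoc service can be pinned the same way from the CLI (the service name `app` and the nginx image are placeholders):

```bash
docker service create \
  --name app \
  --replicas 2 \
  --constraint 'node.labels.role == worker' \
  nginx:alpine
```
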

------------------------------------------------
7. Post-bake-in “rightsizing” schedule
------------------------------------------------
Monitor **Grafana → “VM Memory Utilization”** for 2 weeks.

Typical safe cuts
| VM           | New RAM   | Justification                          |
|--------------|-----------|----------------------------------------|
| d12-gw       | 512 MB    | Static routes + dnsmasq only           |
| d12-m{1,2,3} | 2 GB each | 1 GB OS + 1 GB Prometheus/Loki/Grafana |
| d12-w{1,2}   | 2 GB each | 1 GB OS + 1 GB workload burst          |

TOTAL after resize ≈ **10.5 GB** (leaves ~8 GB for DSM & SSD-cache buffers).

Use **virtio-balloon** so DSM can reclaim unused RAM dynamically.

------------------------------------------------
8. Security & hardening checklist
------------------------------------------------
✓ TLS on Docker socket (`dockerd --tlsverify`)
✓ SSH key-only, fail2ban on d12-gw
✓ `sysctl` hardening: `net.ipv4.ip_forward=1` only on d12-gw, disabled elsewhere
✓ Secrets via Docker secrets, NOT env-vars
✓ Weekly DSM offline snapshots of VM disks
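
A minimal Docker-secrets sketch (the `db_password` secret, the `db` service and the postgres image are placeholders; the official postgres image reads the password from the path given in `POSTGRES_PASSWORD_FILE`):

```bash
# Create the secret on a manager; Swarm mounts it at /run/secrets/db_password
printf 'changeme' | docker secret create db_password -
docker service create \
  --name db \
  --secret db_password \
  -e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  postgres:16
```
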

------------------------------------------------
9. Day-2 growth path
------------------------------------------------
• Add USB-NIC → LACP bond → 2 GbE for DSM + VMs.
• When CPU becomes the bottleneck, migrate workers to a bare-metal NUC; keep managers on the NAS.
• Move NFS to a dedicated SSD shelf via USB-C enclosure if I/O saturates.

------------------------------------------------
One-command bootstrap (after VMs exist)
------------------------------------------------
```bash
# on d12-m1
git clone https://github.com/you/swarm-lab
cd swarm-lab
./scripts/init.sh   # labels nodes, deploys observability stack
```
After 5 minutes you’ll have metrics, logs, and a resource dashboard.

---

Below is a concise, end-to-end walkthrough that takes you from “I have a fresh Linux VM” to “I have a 3-node Docker Swarm running a small demo service.”
Everything is 100% CLI-based and works on Debian 12, Ubuntu 22/24, or Alpine with Docker ≥ 27.3 installed.

--------------------------------------------------
1. Prerequisites
--------------------------------------------------
• 3 Linux hosts (1 manager + 2 workers) on the same L2/L3 network
• Docker Engine installed and started on every host
• Ports **2377** (TCP), **7946** (TCP & UDP), and **4789** (UDP) open between the hosts
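
If a host firewall is active, those ports can be opened like this (ufw shown as one example; adapt to firewalld/nftables as needed):

```bash
sudo ufw allow 2377/tcp   # cluster management (manager)
sudo ufw allow 7946/tcp   # node-to-node communication
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp   # VXLAN overlay traffic
```
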

--------------------------------------------------
2. Install Docker (example for Debian/Ubuntu)
--------------------------------------------------
```bash
sudo apt update && sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # log out & back in
```

--------------------------------------------------
3. Initialize the swarm (on the manager)
--------------------------------------------------
```bash
# Advertise the address that the other nodes can reach, e.g. 192.168.1.10
docker swarm init --advertise-addr 192.168.1.10
```
You’ll receive a `docker swarm join` command with a token.
Example output:
```
Swarm initialized ...
To add a worker, run:
  docker swarm join --token SWMTKN-1-xxxxx 192.168.1.10:2377
```
Save that token.
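
If the token gets lost, it can be re-printed from the manager at any time:

```bash
docker swarm join-token worker    # prints the full worker join command
docker swarm join-token manager   # prints the manager join command
```
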

--------------------------------------------------
4. Join the worker nodes
--------------------------------------------------
SSH to each worker and paste the join command:
```bash
docker swarm join --token <token> 192.168.1.10:2377
```
Back on the manager, verify:
```bash
docker node ls
```
You should see 3 nodes in the “Ready/Active” state.

--------------------------------------------------
5. Deploy your first service
--------------------------------------------------
Create a replicated nginx service (3 instances) and expose it on port 80:
```bash
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:80 \
  nginx:latest
```
Check its status:
```bash
docker service ls
docker service ps web
```
Swarm automatically spreads the 3 nginx containers across the three nodes.
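
Thanks to the ingress routing mesh, the published port answers on every node, even one that is not currently running an nginx task:

```bash
curl -I http://192.168.1.10   # or any other node's address; expect an HTTP 200 from nginx
```
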

--------------------------------------------------
6. Scale or update with one command
--------------------------------------------------
Scale up:
```bash
docker service scale web=5
```
Rolling update to a new image:
```bash
docker service update --image nginx:1.25-alpine web
```
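
If an update misbehaves, the previous service definition can be restored just as easily:

```bash
docker service rollback web   # revert to the service's previous spec
```
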

--------------------------------------------------
7. Optional: deploy with a stack file
--------------------------------------------------
Save as `demo-stack.yml`:
```yaml
version: "3.9"
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
```
Deploy:
```bash
docker stack deploy -c demo-stack.yml demo
```
You now have the same nginx cluster expressed declaratively.
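
Useful follow-up commands for the stack (the stack name `demo` comes from the deploy command above):

```bash
docker stack services demo   # service-level overview
docker stack ps demo         # task placement per node
docker stack rm demo         # tear the stack down when finished
```
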

--------------------------------------------------
8. Day-2 commands you’ll use often
--------------------------------------------------
• List nodes: `docker node ls`
• Inspect a service: `docker service inspect --pretty web`
• Drain a node (maintenance): `docker node update --availability drain <node>`
• Remove a service: `docker service rm web`
• Leave the swarm (worker): `docker swarm leave`
• Tear down the whole swarm (manager): `docker swarm leave --force`

--------------------------------------------------
Key concepts in 30 seconds
--------------------------------------------------
Node    = Docker host joined to the swarm
Manager = schedules tasks & keeps the swarm state
Worker  = runs the containers (tasks)
Service = desired state for a set of containers (image, replicas, ports, etc.)
Task    = a single container instance scheduled by the swarm
Stack   = group of services defined in a Compose file

That’s all you need to get productive. Once you’re comfortable, add secrets, configs, health-checks, and multiple managers for HA.