DS-920+ 20 GB RAM + SSD-Cache: **Production-grade Docker-Swarm Lab Design**
────────────────────────────────────────────────────────────────────────────

High-level goal
• 6 VMs (5 Swarm nodes + 1 gateway).
• All-in-one on the NAS today, but architected so that **any** node can be migrated to bare metal later.
• Observability stack is **first-class** (Prometheus, Loki, Grafana).
• After a bake-in period we **down-size RAM** per VM and rely on SSD cache + ballooning.

------------------------------------------------
1. Physical resource envelope
------------------------------------------------
CPU     : 4 cores / 4 threads (J4125)
RAM     : 20 GB (16 GB upgrade + 4 GB stock)
Storage : 4×HDD in SHR-1 + 2×NVMe read/write SSD cache (RAID-1)
Network : 1×1 GbE (bond later if you add a USB NIC)

Hard limits
• Max 4 vCPUs per VM (the CPU only exposes 4 threads); the plan below stays at ≤ 2 vCPUs per VM to leave scheduling headroom for DSM.
• Plan for **≤ 18 GB VM RAM total** so DSM + containers never swap to HDD (the initial "generous" sizing below totals 19 GB; the Section 7 rightsizing brings it back under budget).

------------------------------------------------
2. VM map (initial "generous" sizing)
------------------------------------------------

| VM        | vCPU | RAM (GB) | Disk     | Role / Notes                |
|-----------|------|----------|----------|-----------------------------|
| d12-gw    | 1    | 1        | 8 GB     | Router, DNS, DHCP, jump box |
| d12-m1    | 2    | 4        | 16 GB    | Swarm mgr-1 + Prometheus    |
| d12-m2    | 2    | 4        | 16 GB    | Swarm mgr-2 + Loki          |
| d12-m3    | 2    | 4        | 16 GB    | Swarm mgr-3 + Grafana       |
| d12-w1    | 2    | 3        | 32 GB    | Swarm worker-1              |
| d12-w2    | 2    | 3        | 32 GB    | Swarm worker-2              |
| **TOTAL** | 11   | 19       | ≈ 120 GB | thin-provisioned            |

Disk layout on NAS
• All VMs on the **SSD-cache-backed** volume (QoS = high).
• Enable **"SSD Cache Advisor"** → pin the VM disks (random I/O) into the cache.

------------------------------------------------
3. Network topology
------------------------------------------------
Virtual switches
• **vs-lan**   : 192.168.1.0/24 (eth0 on every VM) – upstream & mgmt.
• **vs-swarm** : 10.10.10.0/24 (eth1 on every VM) – overlay & control plane.

Firewall rules on d12-gw
• Forward/NAT only if you need Internet access from vs-swarm.
• SSH jump via d12-gw (port 2222 → internal 10.10.10.x:22).

MTU tuning
• vs-swarm MTU 1550 → Docker overlay MTU 1450 (VXLAN encapsulation needs ~50 bytes; 100 bytes leaves comfortable margin).

------------------------------------------------
4. Storage layer
------------------------------------------------
Inside Swarm
• Local volumes on each worker (fast, SSD-cache-backed) for stateless services.
• NFS share on d12-gw (`/srv/nfs`) exported to 10.10.10.0/24 for shared volumes (logs, Prometheus TSDB cold tier).

Backup policy
• DSM Snapshot Replication on the VM folders nightly.
• Off-site push via Hyper Backup to a cloud bucket.

------------------------------------------------
5. Observability stack (first-class)
------------------------------------------------
Namespace: `monitoring`

| Service       | Placement      | Resource limits  | Notes                           |
|---------------|----------------|------------------|---------------------------------|
| Prometheus    | mgr-1          | 1 vCPU, 2 GB RAM | 15-day retention, 10 GiB volume |
| Loki          | mgr-2          | 1 vCPU, 1 GB RAM | 7-day retention, 5 GiB volume   |
| Grafana       | mgr-3          | 1 vCPU, 1 GB RAM | persistent `grafana.db` on NFS  |
| node-exporter | global (all 5) | 0.1 vCPU, 64 MB  | host metrics                    |
| cadvisor      | global (all 5) | 0.2 vCPU, 128 MB | container metrics               |
| promtail      | global (all 5) | 0.1 vCPU, 64 MB  | forwards to Loki                |

Deploy everything via a single compose file (`observability-stack.yml`) using `docker stack deploy`; a minimal sketch follows.
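As a reference, here is a minimal sketch of what `observability-stack.yml` could look like. Only Prometheus and node-exporter are shown; the image tags, hostnames, and network name are assumptions, and the real file would add Loki, Grafana, promtail, and cadvisor in the same pattern.

```yaml
# observability-stack.yml — minimal sketch, not the full stack.
version: "3.9"

networks:
  monitoring:
    driver: overlay

volumes:
  prometheus-data:

services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=15d   # matches the 15-day retention above
      - --storage.tsdb.retention.size=10GB  # matches the 10 GiB volume budget
    networks: [monitoring]
    volumes:
      - prometheus-data:/prometheus
    deploy:
      placement:
        constraints:
          - node.hostname==d12-m1           # pin to mgr-1 per the table (assumes VM hostname)
      resources:
        limits:
          cpus: "1"
          memory: 2G

  node-exporter:
    image: prom/node-exporter:latest        # for real host metrics also mount /proc and /sys
    networks: [monitoring]
    deploy:
      mode: global                          # one instance per node (all 5)
      resources:
        limits:
          cpus: "0.1"
          memory: 64M
```

Deploy it with `docker stack deploy -c observability-stack.yml monitoring` once the node labels from the next section are in place.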
------------------------------------------------
6. Swarm service placement rules
------------------------------------------------
In this design the managers stay **Active** because they also host the monitoring stack; placement is steered with node labels instead of draining. (If you later move monitoring off the managers, drain them so they run only control-plane tasks: `docker node update --availability drain d12-m1`, etc.)

```yaml
# inside each service's deploy: block in the stack file
deploy:
  placement:
    constraints:
      - node.labels.role==worker      # user workloads → d12-w*
      # for the monitoring services use instead:
      # - node.labels.role==monitor   # monitoring stack → d12-m*
```

Label the nodes:

```bash
for n in d12-w1 d12-w2;        do docker node update --label-add role=worker  "$n"; done
for n in d12-m1 d12-m2 d12-m3; do docker node update --label-add role=monitor "$n"; done
```

------------------------------------------------
7. Post-bake-in "rightsizing" schedule
------------------------------------------------
Monitor **Grafana → "VM Memory Utilization"** for 2 weeks, then apply the typical safe cuts below.

| VM           | New RAM   | Justification                          |
|--------------|-----------|----------------------------------------|
| d12-gw       | 512 MB    | Static routes + dnsmasq only           |
| d12-m{1,2,3} | 2 GB each | 1 GB OS + 1 GB Prometheus/Loki/Grafana |
| d12-w{1,2}   | 2 GB each | 1 GB OS + 1 GB workload burst          |

TOTAL after resize ≈ **10.5 GB** (leaves ~8 GB for DSM & SSD-cache buffers).
Use **virtio-balloon** so DSM can reclaim unused RAM dynamically.

------------------------------------------------
8. Security & hardening checklist
------------------------------------------------
✓ TLS on the Docker socket (`dockerd --tlsverify`)
✓ SSH key-only auth, fail2ban on d12-gw
✓ `sysctl` hardening: `net.ipv4.ip_forward=1` only on d12-gw, disabled everywhere else
✓ Secrets via Docker secrets, NOT env vars
✓ Weekly offline DSM snapshots of the VM disks

------------------------------------------------
9. Day-2 growth path
------------------------------------------------
• Add a USB NIC → LACP bond → 2 GbE for DSM + VMs.
• When CPU becomes the bottleneck, migrate the workers to a bare-metal NUC; keep the managers on the NAS.
• Move NFS to a dedicated SSD shelf via a USB-C enclosure if I/O saturates.

------------------------------------------------
One-command bootstrap (after VMs exist)
------------------------------------------------
```bash
# on d12-m1
git clone https://github.com/you/swarm-lab
cd swarm-lab
./scripts/init.sh   # labels nodes, deploys observability stack
```
After 5 minutes you'll have metrics, logs, and a resource dashboard.

---

Below is a concise, end-to-end walkthrough that takes you from "I have a fresh Linux VM" to "I have a 3-node Docker Swarm running a small demo service." Everything is 100 % CLI-based and works on Debian 12, Ubuntu 22/24, or Alpine with Docker ≥ 27.3 installed.

--------------------------------------------------
1. Prerequisites
--------------------------------------------------
• 3 Linux hosts (1 manager + 2 workers) on the same L2/L3 network
• Docker Engine installed and started on every host
• Ports open between hosts: **2377/tcp** (cluster management), **7946/tcp+udp** (node communication), **4789/udp** (overlay/VXLAN traffic)

--------------------------------------------------
2. Install Docker (example for Debian/Ubuntu)
--------------------------------------------------
```bash
sudo apt update && sudo apt install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # log out & back in
```

--------------------------------------------------
3. Initialize the swarm (on the manager)
--------------------------------------------------
```bash
# Use the address that the other nodes can reach, e.g. 192.168.1.10
docker swarm init --advertise-addr 192.168.1.10
```
You'll receive a `docker swarm join` command with a token. Example output:
```
Swarm initialized ...
To add a worker, run:
    docker swarm join --token SWMTKN-1-xxxxx 192.168.1.10:2377
```
Save that token.
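If you misplace that output, the manager can reprint the join command at any time:

```bash
# Re-print the full worker join command (use "manager" to get the manager token)
docker swarm join-token worker

# Print only the token itself, handy for scripting
docker swarm join-token -q worker
```

Tokens can also be rotated with `docker swarm join-token --rotate worker` if one ever leaks.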
--------------------------------------------------
4. Join the worker nodes
--------------------------------------------------
SSH to each worker and paste the join command:
```bash
docker swarm join --token SWMTKN-1-xxxxx 192.168.1.10:2377
```
Back on the manager, verify:
```bash
docker node ls
```
You should see 3 nodes in "Ready/Active" state.

--------------------------------------------------
5. Deploy your first service
--------------------------------------------------
Create a replicated nginx service (3 instances) and expose it on port 80:
```bash
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:80 \
  nginx:latest
```
Check its status:
```bash
docker service ls
docker service ps web
```
Swarm automatically spreads the 3 nginx containers across the three nodes.

--------------------------------------------------
6. Scale or update with one command
--------------------------------------------------
Scale up:
```bash
docker service scale web=5
```
Rolling update to a new image:
```bash
docker service update --image nginx:1.25-alpine web
```

--------------------------------------------------
7. Optional: deploy with a stack file
--------------------------------------------------
Save as `demo-stack.yml`:
```yaml
version: "3.9"
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
```
Deploy:
```bash
docker stack deploy -c demo-stack.yml demo
```
You now have the same nginx cluster expressed declaratively.

--------------------------------------------------
8. Day-2 commands you'll use often
--------------------------------------------------
• List nodes: `docker node ls`
• Inspect a service: `docker service inspect --pretty web`
• Drain a node (maintenance): `docker node update --availability drain <node-name>`
• Remove a service: `docker service rm web`
• Leave the swarm (worker): `docker swarm leave`
• Tear down the whole swarm (last manager): `docker swarm leave --force`

--------------------------------------------------
Key concepts in 30 seconds
--------------------------------------------------
Node    = Docker host joined to the swarm
Manager = schedules tasks & keeps the swarm state
Worker  = runs the containers (tasks)
Service = desired state for a set of containers (image, replicas, ports, etc.)
Task    = a single container instance scheduled by the swarm
Stack   = group of services defined in a Compose file

That's all you need to get productive. Once you're comfortable, add secrets, configs, health checks, and multiple managers for HA (a small sketch of secrets and health checks follows).
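For example, here is a hedged sketch of how `demo-stack.yml` might grow to use a Docker secret and a container health check; the secret name `web_api_key` and the check command are illustrative assumptions, not part of the walkthrough above.

```yaml
# demo-stack.yml, extended with a secret and a health check (illustrative sketch).
version: "3.9"

secrets:
  web_api_key:
    external: true   # create first: printf 'supersecret' | docker secret create web_api_key -

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    secrets:
      - web_api_key  # mounted read-only at /run/secrets/web_api_key inside the container
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost/"]   # busybox wget ships in the alpine image
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      replicas: 3
```

Redeploying with the same `docker stack deploy -c demo-stack.yml demo` command rolls the change out, and Swarm replaces any task whose health check keeps failing.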