From 48364f89706cb4cdada2b48caa730453c1cf993f Mon Sep 17 00:00:00 2001 From: medusa Date: Tue, 29 Jul 2025 13:24:34 -0500 Subject: [PATCH] Update tech_docs/networking/sdwan_primer.md --- tech_docs/networking/sdwan_primer.md | 934 ++++++++++++++++++++++++--- 1 file changed, 846 insertions(+), 88 deletions(-) diff --git a/tech_docs/networking/sdwan_primer.md b/tech_docs/networking/sdwan_primer.md index 01cd98b..d2510c1 100644 --- a/tech_docs/networking/sdwan_primer.md +++ b/tech_docs/networking/sdwan_primer.md @@ -1,125 +1,883 @@ -Here’s the **20% of SD-WAN that covers 80% of what you need to know**, with a focus on **practical knowledge for senior network roles** and seamless integration with your IPSec expertise: +Your collection of documents on SD-WAN is exceptionally thorough and well-structured. I've compiled them into a single, comprehensive draft, incorporating the refinements and addressing the inconsistencies we've discussed, particularly around VPN 0, VPN 512, and the Front-Door VRF concept. ---- +I've aimed to create a cohesive flow, starting with the "Top 1% Mindset" to set the stage, moving into the crash course for foundational understanding, then diving deep into TLOCs and the three planes, and finally, detailing key configurations and troubleshooting. + +Here is the complete draft: + +----- + +# Mastering SD-WAN: From Fundamentals to the Top 1% Mindset + +## The Top 1% Mindset + +You don’t just deploy SD-WAN—you **orchestrate** it. +You think in **abstractions** (colors, TLOCs, VPNs) not hardware. +You troubleshoot like a surgeon—control plane first, then data plane, then app logic. + +**Example:** +Problem: VoIP calls drop but O365 works. +**Top 1% Debug:** + + * Check BFD (control plane). + * Verify TLOC preferences (is LTE taking over incorrectly?). + * Inspect `app-route` policy (is VoIP pinned to MPLS but PfR overriding?). + * Drill into `show app-aware stats` (is jitter spiking on broadband?). 
+ +**Final Thought** +Most SD-WAN "engineers" just click through vManage. The **real pros** know: + + * Transport independence isn’t automatic—it’s designed. + * Policies aren’t rules—they’re a logic flow. + * Troubleshooting isn’t guessing—it’s methodical dissection. + +You’re asking the right questions. Now go break (then fix) some TLOCs. 🚀 +(And yes, we both know Cisco’s docs don’t explain this stuff clearly—that’s why the top 1% reverse-engineer it.) + +----- + +## SD-WAN Crash Course: The 20% That Matters -### **SD-WAN Crash Course: The 20% That Matters** **Goal:** Understand **core SD-WAN concepts**, how they differ from traditional WAN, and how they integrate with IPSec. ---- +### 1\. SD-WAN vs Traditional WAN -## **1. SD-WAN vs Traditional WAN** -| **Feature** | **Traditional WAN (MPLS/VPN)** | **SD-WAN** | -|----------------------|-------------------------------|------------| -| **Cost** | Expensive (MPLS circuits) | Cheaper (uses Internet + broadband) | -| **Agility** | Manual config changes | Centralized, automated policies | -| **Performance** | Predictable but rigid | Dynamic path selection (jitter/loss-aware) | -| **Security** | Relies on IPSec/MPLS | Built-in encryption (IPSec, TLS) | -| **Topology** | Hub-and-spoke | Any-to-any, mesh | +| **Feature** | **Traditional WAN (MPLS/VPN)** | **SD-WAN** | +| :---------- | :----------------------------- | :--------- | +| **Cost** | Expensive (MPLS circuits) | Cheaper (uses Internet + broadband) | +| **Agility** | Manual config changes | Centralized, automated policies | +| **Performance** | Predictable but rigid | Dynamic path selection (jitter/loss-aware) | +| **Security** | Relies on IPSec/MPLS | Built-in encryption (IPSec, TLS) | +| **Topology** | Hub-and-spoke | Any-to-any, mesh | -**Key Takeaway:** -- SD-WAN **decouples control plane from hardware**, allowing dynamic traffic routing over **any transport (MPLS, LTE, broadband)**. 
+**Key Takeaway:** ---- + * SD-WAN **decouples control plane from hardware**, allowing dynamic traffic routing over **any transport (MPLS, LTE, broadband)**. -## **2. SD-WAN Core Components** -### **(1) Edge Devices (CPE)** -- **e.g., Cisco vEdge, FortiGate, VeloCloud** -- Sit at branch offices, apply policies, and encrypt traffic. +### 2\. SD-WAN Core Components -### **(2) Orchestrator (Controller)** -- **e.g., Cisco vManage, VMware Orchestrator** -- **Centralized policy management** (no CLI needed!). +**(1) Edge Devices (CPE)** -### **(3) Overlay Tunnels** -- **Encrypted tunnels** (IPSec, GRE, DTLS) between edges. -- Uses **TLOC (Transport Locator)** = Public IP + Color (e.g., `INET`, `MPLS`). + * e.g., Cisco vEdge, FortiGate, VeloCloud + * Sit at branch offices, apply policies, and encrypt traffic. -### **(4) Underlay Transport** -- **Any WAN link**: MPLS, Internet, LTE, 5G. +**(2) Orchestrator (Controller)** ---- + * e.g., Cisco vManage, VMware Orchestrator + * **Centralized policy management** (no CLI needed\!). -## **3. How SD-WAN Works (The 80% You Need)** -### **(1) Path Selection** -- **Dynamic multi-path steering**: Chooses best path based on: - - **Application SLA** (e.g., VoIP → low latency). - - **Real-time metrics** (jitter, packet loss, latency). +**(3) Overlay Tunnels** + + * **Encrypted tunnels** (IPSec, GRE, DTLS) between edges. + * Uses **TLOC (Transport Locator)** = Public IP + Color (e.g., `INET`, `MPLS`). + +**(4) Underlay Transport** + + * **Any WAN link**: MPLS, Internet, LTE, 5G. + +### 3\. How SD-WAN Works (The 80% You Need) + +**(1) Path Selection** + + * **Dynamic multi-path steering**: Chooses best path based on: + * **Application SLA** (e.g., VoIP → low latency). + * **Real-time metrics** (jitter, packet loss, latency). 
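The selection logic in the bullets above can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor's algorithm: `PathMetrics`, `SlaClass`, `select_path`, the sample link values, and the "least loss" fallback are all hypothetical. The voice thresholds reuse the jitter/latency/loss figures quoted in this primer's cheat sheet.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PathMetrics:
    """Measured health of one transport (hypothetical structure)."""
    color: str
    latency_ms: float
    jitter_ms: float
    loss_pct: float


@dataclass(frozen=True)
class SlaClass:
    """Upper bounds an application class tolerates (hypothetical structure)."""
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_pct: float


def meets_sla(path: PathMetrics, sla: SlaClass) -> bool:
    """True when every measured metric is within the SLA bounds."""
    return (
        path.latency_ms <= sla.max_latency_ms
        and path.jitter_ms <= sla.max_jitter_ms
        and path.loss_pct <= sla.max_loss_pct
    )


def select_path(
    paths: Sequence[PathMetrics],
    sla: SlaClass,
    preferred_color: Optional[str] = None,
) -> Optional[PathMetrics]:
    """Pick a path: preferred color if it is compliant, else best compliant."""
    if not paths:
        return None
    compliant = [p for p in paths if meets_sla(p, sla)]
    if not compliant:
        # Assumption: with no compliant path, fall back to the least lossy one.
        return min(paths, key=lambda p: p.loss_pct)
    for p in compliant:
        if p.color == preferred_color:
            return p
    return min(compliant, key=lambda p: p.latency_ms)


# Voice thresholds taken from this primer: latency <150 ms, jitter <30 ms, loss <1%.
VOICE_SLA = SlaClass(max_latency_ms=150, max_jitter_ms=30, max_loss_pct=1.0)

links = [
    PathMetrics("mpls", latency_ms=180.0, jitter_ms=12.0, loss_pct=0.2),
    PathMetrics("biz-internet", latency_ms=40.0, jitter_ms=5.0, loss_pct=0.1),
]
chosen = select_path(links, VOICE_SLA, preferred_color="mpls")
print(chosen.color)
```

Here the preferred `mpls` link breaches the latency bound, so the flow moves to `biz-internet`; once `mpls` recovers, the same call returns it again because the preferred color is checked first.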
+ +**Example Policy:** -**Example Policy:** ```plaintext -IF (Application == VoIP) AND (Latency > 50ms) → SWITCH to backup link +IF (Application == VoIP) AND (Latency > 50ms) → SWITCH to backup link ``` -### **(2) Zero-Touch Provisioning (ZTP)** -- Plug in a device → auto-configures via orchestrator. +**(2) Zero-Touch Provisioning (ZTP)** -### **(3) Application-Aware Routing** -- **DPI (Deep Packet Inspection)** identifies apps (e.g., Teams, SAP). -- **QoS prioritization** (VoIP > YouTube). + * Plug in a device → auto-configures via orchestrator. -### **(4) Security Integration** -- **IPSec for all overlays** (mandatory for Internet links). -- **Cloud-based firewalls** (e.g., FortiGate, Zscaler). +**(3) Application-Aware Routing** ---- + * **DPI (Deep Packet Inspection)** identifies apps (e.g., Teams, SAP). + * *(Note: While effective, some advanced encryption like TLS 1.3 can limit DPI's visibility, requiring IP-based fallbacks.)* + * **QoS prioritization** (VoIP \> YouTube). -## **4. SD-WAN + IPSec Integration** -- **SD-WAN uses IPSec for secure tunnels** but adds: - - **Automated key rotation** (no manual PSK updates). - - **Tunnel bonding** (combines multiple links for throughput). +**(4) Security Integration** -**Key Difference:** -- Traditional IPSec VPN = **static tunnels**. -- SD-WAN IPSec = **dynamic, SLA-driven tunnels**. + * **IPSec for all overlays** (mandatory for Internet links). + * **Cloud-based firewalls** (e.g., FortiGate, Zscaler). ---- +### 4\. SD-WAN + IPSec Integration -## **5. 
SD-WAN Troubleshooting (Top 5 Issues)** -| **Issue** | **Debug Command** | **Fix** | -|-------------------------------|--------------------------------------|---------| -| **Tunnels not coming up** | `show sdwan tunnel` (Cisco) | Check underlay reachability | -| **Poor VoIP quality** | `show sdwan app-route stats` | Adjust SLA thresholds | -| **Orchestrator sync failure** | `show sdwan control connections` | Verify certs/connectivity | -| **Traffic taking wrong path** | `show sdwan policy-service-path` | Fix application-aware rules | -| **High latency on backup** | `show sdwan interface` | Enable FEC (Forward Error Correction) | + * **SD-WAN uses IPSec for secure tunnels** but adds: + * **Automated key rotation** (no manual PSK updates). + * **Tunnel bonding** (combines multiple links for throughput). ---- +**Key Difference:** -## **6. SD-WAN vs. DMVPN (Common Interview Qs)** -**Q: When would you use SD-WAN over DMVPN?** -- **SD-WAN**: When you need **application-aware routing + centralized management**. -- **DMVPN**: When you need **scalable IPSec tunnels but don’t need SaaS optimization**. + * Traditional IPSec VPN = **static tunnels**. + * SD-WAN IPSec = **dynamic, SLA-driven tunnels**. -**Q: Can SD-WAN replace IPSec?** -- **No!** SD-WAN **uses** IPSec for encryption but adds intelligence on top. +### 5\. SD-WAN Troubleshooting (Top 5 Issues) ---- +| **Issue** | **Debug Command** | **Fix** | +| :-------- | :---------------- | :------ | +| **Tunnels not coming up** | `show sdwan tunnel` (Cisco) | Check underlay reachability | +| **Poor VoIP quality** | `show sdwan app-route stats` | Adjust SLA thresholds | +| **Orchestrator sync failure** | `show sdwan control connections` | Verify certs/connectivity | +| **Traffic taking wrong path** | `show sdwan policy-service-path` | Fix application-aware rules | +| **High latency on backup** | `show sdwan interface` | Enable FEC (Forward Error Correction) | -## **7. Lab Practice (Quick Wins)** -1. 
**Simulate link failure** in GNS3/EVE-NG → Watch SD-WAN switch paths. -2. **Prioritize VoIP traffic** over YouTube. -3. **Break the orchestrator** → Observe fallback to local policies. +### 6\. SD-WAN vs. DMVPN (Common Interview Qs) + +**Q: When would you use SD-WAN over DMVPN?** + + * **SD-WAN**: When you need **application-aware routing + centralized management**. + * **DMVPN**: When you need **scalable IPSec tunnels but don’t need SaaS optimization**. + +**Q: Can SD-WAN replace IPSec?** + + * **No\!** SD-WAN **uses** IPSec for encryption but adds intelligence on top. + +### 7\. Lab Practice (Quick Wins) + +1. **Simulate link failure** in GNS3/EVE-NG → Watch SD-WAN switch paths. +2. **Prioritize VoIP traffic** over YouTube. +3. **Break the orchestrator** → Observe fallback to local policies. + +**CLI Examples (Cisco Viptela):** -**CLI Examples (Cisco Viptela):** ```bash -show sdwan control connections # Check orchestrator status -show sdwan app-route stats # Verify path selection -clear sdwan tunnel # Force tunnel re-establishment +show sdwan control connections # Check orchestrator status +show sdwan app-route stats # Verify path selection +clear sdwan tunnel # Force tunnel re-establishment ``` ---- +### 8\. Interview Cheat Sheet -## **8. Interview Cheat Sheet** -✅ **SD-WAN = Automation + Application-Aware Routing + Multiple Underlays**. -✅ **IPSec is still used, but dynamically managed**. -✅ **Key metrics: Jitter (<30ms), Latency (<150ms), Packet Loss (<1%)**. -✅ **Orchestrator is the brain; edges are the muscle**. + * ✅ **SD-WAN = Automation + Application-Aware Routing + Multiple Underlays**. + * ✅ **IPSec is still used, but dynamically managed**. + * ✅ **Key metrics: Jitter (\<30ms), Latency (\<150ms), Packet Loss (\<1%)**. + * ✅ **Orchestrator is the brain; edges are the muscle**. ---- +----- -### **Where to Go Next?** -1. **Deep dive into your vendor’s SD-WAN** (Cisco, Fortinet, VMware). -2. 
**Learn cloud-integrated SD-WAN** (AWS Transit Gateway, Azure Virtual WAN). -3. **Study real-world designs** (e.g., "How SD-WAN replaces MPLS"). +## The Three Planes of SD-WAN & Modern Networking -Need a **deep dive on a specific SD-WAN vendor** or **mock scenarios**? Let me know! 🚀 \ No newline at end of file +In modern networking, especially with overlay technologies like SD-WAN, we deal with **three distinct planes**, each serving a critical role. + +### 1\. Management Plane + + * **Purpose:** Controls **device access and monitoring** (SSH, SNMP, HTTPS, syslog, etc.). It's about how you *interact with* the device. + * **Key Components:** + | Component | Protocol | Port | Description | + | :-------- | :------- | :--- | :---------- | + | **vManage** | HTTPS (WebUI) | TCP/443 | GUI/API for centralized control and configuration. | + | **vBond** | DTLS | UDP/12346 | Orchestrator for device authentication and initial redirection to vSmart/vManage. | + | **Zero-Touch Provisioning (ZTP)** | DHCP/HTTPS | - | Auto-configures devices out-of-the-box. | + * **Traffic Flow:** + 1. **Onboarding:** Device contacts **vBond** (DTLS) → gets redirected to **vManage**. Downloads config/CSR via **HTTPS**. + 2. **Ongoing Management:** Devices send **telemetry** (metrics, logs) to vManage. Policies (security, routing) are pushed **from vManage**. + * **Security Considerations:** + * ✔ **Always use isolated VRFs for management traffic** (e.g., traditional FVRF, or VPN 512 in SD-WAN for OOB management). + * ✔ **Mutual TLS (mTLS)** for device-vManage communication. + * ✔ **Role-Based Access Control (RBAC)** in vManage. + +### 2\. Control Plane + + * **Purpose:** Handles **protocols that build network intelligence** (BGP, OSPF, VXLAN EVPN, SD-WAN OMP, STP, LACP, etc.). It's about how the network *learns* its topology and reachability. 
+ * **Key Protocols (SD-WAN Specific):** + | Protocol | Function | Port | + | :------- | :------- | :--- | + | **OMP (Overlay Management Protocol)** | Advertises routes, TLOCs, policies. | Runs inside the DTLS/TLS control connection (default UDP/12346) | + | **BGP (optional)** | Legacy WAN integration or underlay routing. | TCP/179 | + | **TLOC (Transport Locator)** | Maps physical WAN links to logical tunnels for policy application. | - | + * **How OMP Works:** + 1. **vSmart controllers** act as **route reflectors** for OMP. + 2. **Edge devices (vEdges)** send: + * **Routes** (prefixes learned from LAN/WAN). + * **TLOCs** (tunnel endpoints, e.g., `public-IP:color`). + * **Service routes** (services hosted at the site, e.g., a firewall). + 3. **vSmart redistributes** this info to all edges, applying centralized policy (e.g., "prefer MPLS for VoIP"). + * **Example OMP Route Advertisement:** + ```bash + vEdge# show omp routes + RECEIVED ROUTES: + Prefix TLOC IP Color Preference + 10.1.1.0/24 203.0.113.1 mpls 100 + 10.1.1.0/24 198.51.100.1 biz-internet 50 + ``` + *(MPLS is preferred over Internet due to higher preference.)* + * **Key Traits:** + * **Distributes reachability info** (routes, tunnels, topology). + * Runs on the **CPU (software-based)** and is vulnerable to floods (e.g., BGP attacks). + * Can be placed in a **separate VRF** (but not a traditional FVRF which is management-only). + * **Security Considerations:** + * ✔ **DTLS encryption** for OMP (no cleartext control traffic\!). + * ✔ **Control-plane policing (CoPP)** to prevent floods. + * ✔ **Private WAN links (MPLS)** for critical control traffic. + +### 3\. Data Plane (Forwarding Plane) + + * **Purpose:** **Moves user traffic** (packets/frames) at **line rate (hardware-accelerated)**. It's about *moving* the actual data. + * **Key Technologies:** + | Technology | Role | + | :--------- | :--- | + | **IPsec/GRE** | Encrypted tunnels between edges. | + | **TLOC (Transport Locator)** | Logical tunnel endpoint (e.g., `public-IP:color`). | + | **Application-Aware Routing (AAR)** | Dynamically switches paths based on SLA. 
| + * **Data Flow Example:** + 1. **Traffic arrives at vEdge:** + * Classified via **DPI (Deep Packet Inspection)**. + * Tagged with **QoS markings (DSCP)**. + 2. **Path Selection:** + * Checks **OMP-learned TLOCs** and **SLA metrics**. + * Chooses best path (e.g., MPLS for VoIP, Internet for web). + 3. **Encapsulation:** + * Wrapped in **IPsec (ESP/AH)** or **GRE**. + * Sent to peer vEdge via **WAN (MPLS/Internet/5G)**. + * **Packet Walkthrough (Simplified):** + 1. **Original Packet:** + ``` + SRC: 10.1.1.100 (LAN) | DST: 8.8.8.8 (Internet) + ``` + 2. **After SD-WAN Processing:** + ``` + [IPsec][GRE][SD-WAN Header][Original Packet] + SRC: 203.0.113.1 (vEdge Public IP) + DST: 198.51.100.2 (Peer vEdge Public IP) + ``` + * **Key Traits:** + * **ASIC/switch-chip driven** (not CPU). + * **Doesn’t care about routes/tunnels**—just forwards based on FIB/TCAM. + * **Security Considerations:** + * ✔ **IPsec (AES-256-GCM, IKEv2)** for all tunnels. + * ✔ **Zone-Based Firewall** on vEdges. + * ✔ **SLA-based DDoS protection** (drop jitter/lossy links). + +### Why This Separation Matters + +| Plane | Runs On | Isolation Needed? | Risks if Compromised | +| :---- | :------ | :---------------- | :------------------- | +| **Management** | CPU | **Yes (Dedicated VRF/OOB)** | Total device takeover | +| **Control** | CPU | **Yes (VRF/CoPP)** | Network meltdown (BGP hijacks, loops) | +| **Data** | ASIC | **No (but ACLs help)** | Performance drops (DDoS), but no config access | + +### Common Misconceptions + +1. **"Control Plane = Management Plane"** → **No\!** + * **Control Plane:** BGP, OSPF, VXLAN EVPN. + * **Management Plane:** SSH, SNMP. + * *(They’re both CPU-based but serve different purposes.)* +2. **"A traditional FVRF can carry BGP/VXLAN"** → **No\!** + * Traditional FVRF (Front-Door VRF) is **only for management** traffic, isolated from data/control. + * BGP/VXLAN go in **normal VRFs** or a dedicated control-plane VRF. +3. 
**"Data Plane Needs a VRF"** → **Usually No.** + * Data traffic follows the **FIB** (built by the control plane). + * VRFs for data are typically for **tenant isolation** (e.g., MPLS VPNs, multi-tenancy service VPNs in SD-WAN). + +### Real-World Use Cases + +1. **SD-WAN** + * **Management:** vManage (HTTPS). + * **Control:** OMP (Overlay Management Protocol). + * **Data:** Encrypted tunnels (IPsec/GRE). +2. **VXLAN EVPN** + * **Management:** SSH to switches. + * **Control:** BGP EVPN (MAC/IP routing). + * **Data:** VXLAN-encapsulated traffic. +3. **Service Provider MPLS** + * **Management:** TACACS+ for routers. + * **Control:** LDP/RSVP (label distribution). + * **Data:** Label-switched packets. + +### Key Takeaways + +1. **Management Plane** = Your **remote admin access** (dedicated VRF/OOB). +2. **Control Plane** = **Protocols that build the network** (BGP, EVPN, OSPF, OMP). +3. **Data Plane** = **Raw packet forwarding** (ASIC-driven, no intelligence). + +### Final Thought + +The industry’s failure to **physically separate all three planes** (like servers do with iLO) is a security flaw. But until vendors fix it: + + * **Isolate management traffic in dedicated VRFs (like a traditional FVRF or SD-WAN's VPN 512 for OOB).** + * **Use VRFs/CoPP for control-plane isolation and protection.** + * **Trust ASICs for the data plane.** + +----- + +## Deep Dive: TLOCs (Transport Locators) – The Spine of SD-WAN + +TLOCs are the **make-or-break abstraction** in SD-WAN architectures (especially Cisco Viptela). They’re the glue between the underlay (physical links) and overlay (logical policies). But most engineers only *think* they understand them. Let’s fix that. + +### 1\. TLOCs: The Core Concept + +A **TLOC** is a **logical representation** of a WAN edge router’s transport connection. It’s defined by three key attributes: + + * **System IP** (the router’s overlay identity; the interface’s public and private IPs travel as TLOC attributes). + * **Color** (e.g., `mpls`, `biz-internet`, `lte`). + * **Encapsulation** (IPsec or GRE). 
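One way to picture the three attributes listed above is as an immutable three-field key. The sketch below is illustrative Python, not Cisco's data model: `Tloc`, `tlocs_for_color`, and the sample addresses are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tloc:
    """Hypothetical model: a TLOC identified by three attributes."""
    ip: str
    color: str
    encap: str


def tlocs_for_color(tlocs: Iterable[Tloc], color: str) -> List[Tloc]:
    """Return every TLOC carrying the given color, in input order."""
    return [t for t in tlocs if t.color == color]


fabric = [
    Tloc("10.0.0.1", "mpls", "ipsec"),
    Tloc("10.0.0.2", "biz-internet", "ipsec"),
    Tloc("10.0.0.3", "mpls", "ipsec"),
]

# A rule written against a color resolves to whichever TLOCs carry that color.
mpls_tlocs = tlocs_for_color(fabric, "mpls")
print([t.ip for t in mpls_tlocs])

# Frozen dataclasses are hashable, so a TLOC can key per-tunnel state directly.
state: Dict[Tloc, str] = {t: "up" for t in fabric}
state[Tloc("10.0.0.2", "biz-internet", "ipsec")] = "down"
print(state[fabric[1]])
```

Because equality covers all three fields, two entries that differ only in color or encapsulation are distinct TLOCs, which is what lets one router present several transports at once.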
+ +**Why this matters:** + + * TLOCs **decouple policies from hardware**. You can swap circuits (e.g., change ISP) without rewriting all your rules. + * They enable **transport-independent routing**—policies reference colors, not IPs. + +### 2\. TLOC Components – What’s Under the Hood + +#### A. TLOC Extended Attributes + +These are **hidden knobs** that influence path selection: + + * **Preference** (like admin distance – higher = better). + * **Weight** (for load-balancing across equal paths). + * **Public/Private IP** (for NAT traversal). + * **Site-ID** (prevents misrouting in multi-tenant setups). + +**Example:** + +```plaintext +tloc-extension { + ip = 203.0.113.1 + color = biz-internet + encap = ipsec + preference = 100 # Higher = more preferred +} +``` + +#### B. TLOC Groups + + * **Primary/Backup Groups:** Force deterministic failover (e.g., "Use LTE only if MPLS is down"). + * **Geographic Groups:** Steer traffic regionally (e.g., "EU branches prefer EU-based TLOCs"). + +**Pro Tip:** Misconfigured groups cause **asymmetric routing**—always validate with `show sdwan tloc`. + +### 3\. TLOC Lifecycle – How They’re Born, Live, and Die + +#### A. TLOC Formation + + * **Discovery:** Router advertises its TLOCs via OMP (Overlay Management Protocol). + * **Validation:** BFD (Bidirectional Forwarding Detection) confirms reachability. + * **Installation:** TLOC enters the RIB (Routing Information Base) if valid. + +**Critical Check:** + + * `show sdwan omp tlocs` \# Verify TLOC advertisements + * `show sdwan bfd sessions` \# Confirm liveliness + +#### B. TLOC States + + * **Up/Active:** BFD is healthy, traffic can flow. + * **Down/Dead:** BFD failed, TLOC is pulled from RIB. + * **Partial:** One direction works (asymmetric routing risk\!). + +**Debugging:** + + * `show sdwan tloc | include Partial` \# Hunt for flapping TLOCs + +### 4\. TLOC Policies – The Real Power + +#### A. 
Influencing Path Selection + + * **Route Policy:** Modify TLOC preferences per-application. + ```plaintext + apply-policy { + app-route voip { + tloc = mpls preference 200 # Always prefer MPLS for VoIP + }} + ``` + * **Smart TLOC Preemption:** Fail back aggressively (or not). + +#### B. TLOC Affinity + + * **Sticky TLOCs:** Pin flows to a TLOC (e.g., for SIP trunks). + * **Load-Balancing:** Distribute across TLOCs with equal weight. + +**Gotcha:** Affinity conflicts with **Performance Routing (PfR)**—tune carefully\! + +### 5\. TLOC Troubleshooting – The Dark Arts + +#### A. Common TLOC Failures + + * **BFD Flapping** → TLOCs bounce. + * Fix: Adjust BFD timers (`bfd-timer 300 900 3`). (Hello interval 300ms, Multiplier 3) + * **Color Mismatch** → TLOCs don’t form. + * Fix: Ensure colors match exactly (case-sensitive\!). + * **NAT Issues** → Private IP leaks. + * Fix: Use `tloc-extension public-ip`. + +#### B. Advanced Debugging + + * `debug sdwan omp tlocs` \# Watch TLOC advertisements in real-time + * `debug sdwan bfd events` \# Catch BFD failures + * `show sdwan tloc-history` \# Track TLOC changes over time + +### 6\. TLOC vs. The World + +| Concept | TLOC | Traditional WAN Addressing | +| :------ | :--- | :------------------------- | +| **Addressing** | Logical (color-based) | Physical (IP-based) | +| **Failover** | Sub-second (BFD + OMP) | Slow (BGP convergence) | +| **Policies** | Transport-agnostic | Hardcoded to interfaces | + +**Key Takeaway:** TLOCs turn **network plumbing** into **policy-driven intent**. + +**Final Word** +Mastering TLOCs means: + + * ✅ You **never** blame "the SD-WAN" for routing issues—you dissect TLOC states. + * ✅ You **design for intent** (colors, groups) instead of hacking interface configs. + * ✅ You **troubleshoot like a surgeon**—OMP → BFD → TLOC → Policy. + +Now go forth and make TLOCs obey. 🚀 +(And when Cisco TAC says "it’s a TLOC issue," you’ll know exactly where to look.) 
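The preference and weight semantics covered in this section can be sketched as follows. This is a simplified illustration, not the vendor's forwarding logic: `TlocCandidate`, `best_tlocs`, `pick_for_flow`, and the hash-based flow pinning are assumptions made for the example. It assumes higher preference wins, only BFD-up TLOCs are eligible, and weights are positive integers.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class TlocCandidate:
    """Hypothetical candidate tunnel endpoint for a destination."""
    color: str
    preference: int
    weight: int = 1
    bfd_up: bool = True


def best_tlocs(cands: Sequence[TlocCandidate]) -> List[TlocCandidate]:
    """Keep only BFD-up candidates that share the highest preference."""
    up = [c for c in cands if c.bfd_up]
    if not up:
        return []
    top = max(c.preference for c in up)
    return [c for c in up if c.preference == top]


def pick_for_flow(
    cands: Sequence[TlocCandidate], flow_key: str
) -> Optional[TlocCandidate]:
    """Pin a flow to one of the best TLOCs, proportionally to weight."""
    best = best_tlocs(cands)
    if not best:
        return None
    total = sum(c.weight for c in best)
    if total <= 0:
        return best[0]  # Assumption: degenerate weights fall back to the first.
    # A stable hash keeps every packet of a flow on the same TLOC.
    digest = hashlib.sha256(flow_key.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % total
    for c in best:
        if slot < c.weight:
            return c
        slot -= c.weight
    return best[-1]


candidates = [
    TlocCandidate("mpls", preference=200),
    TlocCandidate("biz-internet", preference=100, weight=3),
    TlocCandidate("lte", preference=100, weight=1),
]
print([c.color for c in best_tlocs(candidates)])

# Simulate MPLS failing BFD: the two preference-100 TLOCs now share traffic.
failed = [TlocCandidate("mpls", 200, bfd_up=False)] + candidates[1:]
print([c.color for c in best_tlocs(failed)])
```

Hashing the flow key is one simple way to get the "sticky" behavior described under TLOC affinity; a per-packet round-robin would spread load more evenly but reorder packets within a flow.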
+ +----- + +## SD-WAN Site ID + Color + Management Subnet Integration Guide + +To build a **scalable, intuitive, and operationally efficient** SD-WAN fabric, we’ll combine: + +1. **Site IDs** (Logical location identifiers) +2. **Colors** (Underlay transport identification) +3. **Management Subnet** (VRF for OOB/In-band management) + +Here’s how to plan and implement them cohesively: + +### 1\. Hierarchy & Assignment Strategy + +#### A. Site ID + Color + Management Subnet Relationship + +| Component | Purpose | Example Value | Design Tip | +| :-------- | :------ | :------------ | :--------- | +| **Site ID** | Uniquely identifies a branch/DC | `100` (HQ), `200` (Branch) | Use geographic encoding (e.g., `1` = Americas). | +| **Color** | Identifies WAN transport types | `mpls`, `internet`, `lte` | Match colors to ISP/underlay (e.g., `verizon_mpls`). | +| **Mgmt Subnet** | Dedicated subnet for OOB/In-band mgmt | `10.255.100.0/24` (VPN 0 or VPN 512) | Isolate from data VPNs (1-511). | + +#### B. Structured Numbering Example + +**Scenario**: A multinational with: + + * **Region 1 (Americas)**: MPLS + Internet + * **Region 2 (EMEA)**: MPLS + LTE + +| Site | Site ID | System IP | Colors (Transport) | Management Subnet | +| :--- | :------ | :-------- | :----------------- | :---------------- | +| **HQ (Dallas)** | `100` | `172.16.100.1` | `mpls_blue`, `biz_internet` | `10.255.100.0/24` (VPN 0) | +| **Branch (NY)** | `101` | `172.16.101.1` | `mpls_blue`, `biz_internet` | `10.255.101.0/24` (VPN 0) | +| **DC (Frankfurt)** | `200` | `172.16.200.1` | `europe_mpls`, `lte_backup` | `10.255.200.0/24` (VPN 0) | + +### 2\. Color Planning Best Practices + +#### A. Standardize Color Naming + + * Use **descriptive, consistent names**: + ```plaintext + _ (e.g., `att_mpls`, `comcast_biz_internet`) + ``` + * Avoid generic names like `primary`, `secondary` (confusing at scale). + +#### B. Color Redundancy Rules + + * Assign **at least 2 colors per site** (e.g., `mpls` + `internet`). 
+ * Use **BFD** for fast failover between colors. + +#### C. Color Mapping to TLOCs + + * Each **color** corresponds to a **TLOC** (Transport Locator). + * Example TLOC config: + ```bash + vEdge(config)# vpn 0 interface ge0/0 + tunnel-interface + color mpls restrict # Restrict to MPLS underlay + ``` + +### 3\. Management Subnet Strategy + +#### A. Key Requirements + + * **Isolation**: Management traffic should be isolated. + * **In-band Management:** Typically resides in **VPN 0** (shares the transport VRF with control/data overlay traffic but is logically separate). + * **Out-of-Band (OOB) Management:** For dedicated management ports (e.g., `GigabitEthernet0/0` on a vEdge), use **VPN 512**. Routes in VPN 512 are **NOT** advertised into the OMP overlay. + * **Subnet Size**: `/24` recommended (supports up to 254 devices). + +#### B. Addressing Scheme Example + +For **In-band Management (VPN 0)**: + +```plaintext +10.255..0/24 +Example: +- Site ID 100 → `10.255.100.0/24` +- Site ID 200 → `10.255.200.0/24` +``` + +For **Out-of-Band Management (VPN 512)**: Use a completely separate, non-overlapping management subnet, typically on a dedicated physical interface. + +**Benefits**: + + * Predictable IPs (easy troubleshooting). + * No overlaps with service VPNs. + +#### C. vManage Integration + + * Define management subnets in **vManage Templates**: + ```bash + device vpn 0 + interface eth0 + ip address 10.255.100.1/24 + tunnel-interface + color biz_internet restrict + ``` + (For VPN 512, you'd configure a separate interface under `device vpn 512`). + +### 4\. Putting It All Together: Design Checklist + +1. **Site IDs**: Geographic/role-based, unique, documented in IPAM. +2. **Colors**: Named after carriers, assigned to TLOCs, redundant. +3. **Management Subnet**: + * `/24` in VPN 0 for in-band. + * `/24` in VPN 512 for OOB (preferred for dedicated management ports). +4. **System IPs**: Align with Site ID (e.g., Site ID `100` → `172.16.100.1`). + +### 5\. 
Common Pitfalls + +❌ **Color Conflicts**: Reusing `mpls` for different ISPs (use `att_mpls`, `verizon_mpls`). +❌ **Mgmt Overlaps**: Sharing `10.255.100.0/24` across sites (always subnet per site). +❌ **Unstructured Site IDs**: Random numbers (hard to scale beyond 50 sites). +❌ **Incorrect VPN for Internet Breakout**: Using VPN 512 for DIA (it's for OOB management). DIA should be in a service VPN or VPN 0. + +### Final Topology Example + +```plaintext +Site ID: 100 (Dallas HQ) +- System IP: 172.16.100.1 +- Colors: mpls_blue, biz_internet +- Mgmt Subnet: 10.255.100.0/24 (VPN 0 for in-band) +- Service VPNs: 10 (LAN), 20 (VoIP) +``` + +----- + +## SD-WAN Fabric Bring-Up Essentials + +To **bring up an SD-WAN fabric**, you need to configure key components correctly. Below is a **concise, step-by-step breakdown** of the essentials, along with **critical design considerations**. + +### 1\. Underlay Network (VPN 0 - Transport VRF / Front-Door VRF) + + * **Purpose**: Handles **control-plane traffic** (OMP, DTLS/TLS tunnels between devices) and **encapsulated data-plane traffic**. All physical WAN interfaces that connect to the underlay belong to VPN 0. + * **Key Configurations**: + * **Interfaces**: Assign WAN interfaces (e.g., MPLS, Internet, LTE) to VPN 0. + * **Routing**: + * Static routes (for simple setups). + * BGP/OSPF (for dynamic underlay routing in larger deployments). + * **TLOC Extensions**: Define public/private IPs for tunnel endpoints, along with colors. + * **Design Considerations**: + * **Dual Underlay**: Use at least **two transport types** (e.g., MPLS + Internet) for redundancy. + * **TLOC Preference**: Prioritize cheaper/faster links (e.g., MPLS over LTE). + +### 2\. Overlay Network (OMP Routing) + + * **Purpose**: Distributes routes and policies across the fabric. + * **Key Configurations**: + * **OMP (Overlay Management Protocol)**: Advertises routes, TLOCs, and policies between vSmart controllers and edges. 
+ * **Route Policies**: Control which prefixes are shared (e.g., only corporate LAN routes). + * **Design Considerations**: + * **Route Aggregation**: Minimize prefixes advertised to vSmart (e.g., summarize branch LANs). + * **TLOC Redundancy**: Assign multiple TLOCs per route for failover. + +### 3\. Service VPNs (VPN 1-511) + + * **Purpose**: Segments user/data traffic (e.g., corporate LAN, guest Wi-Fi, VoIP). + * **Key Configurations**: + * **VRF Creation**: Define VPNs (e.g., `vpn 10` for corporate LAN). + * **Interface Assignment**: Assign LAN interfaces to the correct VPN. + * **Route Leaking**: If needed, allow controlled traffic flow between VPNs (via centralized policies). + * **Design Considerations**: + * **QoS Tagging**: Apply DSCP markings per VPN (e.g., EF for VoIP in `vpn 20`). + * **Security Policies**: Restrict inter-VPN communication (e.g., guest Wi-Fi in `vpn 30` can’t reach `vpn 10`). + +### 4\. Internet Breakout + + * **Purpose**: Local internet access (DIA) from branches or centralized internet access from a datacenter. + * **Key Configurations**: + * **NAT & Firewall**: Enable NAT overload (PAT) for private→public IP translation on the egress interface. + * **Policy-Based Routing (PBR) or Application-Aware Routing**: Steer specific traffic (e.g., SaaS apps, guest Wi-Fi) to the local internet path. + * **Design Considerations**: + * **Security**: Apply **ZTNA/Umbrella** or other security services for secure internet access. + * **Backup Path**: If local DIA fails, fall back to centralized internet via the overlay. + * **Note**: This is typically configured in a **service VPN (e.g., VPN 10, or a dedicated internet VPN like VPN 999)**, or by routing traffic directly out a VPN 0 interface with specific policies and NAT. **VPN 512 is reserved for Out-of-Band Management, not Internet Breakout.** + +### 5\. Management & Control Plane Connectivity + + * **Purpose**: Ensures vEdges can securely connect to controllers (vManage, vSmart, vBond). 
+ * **Key Configurations**: + * **Controller IPs**: Ensure vEdges can reach vManage/vSmart/vBond over VPN 0. + * **Certificate Auth**: Use device certificates for secure onboarding. + * **Design Considerations**: + * **Out-of-Band (OOB) Management (VPN 512)**: Use a separate OOB network with interfaces in VPN 512 for high availability and isolation of management traffic from the overlay. + * **Geo-Redundancy**: Deploy controllers in multiple regions. + +### 6\. Security Policies + + * **Purpose**: Enforce traffic rules (e.g., blocking, inspection). + * **Key Configurations**: + * **Zone-Based Firewall**: Assign interfaces to zones (e.g., "inside," "outside"). + * **Application-Aware Policies**: Block high-risk apps (e.g., Tor, Netflix). + * **Design Considerations**: + * **Default-Deny**: Start with "deny all," then allow only needed traffic. + * **IPS/IDS**: Enable for internet-bound traffic. + +### 7\. High Availability (HA) + + * **Design Considerations**: + * **Dual vSmarts**: Avoid single points of failure for the control plane. + * **Active/Standby Edges**: Use VRRP/HSRP for LAN-side HA at critical sites. + * **Cloud Gateway Redundancy**: For cloud-onramp (e.g., AWS/Azure). + +### Summary Checklist + +| **Step** | **Action** | **Critical Design Tip** | +| :------- | :--------- | :---------------------- | +| **1. Underlay** | Configure VPN 0 interfaces & routing | Dual transports (MPLS + Internet) | +| **2. Overlay** | Set up OMP & route policies | Summarize routes to reduce overhead | +| **3. Service VPNs** | Define VPNs 1-511 & assign interfaces | Use QoS for VoIP/VC traffic | +| **4. Internet** | Configure DIA in a Service VPN or VPN 0 | Add ZTNA/umbrella for security | +| **5. Management** | Ensure controllers are reachable via VPN 0 | OOB management (VPN 512) for resiliency | +| **6. Security** | Apply firewall/IPS policies | Default-deny approach | +| **7. 
HA** | Deploy redundant controllers/edges | Active/standby for critical sites | + +----- + +## SD-WAN Application-Aware Routing (AAR) with `match app-list` + +*Control traffic flows based on applications using vManage policies.* + +### 1\. What is `match app-list`? + + * **Purpose:** Identifies specific applications (e.g., Zoom, Netflix, VoIP) to steer traffic via policies. + * **Use Cases:** + * Steer VoIP onto the MPLS transport. + * Block high-risk apps (e.g., Tor). + * Local internet breakout (DIA) for SaaS apps. + +### 2\. How It Works + +1. **Application Detection:** + * Uses **Deep Packet Inspection (DPI)** to identify apps (even when they run on non-standard ports). + * Predefined app lists in vManage (e.g., `VOICE-AND-VIDEO`, `BUSINESS-APPS`). +2. **Policy Matching:** + * Policies reference `app-list` to trigger actions (e.g., change path, apply QoS). + +### 3\. Configuration Steps + +#### 3.1 Define an App List in vManage + +1. Navigate to: **Configuration \> Policies \> Custom Options \> App-Aware Routing** +2. Create a new app list: + ```plaintext + Name: CORPORATE-APPS + Applications: + - Microsoft-365 + - Webex-Teams + - Zoom-Cloud + ``` + +#### 3.2 Create a Policy Using `match app-list` + +**Example:** *"Route Microsoft-365 traffic via VPN 10 (local internet breakout)"* +*(Note: VPN 512 is for Out-of-Band Management, not Internet Breakout. Use a service VPN like VPN 10 or route out VPN 0 for DIA.)* + +```bash +policy-rule MICROSOFT-365-DIA + match app-list CORPORATE-APPS # Match the app list defined in 3.1 + action accept + set vpn 10 # Force local internet breakout via VPN 10 + set dscp 46 # Mark for QoS (EF) +``` + +#### 3.3 Apply Policy to Sites + +1. Attach policy to a **Centralized Policy** in vManage. +2. Push to target sites. + +### 4\. Best Practices + +#### 4.1 App List Design + + * **Group logically:** + * `VOICE-AND-VIDEO`: Zoom, Webex, MS-Teams. + * `BUSINESS-CRITICAL`: SAP, Oracle, Salesforce. 
+ * **Avoid overly broad lists** (e.g., "ALL-WEB") to prevent unintended matches. + +#### 4.2 Policy Ordering + + * **Higher priority** (lower number) policies evaluate first. + ```bash + policy-list AAR-POLICY + sequence 10 + match app-list VOICE-AND-VIDEO + action accept + set color mpls # Force MPLS for voice + sequence 20 + match app-list NETFLIX + action drop # Block Netflix + ``` + +#### 4.3 SLA-Based Fallback + + * Combine with **Performance Routing (PfR)** to switch paths if SLA fails: + ```bash + match app-list WEBEX + action accept + set sla preferred-color mpls latency 100ms + ``` + +### 5\. Verification & Troubleshooting + +#### 5.1 Key Commands + +| Command | Purpose | +| :------ | :------ | +| `show sdwan app-aware stats` | Lists detected apps and paths. | +| `show sdwan policy service-statistics` | Checks policy hits. | +| `show sdwan app-fwd dpi flows` | Inspects DPI-classified flows. | + +#### 5.2 Common Issues + +| Symptom | Likely Cause | Fix | +| :------ | :----------- | :-- | +| App traffic not matching | Incorrect app-list definition | Verify app names in vManage. | +| Policy not applying | Wrong policy priority | Reorder policies (lower sequence = higher priority). | +| DPI not detecting apps | Encryption (TLS 1.3) | Use IP-based matching as fallback. | + +### 6\. Advanced Use Cases + +#### 6.1 Custom DPI Signatures + + * For proprietary apps, add custom signatures: + ```bash + app-list CUSTOM-APP + signature TCP port 5000 protocol HTTP user-agent "MyApp*" + ``` + +#### 6.2 Combining with QoS + + * Mark apps for prioritization: + ```bash + match app-list VOICE + action accept + set dscp ef # Expedited Forwarding (VoIP) + ``` + +#### 6.3 Internet Breakout for Specific Apps + +```bash +match app-list SALESFORCE +action accept +set vpn 10 # Local breakout via VPN 10 +set nat use-vpn 0 # Use VPN 0's NAT pool (if VPN 0 is internet-facing) +``` + +### 7\. 
Summary Checklist + + * [ ] Define app lists in vManage (**Configuration \> Policies \> App-Aware Routing**). + * [ ] Use `match app-list` in policies to steer traffic. + * [ ] Test with `show sdwan app-aware stats`. + * [ ] Combine with SLA for dynamic failover. + +### Key Takeaways + +1. **`match app-list` enables application-aware routing** (not just IP/port-based). +2. **DPI visibility can be affected by strong encryption** (e.g., TLS 1.3 with ESNI) → May need fallback to IP-based matching. +3. **Policy order matters** — Highest priority (lowest sequence) evaluates first. + +----- + +## Front-Door VRF (FVRF) Explained (Using Cisco Gear) + +**Front-Door VRF (FVRF)** is a Cisco feature that enhances security by separating the **management plane** from the **data plane** in network devices (routers, switches, firewalls). It achieves this by placing the management interface (SSH, SNMP, HTTPS, etc.) in a separate Virtual Routing and Forwarding (VRF) instance, isolating it from the default global routing table. + +**Note:** While this document describes the general concept of Front-Door VRF in Cisco devices, in Cisco SD-WAN (Viptela-based) architectures: + + * **VPN 0** is often referred to as the "Front-Door VRF" in the sense that it is the transport VRF carrying all overlay control and data tunnel traffic, and often in-band management. + * **VPN 512** is used for isolated *out-of-band* management, conceptually similar to a traditional FVRF. + +### Why Use Front-Door VRF? + +1. **Security:** Prevents unauthorized access to management interfaces via data-plane attacks. +2. **Isolation:** Ensures management traffic doesn’t mix with production traffic. +3. **Multi-Tenancy:** Useful in service provider environments where management traffic must be segregated per customer. +4. **Simplified Routing:** Avoids route conflicts between management and data networks. 
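The isolation and route-conflict points above can be illustrated with a small Python model. This is only a sketch under stated assumptions: `VrfRouter`, the VRF names, and all prefixes and next hops are hypothetical, and a real device performs the lookup in hardware rather than with a list scan. It shows that each VRF keeps its own routing table, so the same prefix can exist in the management VRF and the global table without conflict, and a lookup never crosses from one VRF into another.

```python
import ipaddress


class VrfRouter:
    """Toy model: one independent routing table per VRF (hypothetical helper)."""

    def __init__(self):
        # VRF name -> list of (network, next_hop) entries
        self.tables = {}

    def add_route(self, vrf, prefix, next_hop):
        """Install a route into the table that belongs to the named VRF only."""
        self.tables.setdefault(vrf, []).append(
            (ipaddress.ip_network(prefix), next_hop)
        )

    def lookup(self, vrf, destination):
        """Longest-prefix match restricted to the named VRF's table."""
        addr = ipaddress.ip_address(destination)
        matches = [
            (net, hop) for net, hop in self.tables.get(vrf, []) if addr in net
        ]
        if not matches:
            return None  # no route in this VRF, even if another VRF has one
        return max(matches, key=lambda entry: entry[0].prefixlen)[1]


router = VrfRouter()

# Management VRF: connected subnet plus a default route to the mgmt gateway.
router.add_route("MGMT-VRF", "192.168.1.0/24", "connected")
router.add_route("MGMT-VRF", "0.0.0.0/0", "192.168.1.254")

# Global table: an overlapping 192.168.1.0/24 and a production prefix.
router.add_route("global", "192.168.1.0/24", "10.0.0.2")
router.add_route("global", "10.20.0.0/16", "10.0.0.6")

mgmt_hop = router.lookup("MGMT-VRF", "192.168.1.50")  # resolved in MGMT-VRF
data_hop = router.lookup("global", "192.168.1.50")    # resolved in global table
```

The same destination resolves differently depending on which table is consulted, which is the property the management VRF relies on: management reachability is decided only by routes installed in that VRF.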
+ +### How FVRF Works + + * The **management interface (e.g., Mgmt0/0)** is assigned to a dedicated VRF (e.g., `MGMT-VRF`). + * All management traffic (SSH, SNMP, etc.) must go through this VRF. + * The data plane (regular traffic) uses the **default global routing table** or other service VRFs. + +### Configuration Example (Cisco IOS-XE / IOS) + +#### 1\. Create the Management VRF + +```bash +configure terminal +vrf definition MGMT-VRF + rd 100:1 ! Route Distinguisher (for uniqueness) + address-family ipv4 + exit-address-family +exit +``` + +#### 2\. Assign the Management Interface to the VRF + +```bash +interface GigabitEthernet0/0 + description Management Interface + vrf forwarding MGMT-VRF + ip address 192.168.1.1 255.255.255.0 + no shutdown +exit +``` + +#### 3\. Configure a Default Route for Management Traffic + +```bash +ip route vrf MGMT-VRF 0.0.0.0 0.0.0.0 192.168.1.254 +``` + +*(Where `192.168.1.254` is the gateway for management traffic.)* + +#### 4\. Enable VRF-Aware Services + +```bash +ip http server +ip http vrf MGMT-VRF ! Ensures HTTP/HTTPS uses the MGMT-VRF +line vty 0 4 + transport input ssh ! Allow only SSH on the VTY lines; SSH arriving on the MGMT-VRF interface is accepted +exit +``` + +### Verification + + * Check VRF routing table: + ```bash + show ip route vrf MGMT-VRF + ``` + * Verify interface assignment: + ```bash + show vrf brief + ``` + * Test connectivity: + ```bash + ping vrf MGMT-VRF 192.168.1.254 + ``` + +### Key Considerations + + * **NTP & DNS:** If management relies on NTP/DNS, ensure they are reachable via the FVRF. + * **Backup Access:** Always maintain an alternative access method (console) in case of misconfiguration. + * **Compatibility:** Some older Cisco devices may not support all VRF-aware services. + +### Conclusion + +Front-Door VRF is a best practice for securing management traffic in Cisco environments. 
By isolating management interfaces in a separate VRF, you reduce the attack surface and limit how far a data-plane compromise can reach into the management plane. + +----- \ No newline at end of file