Update tech_docs/networking/sdwan_primer.md

2025-07-29 13:24:34 -05:00
parent 36938fff42
commit 48364f8970


@@ -1,125 +1,883 @@
Here's the **20% of SD-WAN that covers 80% of what you need to know**, with a focus on **practical knowledge for senior network roles** and seamless integration with your IPSec expertise:
Your collection of documents on SD-WAN is exceptionally thorough and well-structured. I've compiled them into a single, comprehensive draft, incorporating the refinements and addressing the inconsistencies we've discussed, particularly around VPN 0, VPN 512, and the Front-Door VRF concept.
---
I've aimed to create a cohesive flow, starting with the "Top 1% Mindset" to set the stage, moving into the crash course for foundational understanding, then diving deep into TLOCs and the three planes, and finally, detailing key configurations and troubleshooting.
Here is the complete draft:
-----
# Mastering SD-WAN: From Fundamentals to the Top 1% Mindset
## The Top 1% Mindset
You don't just deploy SD-WAN; you **orchestrate** it.
You think in **abstractions** (colors, TLOCs, VPNs) not hardware.
You troubleshoot like a surgeon—control plane first, then data plane, then app logic.
**Example:**
Problem: VoIP calls drop but O365 works.
**Top 1% Debug:**
* Check BFD sessions (tunnel liveness plus per-path loss, latency, and jitter).
* Verify TLOC preferences (is LTE taking over incorrectly?).
* Inspect `app-route` policy (is VoIP pinned to MPLS but an SLA class overriding it?).
* Drill into `show sdwan app-route stats` (is jitter spiking on broadband?).
**Final Thought**
Most SD-WAN "engineers" just click through vManage. The **real pros** know:
* Transport independence isn't automatic; it's designed.
* Policies aren't rules; they're a logic flow.
* Troubleshooting isn't guessing; it's methodical dissection.
You're asking the right questions. Now go break (then fix) some TLOCs. 🚀
(And yes, we both know Cisco's docs don't explain this stuff clearly, which is why the top 1% reverse-engineer it.)
-----
## SD-WAN Crash Course: The 20% That Matters
**Goal:** Understand **core SD-WAN concepts**, how they differ from traditional WAN, and how they integrate with IPSec.
---
### 1\. SD-WAN vs Traditional WAN
| **Feature** | **Traditional WAN (MPLS/VPN)** | **SD-WAN** |
| :---------- | :----------------------------- | :--------- |
| **Cost** | Expensive (MPLS circuits) | Cheaper (uses Internet + broadband) |
| **Agility** | Manual config changes | Centralized, automated policies |
| **Performance** | Predictable but rigid | Dynamic path selection (jitter/loss-aware) |
| **Security** | Relies on IPSec/MPLS | Built-in encryption (IPSec, TLS) |
| **Topology** | Hub-and-spoke | Any-to-any, mesh |
**Key Takeaway:**
* SD-WAN **decouples control plane from hardware**, allowing dynamic traffic routing over **any transport (MPLS, LTE, broadband)**.
### 2\. SD-WAN Core Components
**(1) Edge Devices (CPE)**
* e.g., Cisco vEdge, FortiGate, VeloCloud
* Sit at branch offices, apply policies, and encrypt traffic.
**(2) Orchestrator (Controller)**
* e.g., Cisco vManage, VMware Orchestrator
* **Centralized policy management** (no CLI needed\!).
**(3) Overlay Tunnels**
* **Overlay tunnels** between edges (IPsec-encrypted by default, GRE where encryption is not required); DTLS/TLS secures the control connections to the controllers.
* Uses **TLOC (Transport Locator)** = System IP + Color + Encapsulation (e.g., colors `biz-internet`, `mpls`).
**(4) Underlay Transport**
* **Any WAN link**: MPLS, Internet, LTE, 5G.
### 3\. How SD-WAN Works (The 80% You Need)
**(1) Path Selection**
* **Dynamic multi-path steering**: Chooses best path based on:
* **Application SLA** (e.g., VoIP → low latency).
* **Real-time metrics** (jitter, packet loss, latency).
**Example Policy:**
```plaintext
IF (Application == VoIP) AND (Latency > 50ms) → SWITCH to backup link
```
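The rule above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any vendor implements it: the `PathMetrics` type, the sample measurements, and the jitter/loss defaults (30 ms, 1%) are assumptions added here; only the 50 ms latency threshold comes from the example policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    color: str          # transport label, e.g. "mpls" (hypothetical sample data)
    latency_ms: float
    jitter_ms: float
    loss_pct: float


def meets_sla(path, max_latency_ms=50.0, max_jitter_ms=30.0, max_loss_pct=1.0):
    """True when every measured metric is within its threshold."""
    return (path.latency_ms <= max_latency_ms
            and path.jitter_ms <= max_jitter_ms
            and path.loss_pct <= max_loss_pct)


def pick_path(paths, preferred_color, **sla):
    """Prefer the named color while it meets the SLA, else the best compliant path.

    If nothing is compliant, fall back to the lowest-latency path overall.
    """
    compliant = [p for p in paths if meets_sla(p, **sla)]
    for p in compliant:
        if p.color == preferred_color:
            return p
    return min(compliant or paths, key=lambda p: p.latency_ms)


paths = [
    PathMetrics("mpls", latency_ms=72, jitter_ms=4, loss_pct=0.1),
    PathMetrics("biz-internet", latency_ms=31, jitter_ms=9, loss_pct=0.3),
]
chosen = pick_path(paths, preferred_color="mpls")
print(chosen.color)  # MPLS is over 50 ms here, so the broadband path is chosen
```

Keeping the preferred color as a separate argument mirrors the "pin to MPLS, but only while it meets the SLA" intent: preference is a tie-breaker among compliant paths, not an override.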
**(2) Zero-Touch Provisioning (ZTP)**
* Plug in a device → auto-configures via orchestrator.
**(3) Application-Aware Routing**
* **DPI (Deep Packet Inspection)** identifies apps (e.g., Teams, SAP).
* *(Note: While effective, some advanced encryption like TLS 1.3 can limit DPI's visibility, requiring IP-based fallbacks.)*
* **QoS prioritization** (VoIP \> YouTube).
**(4) Security Integration**
* **IPSec for all overlays** (mandatory for Internet links).
* **Cloud-based firewalls** (e.g., FortiGate, Zscaler).
### 4\. SD-WAN + IPSec Integration
* **SD-WAN uses IPSec for secure tunnels** but adds:
* **Automated key rotation** (no manual PSK updates).
* **Tunnel bonding** (combines multiple links for throughput).
**Key Difference:**
* Traditional IPSec VPN = **static tunnels**.
* SD-WAN IPSec = **dynamic, SLA-driven tunnels**.
### 5\. SD-WAN Troubleshooting (Top 5 Issues)
| **Issue** | **Debug Command** | **Fix** |
| :-------- | :---------------- | :------ |
| **Tunnels not coming up** | `show sdwan bfd sessions` (Cisco) | Check underlay reachability |
| **Poor VoIP quality** | `show sdwan app-route stats` | Adjust SLA thresholds |
| **Orchestrator sync failure** | `show sdwan control connections` | Verify certs/connectivity |
| **Traffic taking wrong path** | `show sdwan policy service-path` | Fix application-aware rules |
| **Lossy backup link** | `show sdwan interface` | Enable FEC (Forward Error Correction) |
### 6\. SD-WAN vs. DMVPN (Common Interview Qs)
**Q: When would you use SD-WAN over DMVPN?**
* **SD-WAN**: When you need **application-aware routing + centralized management**.
* **DMVPN**: When you need **scalable IPSec tunnels but don't need SaaS optimization**.
**Q: Can SD-WAN replace IPSec?**
* **No\!** SD-WAN **uses** IPSec for encryption but adds intelligence on top.
### 7\. Lab Practice (Quick Wins)
1. **Simulate link failure** in GNS3/EVE-NG → Watch SD-WAN switch paths.
2. **Prioritize VoIP traffic** over YouTube.
3. **Break the orchestrator** → Observe fallback to local policies.
**CLI Examples (Cisco Viptela):**
```bash
show sdwan control connections # Check orchestrator status
show sdwan app-route stats # Verify path selection
clear sdwan tunnel # Force tunnel re-establishment
```
---
### 8\. Interview Cheat Sheet
* **SD-WAN = Automation + Application-Aware Routing + Multiple Underlays**.
* **IPSec is still used, but dynamically managed**.
* **Key metrics: Jitter (\<30ms), Latency (\<150ms), Packet Loss (\<1%)**.
* **Orchestrator is the brain; edges are the muscle**.
-----
### **Where to Go Next?**
1. **Deep dive into your vendor's SD-WAN** (Cisco, Fortinet, VMware).
2. **Learn cloud-integrated SD-WAN** (AWS Transit Gateway, Azure Virtual WAN).
3. **Study real-world designs** (e.g., "How SD-WAN replaces MPLS").
## The Three Planes of SD-WAN & Modern Networking
In modern networking, especially with overlay technologies like SD-WAN, we deal with **three distinct planes**, each serving a critical role.
### 1\. Management Plane
* **Purpose:** Controls **device access and monitoring** (SSH, SNMP, HTTPS, syslog, etc.). It's about how you *interact with* the device.
* **Key Components:**
| Component | Protocol | Port | Description |
| :-------- | :------- | :--- | :---------- |
| **vManage** | HTTPS (WebUI) | TCP/443 | GUI/API for centralized control and configuration. |
| **vBond** | DTLS | UDP/12346 | Orchestrator that authenticates devices and points them to vSmart and vManage. |
| **Zero-Touch Provisioning (ZTP)** | DHCP/HTTPS | - | Auto-configures devices out-of-the-box. |
* **Traffic Flow:**
1. **Onboarding:** Device contacts **vBond** (DTLS) → gets redirected to **vManage**. Downloads config/CSR via **HTTPS**.
2. **Ongoing Management:** Devices send **telemetry** (metrics, logs) to vManage. Policies (security, routing) are pushed **from vManage**.
* **Security Considerations:**
* **Always use isolated VRFs for management traffic** (e.g., a dedicated management VRF on traditional routers, or VPN 512 in SD-WAN for OOB management).
* **Mutual TLS (mTLS)** for device-vManage communication.
* **Role-Based Access Control (RBAC)** in vManage.
### 2\. Control Plane
* **Purpose:** Handles **protocols that build network intelligence** (BGP, OSPF, VXLAN EVPN, SD-WAN OMP, STP, LACP, etc.). It's about how the network *learns* its topology and reachability.
* **Key Protocols (SD-WAN Specific):**
| Protocol | Function | Port |
| :------- | :------- | :--- |
| **OMP (Overlay Management Protocol)** | Advertises routes, TLOCs, policies. | Runs inside the DTLS/TLS control connection (default base port UDP/12346 for DTLS) |
| **BGP (optional)** | Legacy WAN integration or underlay routing. | TCP/179 |
| **TLOC (Transport Locator)** | An OMP-advertised attribute rather than a protocol; maps physical WAN links to logical tunnel endpoints for policy application. | - |
* **How OMP Works:**
1. **vSmart controllers** act as **route reflectors** for OMP.
2. **Edge devices (vEdges)** send:
* **Routes** (prefixes learned from LAN/WAN).
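* **TLOCs** (tunnel endpoints, e.g., `system-IP, color, encapsulation`).
* **Service routes** (e.g., a firewall reachable at the site). Policies travel the other way: they are defined centrally and pushed by vSmart (e.g., "prefer MPLS for VoIP").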
3. **vSmart redistributes** this info to all edges.
* **Example OMP Route Advertisement:**
```bash
vEdge# show omp routes
RECEIVED ROUTES:
Prefix TLOC IP Color Preference
10.1.1.0/24 203.0.113.1 mpls 100
10.1.1.0/24 198.51.100.1 biz-internet 50
```
*(MPLS is preferred over Internet due to higher preference.)*
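To make the preference comparison concrete, here is a minimal Python sketch that picks the winning entry per prefix from rows shaped like the sample output above. It models only the "highest preference wins" step; real OMP best-path selection has further tie-breakers that are not modeled, and the tuple layout is an assumption made for this example.

```python
from collections import defaultdict

# (prefix, tloc_ip, color, preference), copied from the sample output above
routes = [
    ("10.1.1.0/24", "203.0.113.1", "mpls", 100),
    ("10.1.1.0/24", "198.51.100.1", "biz-internet", 50),
]


def best_paths(routes):
    """Group routes by prefix and keep the entries with the highest preference.

    Entries that tie on preference are all kept (candidates for load sharing).
    """
    by_prefix = defaultdict(list)
    for prefix, tloc_ip, color, preference in routes:
        by_prefix[prefix].append((tloc_ip, color, preference))

    best = {}
    for prefix, entries in by_prefix.items():
        top = max(pref for _, _, pref in entries)
        best[prefix] = [e for e in entries if e[2] == top]
    return best


print(best_paths(routes)["10.1.1.0/24"])  # the mpls entry, preference 100
```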
* **Key Traits:**
* **Distributes reachability info** (routes, tunnels, topology).
* Runs on the **CPU (software-based)** and is vulnerable to floods (e.g., BGP attacks).
* Can be placed in a **separate VRF**; in SD-WAN the control connections ride the transport VPN (VPN 0), which plays the role of a traditional front-door VRF.
* **Security Considerations:**
* ✔ **DTLS encryption** for OMP (no cleartext control traffic\!).
* ✔ **Control-plane policing (CoPP)** to prevent floods.
* ✔ **Private WAN links (MPLS)** for critical control traffic.
### 3\. Data Plane (Forwarding Plane)
* **Purpose:** **Moves user traffic** (packets/frames) at **line rate (hardware-accelerated)**. It's about *moving* the actual data.
* **Key Technologies:**
| Technology | Role |
| :--------- | :--- |
| **IPsec/GRE** | IPsec (encrypted) or GRE (unencrypted) tunnels between edges. |
| **TLOC (Transport Locator)** | Logical tunnel endpoint (e.g., `system-IP, color, encapsulation`). |
| **Application-Aware Routing (AAR)** | Dynamically switches paths based on SLA. |
* **Data Flow Example:**
1. **Traffic arrives at vEdge:**
* Classified via **DPI (Deep Packet Inspection)**.
* Tagged with **QoS markings (DSCP)**.
2. **Path Selection:**
* Checks **OMP-learned TLOCs** and **SLA metrics**.
* Chooses best path (e.g., MPLS for VoIP, Internet for web).
3. **Encapsulation:**
* Wrapped in **IPsec (ESP)** or **GRE**.
* Sent to peer vEdge via **WAN (MPLS/Internet/5G)**.
* **Packet Walkthrough (Simplified):**
1. **Original Packet:**
```
SRC: 10.1.1.100 (LAN) | DST: 8.8.8.8 (Internet)
```
2. **After SD-WAN Processing:**
```
[Outer IP][UDP][ESP][VPN label][Original Packet]
SRC: 203.0.113.1 (vEdge Public IP)
DST: 198.51.100.2 (Peer vEdge Public IP)
```
* **Key Traits:**
* **Hardware-accelerated where the platform allows** (ASIC/NPU); virtual and smaller edges forward in software on dedicated cores.
* **Doesn't care about routes/tunnels**; it just forwards based on FIB/TCAM.
* **Security Considerations:**
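* ✔ **IPsec (AES-256-GCM)** for all tunnels, with keys distributed through the controllers via OMP rather than negotiated with IKE.
* ✔ **Zone-Based Firewall** on vEdges.
* ✔ **SLA-based path avoidance** (steer traffic off high-jitter or lossy links).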
### Why This Separation Matters
| Plane | Runs On | Isolation Needed? | Risks if Compromised |
| :---- | :------ | :---------------- | :------------------- |
| **Management** | CPU | **Yes (Dedicated VRF/OOB)** | Total device takeover |
| **Control** | CPU | **Yes (VRF/CoPP)** | Network meltdown (BGP hijacks, loops) |
| **Data** | ASIC | **No (but ACLs help)** | Performance drops (DDoS), but no config access |
### Common Misconceptions
1. **"Control Plane = Management Plane"** → **No\!**
* **Control Plane:** BGP, OSPF, VXLAN EVPN.
* **Management Plane:** SSH, SNMP.
* *(They're both CPU-based but serve different purposes.)*
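2. **"FVRF = management VRF"** → **No\!**
* A traditional FVRF (Front-Door VRF) holds the **WAN-facing transport interfaces** (the tunnel sources), much like VPN 0 in SD-WAN. It is not a management VRF.
* Management traffic belongs in a **dedicated management VRF** (VPN 512 in SD-WAN); BGP/VXLAN go in **normal VRFs** or a dedicated control-plane VRF.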
3. **"Data Plane Needs a VRF"** → **Usually No.**
* Data traffic follows the **FIB** (built by the control plane).
* VRFs for data are typically for **tenant isolation** (e.g., MPLS VPNs, multi-tenancy service VPNs in SD-WAN).
### Real-World Use Cases
1. **SD-WAN**
* **Management:** vManage (HTTPS).
* **Control:** OMP (Overlay Management Protocol).
* **Data:** Encrypted tunnels (IPsec/GRE).
2. **VXLAN EVPN**
* **Management:** SSH to switches.
* **Control:** BGP EVPN (MAC/IP routing).
* **Data:** VXLAN-encapsulated traffic.
3. **Service Provider MPLS**
* **Management:** TACACS+ for routers.
* **Control:** LDP/RSVP (label distribution).
* **Data:** Label-switched packets.
### Key Takeaways
1. **Management Plane** = Your **remote admin access** (dedicated VRF/OOB).
2. **Control Plane** = **Protocols that build the network** (BGP, EVPN, OSPF, OMP).
3. **Data Plane** = **Raw packet forwarding** (ASIC-driven, no intelligence).
### Final Thought
The industry's failure to **physically separate all three planes** (like servers do with iLO) is a security flaw. But until vendors fix it:
* **Isolate management traffic in dedicated VRFs (a management VRF on traditional routers, or SD-WAN's VPN 512 for OOB).**
* **Use VRFs/CoPP for control-plane isolation and protection.**
* **Trust ASICs for the data plane.**
-----
## Deep Dive: TLOCs (Transport Locators) - The Spine of SD-WAN
TLOCs are the **make-or-break abstraction** in SD-WAN architectures (especially Cisco Viptela). They're the glue between the underlay (physical links) and overlay (logical policies). But most engineers only *think* they understand them. Let's fix that.
### 1\. TLOCs: The Core Concept
A **TLOC** is a **logical representation** of a WAN edge router's transport connection. It's defined by three key attributes:
* **System IP** (the router's overlay identity, not the physical interface address).
* **Color** (e.g., `mpls`, `biz-internet`, `lte`).
* **Encapsulation** (IPsec or GRE).
**Why this matters:**
* TLOCs **decouple policies from hardware**. You can swap circuits (e.g., change ISP) without rewriting all your rules.
* They enable **transport-independent routing**—policies reference colors, not IPs.
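A small Python sketch of that decoupling: the policy below names colors only, so changing a circuit's address leaves the policy untouched. The `Tloc` class, the `VOICE_POLICY` dictionary, and the sample addresses are hypothetical names introduced for illustration; they are not part of any vendor data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tloc:
    ip: str      # the TLOC's IP attribute
    color: str   # transport color, e.g. "mpls"
    encap: str   # tunnel encapsulation, e.g. "ipsec"


# The policy references colors only, never addresses.
VOICE_POLICY = {"app": "voip", "preferred_colors": ["mpls", "biz-internet"]}


def select_tloc(tlocs, policy):
    """Return the first available TLOC in the policy's color order, or None."""
    for color in policy["preferred_colors"]:
        for tloc in tlocs:
            if tloc.color == color:
                return tloc
    return None


tlocs = [
    Tloc("203.0.113.1", "mpls", "ipsec"),
    Tloc("198.51.100.1", "biz-internet", "ipsec"),
]

# Swap the MPLS circuit (new address); the policy itself does not change.
swapped = [replace(t, ip="192.0.2.9") if t.color == "mpls" else t for t in tlocs]
print(select_tloc(swapped, VOICE_POLICY))
```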
### 2\. TLOC Components - What's Under the Hood
#### A. TLOC Extended Attributes
These are **hidden knobs** that influence path selection:
* **Preference** (higher = better, so closer in spirit to BGP local preference than to admin distance).
* **Weight** (for load-balancing across equal paths).
* **Public/Private IP** (for NAT traversal).
* **Site-ID** (identifies the site; edges that share a site ID do not build tunnels to each other).
**Example:**
```plaintext
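# Illustrative vEdge-style sketch; verify exact syntax on your release
vpn 0
 interface ge0/0
  ip address 203.0.113.1/30
  tunnel-interface
   encapsulation ipsec preference 100   # Higher = more preferred
   color biz-internet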
```
#### B. TLOC Groups
* **Primary/Backup Groups:** Force deterministic failover (e.g., "Use LTE only if MPLS is down").
* **Geographic Groups:** Steer traffic regionally (e.g., "EU branches prefer EU-based TLOCs").
**Pro Tip:** Misconfigured groups cause **asymmetric routing**, so always validate with `show sdwan omp tlocs`.
### 3\. TLOC Lifecycle - How They're Born, Live, and Die
#### A. TLOC Formation
* **Discovery:** Router advertises its TLOCs via OMP (Overlay Management Protocol).
* **Validation:** BFD (Bidirectional Forwarding Detection) confirms reachability.
* **Installation:** TLOC enters the RIB (Routing Information Base) if valid.
**Critical Check:**
* `show sdwan omp tlocs` \# Verify TLOC advertisements
* `show sdwan bfd sessions` \# Confirm liveliness
#### B. TLOC States
* **Up/Active:** BFD is healthy, traffic can flow.
* **Down/Dead:** BFD failed, TLOC is pulled from RIB.
* **Partial:** One direction works (asymmetric routing risk\!).
**Debugging:**
* `show sdwan tloc | include Partial` \# Hunt for flapping TLOCs
### 4\. TLOC Policies - The Real Power
#### A. Influencing Path Selection
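* **App-Route Policy:** Steer an application toward a preferred color (illustrative sketch; list and SLA-class names are hypothetical).
```plaintext
policy
 app-route-policy VOIP-AAR
  vpn-list SERVICE-VPNS
   sequence 10
    match
     app-list VOICE-AND-VIDEO
    action
     sla-class VOICE-SLA preferred-color mpls   # Prefer MPLS for VoIP while it meets the SLA
```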
* **Smart TLOC Preemption:** Fail back aggressively (or not).
#### B. TLOC Affinity
* **Sticky TLOCs:** Pin flows to a TLOC (e.g., for SIP trunks).
* **Load-Balancing:** Distribute across TLOCs with equal weight.
**Gotcha:** Affinity can conflict with **application-aware routing (AAR)** decisions, so tune carefully\!
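Sticky flows and weighted load-balancing can both be illustrated with a deterministic hash: the same flow always lands on the same TLOC, while different flows spread in proportion to weight. This is a sketch under assumptions; real edges use their own hashing scheme, and the 5-tuple layout and weights below are made up for the example.

```python
import hashlib


def pick_tloc(flow, tlocs):
    """Deterministically map a flow to a TLOC color, proportional to weight.

    flow  : any hashable description of the flow, e.g. a 5-tuple
    tlocs : list of (color, weight) pairs with positive integer weights
    """
    total = sum(weight for _, weight in tlocs)
    if total <= 0:
        raise ValueError("at least one TLOC with a positive weight is required")

    # A stable digest (unlike the built-in hash()) keeps the mapping
    # identical across runs, which is what makes the flow "sticky".
    digest = hashlib.sha256(repr(flow).encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % total

    for color, weight in tlocs:
        if bucket < weight:
            return color
        bucket -= weight


tlocs = [("mpls", 1), ("biz-internet", 1)]           # equal weight
sip_flow = ("10.1.1.100", "10.2.2.50", 17, 16384, 5060)
print(pick_tloc(sip_flow, tlocs))                     # same answer on every run
```

Because the choice depends only on the flow and the TLOC list, removing a TLOC remaps some flows; that is one reason pinned SIP trunks deserve an explicit policy rather than relying on hashing alone.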
### 5\. TLOC Troubleshooting - The Dark Arts
#### A. Common TLOC Failures
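* **BFD Flapping** → TLOCs bounce.
* Fix: Loosen BFD timers per color on unstable links (for example `bfd color lte hello-interval 1000 multiplier 7`, i.e. a 1000 ms hello with a multiplier of 7).
* **Color Mismatch** → Tunnels don't form.
* Fix: If `restrict` is configured, tunnels form only between TLOCs of the same color, so check the color and `restrict` setting on both ends.
* **NAT Issues** → Peers try to reach a private address.
* Fix: Confirm the public IP and port the edge learned through vBond (`show sdwan control local-properties`) and check the NAT type on both sides.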
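When tuning BFD it helps to keep the arithmetic in view: the time to declare a tunnel down is roughly the hello interval multiplied by the multiplier. The helper below is a trivial sketch of that relationship; the function name is made up for this example.

```python
def bfd_detection_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a tunnel down: hello interval x multiplier."""
    if hello_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("hello interval and multiplier must be positive")
    return hello_interval_ms * multiplier


print(bfd_detection_ms(300, 3))    # aggressive timers: 900 ms
print(bfd_detection_ms(1000, 7))   # relaxed timers: 7000 ms
```

Shorter detection times fail over faster but are more likely to flap on a jittery link, so the trade-off is usually made per transport rather than globally.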
#### B. Advanced Debugging
* `debug sdwan omp tlocs` \# Watch TLOC advertisements in real-time
* `debug sdwan bfd events` \# Catch BFD failures
* `show sdwan tloc-history` \# Track TLOC changes over time
### 6\. TLOC vs. The World
| Concept | TLOC | Traditional WAN Addressing |
| :------ | :--- | :------------------------- |
| **Addressing** | Logical (color-based) | Physical (IP-based) |
| **Failover** | Seconds by default (BFD hello × multiplier), tunable toward sub-second | Slow (BGP convergence) |
| **Policies** | Transport-agnostic | Hardcoded to interfaces |
**Key Takeaway:** TLOCs turn **network plumbing** into **policy-driven intent**.
**Final Word**
Mastering TLOCs means:
* ✅ You **never** blame "the SD-WAN" for routing issues—you dissect TLOC states.
* ✅ You **design for intent** (colors, groups) instead of hacking interface configs.
* ✅ You **troubleshoot like a surgeon**—OMP → BFD → TLOC → Policy.
Now go forth and make TLOCs obey. 🚀
(And when Cisco TAC says "it's a TLOC issue," you'll know exactly where to look.)
-----
## SD-WAN Site ID + Color + Management Subnet Integration Guide
To build a **scalable, intuitive, and operationally efficient** SD-WAN fabric, we'll combine:
1. **Site IDs** (Logical location identifiers)
2. **Colors** (Underlay transport identification)
3. **Management Subnet** (VRF for OOB/In-band management)
Here's how to plan and implement them cohesively:
### 1\. Hierarchy & Assignment Strategy
#### A. Site ID + Color + Management Subnet Relationship
| Component | Purpose | Example Value | Design Tip |
| :-------- | :------ | :------------ | :--------- |
| **Site ID** | Uniquely identifies a branch/DC | `100` (HQ), `200` (Branch) | Use geographic encoding (e.g., `1` = Americas). |
| **Color** | Identifies WAN transport types | `mpls`, `biz-internet`, `lte` | Pick from the predefined color keywords and document which carrier each one maps to. |
| **Mgmt Subnet** | Dedicated subnet for OOB/In-band mgmt | `10.255.100.0/24` (VPN 0 or VPN 512) | Isolate from data VPNs (1-511). |
#### B. Structured Numbering Example
**Scenario**: A multinational with:
* **Region 1 (Americas)**: MPLS + Internet
* **Region 2 (EMEA)**: MPLS + LTE
| Site | Site ID | System IP | Colors (Transport) | Management Subnet |
| :--- | :------ | :-------- | :----------------- | :---------------- |
| **HQ (Dallas)** | `100` | `172.16.100.1` | `mpls`, `biz-internet` | `10.255.100.0/24` (VPN 0) |
| **Branch (NY)** | `101` | `172.16.101.1` | `mpls`, `biz-internet` | `10.255.101.0/24` (VPN 0) |
| **DC (Frankfurt)** | `200` | `172.16.200.1` | `mpls`, `lte` | `10.255.200.0/24` (VPN 0) |
### 2\. Color Planning Best Practices
#### A. Standardize Color Naming
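* Cisco SD-WAN colors come from a **fixed keyword set** (`mpls`, `biz-internet`, `public-internet`, `lte`, `metro-ethernet`, `private1` to `private6`, `custom1` to `custom3`, and named colors such as `gold` or `blue`), so free-form names are not accepted. Keep a documented, consistent carrier-to-color mapping instead:
```plaintext
<carrier> <type> → <color> (e.g., AT&T MPLS → `mpls`, Comcast business internet → `biz-internet`)
```
* Apply the same mapping at every site; ad-hoc choices per site are confusing at scale.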
#### B. Color Redundancy Rules
* Assign **at least 2 colors per site** (e.g., `mpls` + `internet`).
* Use **BFD** for fast failover between colors.
#### C. Color Mapping to TLOCs
* Each **color** corresponds to a **TLOC** (Transport Locator).
* Example TLOC config:
```bash
vEdge(config)# vpn 0 interface ge0/0
tunnel-interface
color mpls restrict # Restrict to MPLS underlay
```
### 3\. Management Subnet Strategy
#### A. Key Requirements
* **Isolation**: Management traffic should be isolated.
* **In-band Management:** Typically resides in **VPN 0** (shares the transport VRF with control/data overlay traffic but is logically separate).
* **Out-of-Band (OOB) Management:** For dedicated management ports (e.g., `mgmt0` on a vEdge or `GigabitEthernet0` on an IOS XE edge), use **VPN 512**. Routes in VPN 512 are **NOT** advertised into the OMP overlay.
* **Subnet Size**: `/24` recommended (supports up to 254 devices).
#### B. Addressing Scheme Example
For **In-band Management (VPN 0)**:
```plaintext
10.255.<Site ID>.0/24
Example:
- Site ID 100 → `10.255.100.0/24`
- Site ID 200 → `10.255.200.0/24`
```
For **Out-of-Band Management (VPN 512)**: Use a completely separate, non-overlapping management subnet, typically on a dedicated physical interface.
**Benefits**:
* Predictable IPs (easy troubleshooting).
* No overlaps with service VPNs.
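The scheme is mechanical enough to generate rather than type by hand. The sketch below derives the in-band management subnet from a site ID using Python's standard `ipaddress` module, and also derives a system IP using the `172.16.<Site ID>.1` pattern from the numbering table earlier in this guide. The `site_plan` function is a hypothetical helper, and the 1 to 255 limit is a consequence of packing the site ID into a single octet.

```python
import ipaddress


def site_plan(site_id: int) -> dict:
    """Derive per-site addressing from the site ID.

    The site ID becomes the third octet, so this scheme only covers
    site IDs 1-255; larger fabrics need a wider encoding.
    """
    if not 1 <= site_id <= 255:
        raise ValueError("this scheme encodes the site ID in one octet (1-255)")
    return {
        "site_id": site_id,
        "system_ip": ipaddress.ip_address(f"172.16.{site_id}.1"),
        "mgmt_subnet": ipaddress.ip_network(f"10.255.{site_id}.0/24"),
    }


plan = site_plan(100)
print(plan["system_ip"], plan["mgmt_subnet"])
# Usable hosts in a /24: total addresses minus network and broadcast
print(plan["mgmt_subnet"].num_addresses - 2)
```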
#### C. vManage Integration
* Define management subnets in **vManage Templates**:
```bash
vpn 0
 interface ge0/0
  ip address 10.255.100.1/24
  tunnel-interface
   color biz-internet restrict
```
(For VPN 512, you'd configure a separate interface under `vpn 512`).
### 4\. Putting It All Together: Design Checklist
1. **Site IDs**: Geographic/role-based, unique, documented in IPAM.
2. **Colors**: Mapped to carriers, assigned to TLOCs, redundant.
3. **Management Subnet**:
* `/24` in VPN 0 for in-band.
* `/24` in VPN 512 for OOB (preferred for dedicated management ports).
4. **System IPs**: Align with Site ID (e.g., Site ID `100` → `172.16.100.1`).
### 5\. Common Pitfalls
❌ **Color Conflicts**: Reusing `mpls` for two different circuits on the same edge (give the second circuit its own color, e.g., `private1`).
❌ **Mgmt Overlaps**: Sharing `10.255.100.0/24` across sites (always subnet per site).
❌ **Unstructured Site IDs**: Random numbers (hard to scale beyond 50 sites).
❌ **Incorrect VPN for Internet Breakout**: Using VPN 512 for DIA (it's for OOB management). DIA should be in a service VPN or VPN 0.
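The overlap pitfall is easy to catch before deployment with a short check over the planned subnets. The sketch below uses the standard `ipaddress` module; the site names and the deliberately duplicated subnet are made-up sample data, and `find_overlaps` is a hypothetical helper name.

```python
import ipaddress
from itertools import combinations


def find_overlaps(sites):
    """Return name pairs whose management subnets overlap.

    sites: dict mapping a site name to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in sites.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


planned = {
    "dallas": "10.255.100.0/24",
    "ny": "10.255.101.0/24",
    "frankfurt": "10.255.100.0/24",   # deliberate mistake: reuses the Dallas subnet
}
print(find_overlaps(planned))
```

Running a check like this against the IPAM export before pushing templates turns the pitfall into a pre-deployment failure instead of a production one.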
### Final Topology Example
```plaintext
Site ID: 100 (Dallas HQ)
- System IP: 172.16.100.1
- Colors: mpls, biz-internet
- Mgmt Subnet: 10.255.100.0/24 (VPN 0 for in-band)
- Service VPNs: 10 (LAN), 20 (VoIP)
```
-----
## SD-WAN Fabric Bring-Up Essentials
To **bring up an SD-WAN fabric**, you need to configure key components correctly. Below is a **concise, step-by-step breakdown** of the essentials, along with **critical design considerations**.
### 1\. Underlay Network (VPN 0 - Transport VRF / Front-Door VRF)
* **Purpose**: Handles **control-plane traffic** (OMP, DTLS/TLS tunnels between devices) and **encapsulated data-plane traffic**. All physical WAN interfaces that connect to the underlay belong to VPN 0.
* **Key Configurations**:
* **Interfaces**: Assign WAN interfaces (e.g., MPLS, Internet, LTE) to VPN 0.
* **Routing**:
* Static routes (for simple setups).
* BGP/OSPF (for dynamic underlay routing in larger deployments).
* **TLOC Extensions**: Define public/private IPs for tunnel endpoints, along with colors.
* **Design Considerations**:
* **Dual Underlay**: Use at least **two transport types** (e.g., MPLS + Internet) for redundancy.
* **TLOC Preference**: Prioritize the more reliable or higher-capacity links (e.g., MPLS over LTE).
### 2\. Overlay Network (OMP Routing)
* **Purpose**: Distributes routes and policies across the fabric.
* **Key Configurations**:
* **OMP (Overlay Management Protocol)**: Advertises routes, TLOCs, and policies between vSmart controllers and edges.
* **Route Policies**: Control which prefixes are shared (e.g., only corporate LAN routes).
* **Design Considerations**:
* **Route Aggregation**: Minimize prefixes advertised to vSmart (e.g., summarize branch LANs).
* **TLOC Redundancy**: Assign multiple TLOCs per route for failover.
### 3\. Service VPNs (VPN 1-511)
* **Purpose**: Segments user/data traffic (e.g., corporate LAN, guest Wi-Fi, VoIP).
* **Key Configurations**:
* **VRF Creation**: Define VPNs (e.g., `vpn 10` for corporate LAN).
* **Interface Assignment**: Assign LAN interfaces to the correct VPN.
* **Route Leaking**: If needed, allow controlled traffic flow between VPNs (via centralized policies).
* **Design Considerations**:
* **QoS Tagging**: Apply DSCP markings per VPN (e.g., EF for VoIP in `vpn 20`).
* **Security Policies**: Restrict inter-VPN communication (e.g., guest Wi-Fi in `vpn 30` can't reach `vpn 10`).
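Because VPN 0 and VPN 512 are reserved, a tiny guard in provisioning scripts can stop a LAN interface from landing in the wrong VPN. The sketch below follows the ranges used in this guide (service VPNs 1-511); `vpn_role` and `assert_lan_vpn` are hypothetical helper names, and any platform-specific extended ranges are not modeled.

```python
def vpn_role(vpn_id: int) -> str:
    """Classify a VPN ID using the ranges described in this guide."""
    if vpn_id == 0:
        return "transport"
    if vpn_id == 512:
        return "management"
    if 1 <= vpn_id <= 511:
        return "service"
    raise ValueError(f"VPN {vpn_id} is outside the ranges covered by this guide")


def assert_lan_vpn(vpn_id: int) -> int:
    """Reject LAN interface assignments to the transport or management VPN."""
    role = vpn_role(vpn_id)
    if role != "service":
        raise ValueError(f"VPN {vpn_id} is the {role} VPN, not a service VPN")
    return vpn_id


print(vpn_role(10), vpn_role(0), vpn_role(512))
```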
### 4\. Internet Breakout
* **Purpose**: Local internet access (DIA) from branches or centralized internet access from a datacenter.
* **Key Configurations**:
* **NAT & Firewall**: Enable NAT overload (PAT) for private→public IP translation on the egress interface.
* **Policy-Based Routing (PBR) or Application-Aware Routing**: Steer specific traffic (e.g., SaaS apps, guest Wi-Fi) to the local internet path.
* **Design Considerations**:
* **Security**: Apply **ZTNA/Umbrella** or other security services for secure internet access.
* **Backup Path**: If local DIA fails, fall back to centralized internet via the overlay.
* **Note**: This is typically configured in a **service VPN (e.g., VPN 10, or a dedicated internet VPN like VPN 999)**, or by routing traffic directly out a VPN 0 interface with specific policies and NAT. **VPN 512 is reserved for Out-of-Band Management, not Internet Breakout.**
### 5\. Management & Control Plane Connectivity
* **Purpose**: Ensures vEdges can securely connect to controllers (vManage, vSmart, vBond).
* **Key Configurations**:
* **Controller IPs**: Ensure vEdges can reach vManage/vSmart/vBond over VPN 0.
* **Certificate Auth**: Use device certificates for secure onboarding.
* **Design Considerations**:
* **Out-of-Band (OOB) Management (VPN 512)**: Use a separate OOB network with interfaces in VPN 512 for high availability and isolation of management traffic from the overlay.
* **Geo-Redundancy**: Deploy controllers in multiple regions.
### 6\. Security Policies
* **Purpose**: Enforce traffic rules (e.g., blocking, inspection).
* **Key Configurations**:
* **Zone-Based Firewall**: Assign interfaces to zones (e.g., "inside," "outside").
* **Application-Aware Policies**: Block high-risk apps (e.g., Tor, Netflix).
* **Design Considerations**:
* **Default-Deny**: Start with "deny all," then allow only needed traffic.
* **IPS/IDS**: Enable for internet-bound traffic.
### 7\. High Availability (HA)
* **Design Considerations**:
* **Dual vSmarts**: Avoid single points of failure for the control plane.
* **Active/Standby Edges**: Use VRRP/HSRP for LAN-side HA at critical sites.
* **Cloud Gateway Redundancy**: For cloud-onramp (e.g., AWS/Azure).
### Summary Checklist
| **Step** | **Action** | **Critical Design Tip** |
| :------- | :--------- | :---------------------- |
| **1. Underlay** | Configure VPN 0 interfaces & routing | Dual transports (MPLS + Internet) |
| **2. Overlay** | Set up OMP & route policies | Summarize routes to reduce overhead |
| **3. Service VPNs** | Define VPNs 1-511 & assign interfaces | Use QoS for VoIP/VC traffic |
| **4. Internet** | Configure DIA in a Service VPN or VPN 0 | Add ZTNA/umbrella for security |
| **5. Management** | Ensure controllers are reachable via VPN 0 | OOB management (VPN 512) for resiliency |
| **6. Security** | Apply firewall/IPS policies | Default-deny approach |
| **7. HA** | Deploy redundant controllers/edges | Active/standby for critical sites |
-----
## SD-WAN Application-Aware Routing (AAR) with `match app-list`
*Control traffic flows based on applications using vManage policies.*
### 1\. What is `match app-list`?
* **Purpose:** Identifies specific applications (e.g., Zoom, Netflix, VoIP) to steer traffic via policies.
* **Use Cases:**
* Prioritize VoIP over MPLS.
* Block high-risk apps (e.g., Tor).
* Local internet breakout (DIA) for SaaS apps.
### 2\. How It Works
1. **Application Detection:**
* Uses **Deep Packet Inspection (DPI)** to identify apps (even on non-standard ports, although encryption such as TLS 1.3 can limit what it sees).
* Predefined app lists in vManage (e.g., `VOICE-AND-VIDEO`, `BUSINESS-APPS`).
2. **Policy Matching:**
* Policies reference `app-list` to trigger actions (e.g., change path, apply QoS).
### 3\. Configuration Steps
#### 3.1 Define an App List in vManage
1. Navigate to: **Configuration \> Policies \> Custom Options \> App-Aware Routing**
2. Create a new app list:
```plaintext
Name: CORPORATE-APPS
Applications:
- Microsoft-365
- Webex-Teams
- Zoom-Cloud
```
#### 3.2 Create a Policy Using `match app-list`
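**Example:** *"Route `CORPORATE-APPS` traffic (Microsoft-365, Webex-Teams, Zoom-Cloud) via VPN 10 (local internet breakout)"*, shown in simplified pseudo-syntax rather than literal CLI: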
*(Note: VPN 512 is for Out-of-Band Management, not Internet Breakout. Use a service VPN like VPN 10 or route out VPN 0 for DIA.)*
```bash
policy-rule MICROSOFT-365-DIA
match app-list CORPORATE-APPS # Match predefined apps
action accept
set vpn 10 # Force local internet breakout via VPN 10
set dscp 46 # Mark for QoS (EF)
```
#### 3.3 Apply Policy to Sites
1. Attach policy to a **Centralized Policy** in vManage.
2. Push to target sites.
### 4\. Best Practices
#### 4.1 App List Design
* **Group logically:**
* `VOICE-AND-VIDEO`: Zoom, Webex, MS-Teams.
* `BUSINESS-CRITICAL`: SAP, Oracle, Salesforce.
* **Avoid overly broad lists** (e.g., "ALL-WEB") to prevent unintended matches.
#### 4.2 Policy Ordering
* **Sequences are evaluated in ascending order and the first match wins**, so give the most specific or most important rules the lowest sequence numbers.
```bash
policy-list AAR-POLICY
sequence 10
match app-list VOICE-AND-VIDEO
action accept
set color mpls # Force MPLS for voice
sequence 20
match app-list NETFLIX
action drop # Block Netflix
```
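The first-match behaviour of ordered sequences can be sketched in a few lines. The `Sequence` class and `evaluate` function are hypothetical names used only for illustration, and the default action of `accept` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    number: int    # Lower numbers are evaluated first
    app_list: str  # App list this sequence matches on
    action: str    # Action applied when the sequence matches


def evaluate(sequences, flow_app_lists, default_action="accept"):
    """Return (sequence number, action) of the first matching sequence."""
    for seq in sorted(sequences, key=lambda s: s.number):
        if seq.app_list in flow_app_lists:
            return seq.number, seq.action
    return None, default_action  # Nothing matched: fall through to the default


POLICY = [
    Sequence(20, "NETFLIX", "drop"),
    Sequence(10, "VOICE-AND-VIDEO", "prefer-mpls"),
]

if __name__ == "__main__":
    print(evaluate(POLICY, {"VOICE-AND-VIDEO"}))
    print(evaluate(POLICY, {"VOICE-AND-VIDEO", "NETFLIX"}))
    print(evaluate(POLICY, {"BUSINESS-CRITICAL"}))
```

A flow that belongs to both lists is handled by sequence 10 and never reaches sequence 20, which is the usual source of "policy not applying" surprises.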
#### 4.3 SLA-Based Fallback
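* Combine with an **SLA class** so traffic leaves the preferred transport when that transport violates the thresholds. Edges measure loss, latency, and jitter per tunnel with BFD probes; this is the SD-WAN counterpart of classic Performance Routing (PfR):
```bash
sla-class WEBEX-SLA
  latency 100                                 # Maximum tolerated latency in ms
match app-list WEBEX
  action
    sla-class WEBEX-SLA preferred-color mpls  # Prefer MPLS while it meets the SLA
```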
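The fallback logic can be sketched as follows. This is a simplified model under stated assumptions: `PathStats`, `SlaClass`, and `choose_paths` are hypothetical names, and the last-resort behaviour (use every available path when none meets the SLA) is one possible outcome that depends on how the policy is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    color: str
    latency_ms: float
    loss_pct: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaClass:
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float

    def is_met_by(self, path: PathStats) -> bool:
        """True when every measured value is within its threshold."""
        return (path.latency_ms <= self.max_latency_ms
                and path.loss_pct <= self.max_loss_pct
                and path.jitter_ms <= self.max_jitter_ms)


def choose_paths(paths, sla, preferred_color):
    """Pick the colors to forward on: preferred if compliant, else any compliant."""
    compliant = [p for p in paths if sla.is_met_by(p)]
    preferred = [p.color for p in compliant if p.color == preferred_color]
    if preferred:
        return preferred
    if compliant:
        return [p.color for p in compliant]
    return [p.color for p in paths]  # Assumption: no compliant path, use all


if __name__ == "__main__":
    sla = SlaClass(max_latency_ms=100, max_loss_pct=1, max_jitter_ms=30)
    paths = [
        PathStats("mpls", latency_ms=140, loss_pct=0.1, jitter_ms=5),
        PathStats("biz-internet", latency_ms=60, loss_pct=0.2, jitter_ms=8),
    ]
    print(choose_paths(paths, sla, preferred_color="mpls"))
```

In the sample run MPLS exceeds the 100 ms latency threshold, so the flow moves to the compliant broadband path.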
### 5\. Verification & Troubleshooting
#### 5.1 Key Commands
| Command | Purpose |
| :------ | :------ |
| `show sdwan app-route stats` | Shows per-tunnel loss, latency, and jitter used for SLA decisions. |
| `show sdwan policy app-route-policy-filter` | Shows per-sequence counters (policy hits) for app-route policies. |
| `show sdwan app-fwd dpi flows` | Inspects DPI-classified flows. |
#### 5.2 Common Issues
| Symptom | Likely Cause | Fix |
| :------ | :----------- | :-- |
| App traffic not matching | Incorrect app-list definition | Verify app names in vManage. |
| Policy not applying | Wrong policy priority | Reorder policies (lower sequence = higher priority). |
| DPI not detecting apps | Encryption (TLS 1.3) | Use IP-based matching as fallback. |
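The IP-based fallback in the last row can be sketched as a two-step classifier: trust the DPI result when there is one, otherwise match the destination against known prefixes. All names here are hypothetical, and `198.51.100.0/24` is a documentation prefix standing in for a real SaaS address range.

```python
import ipaddress

# Hypothetical data: app lists keyed by name, plus fallback prefixes per list.
APP_LISTS = {"CORPORATE-APPS": {"microsoft-365", "webex-teams", "zoom-cloud"}}
FALLBACK_PREFIXES = {"CORPORATE-APPS": [ipaddress.ip_network("198.51.100.0/24")]}


def classify(dpi_app, dest_ip, app_lists=APP_LISTS, prefixes=FALLBACK_PREFIXES):
    """Return (app list name, method); method is 'dpi', 'ip-prefix', or 'unclassified'."""
    if dpi_app is not None:
        for name, apps in app_lists.items():
            if dpi_app in apps:
                return name, "dpi"
    addr = ipaddress.ip_address(dest_ip)
    for name, nets in prefixes.items():
        if any(addr in net for net in nets):
            return name, "ip-prefix"
    return None, "unclassified"


if __name__ == "__main__":
    print(classify("zoom-cloud", "203.0.113.9"))  # DPI recognized the app
    print(classify(None, "198.51.100.20"))        # DPI blind, prefix matched
    print(classify(None, "203.0.113.9"))          # Neither method matched
```

Prefix lists go stale as providers change addresses, so this fallback is best kept narrow and reviewed regularly.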
### 6\. Advanced Use Cases
#### 6.1 Custom DPI Signatures
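* For proprietary apps, define a custom application that matches on attributes such as protocol, port, and server name. Illustrative pseudo-syntax: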
```bash
app-list CUSTOM-APP
signature TCP port 5000 protocol HTTP user-agent "MyApp*"
```
#### 6.2 Combining with QoS
* Mark apps for prioritization:
```bash
match app-list VOICE
action accept
set dscp ef # Expedited Forwarding (VoIP)
```
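The examples in this document use both `set dscp 46` and `set dscp ef`; they are the same marking. The sketch below shows the arithmetic: DSCP occupies the upper six bits of the IP ToS/Traffic Class byte, so the byte seen in a packet capture is the DSCP value shifted left by two. The helper name `dscp_to_tos` is hypothetical.

```python
# Standard DSCP code points for a few common classes.
DSCP_BY_NAME = {"ef": 46, "af41": 34, "cs0": 0}


def dscp_to_tos(dscp: int, ecn: int = 0) -> int:
    """Combine a 6-bit DSCP value and 2-bit ECN field into the ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN must be in the range 0-3")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    tos = dscp_to_tos(DSCP_BY_NAME["ef"])
    print(f"EF: DSCP 46 -> ToS byte {tos} (0x{tos:02X})")  # ToS byte 184 (0xB8)
```

Knowing the ToS byte value helps when verifying markings in a capture, where many tools display the whole byte rather than the DSCP field.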
#### 6.3 Internet Breakout for Specific Apps
```bash
match app-list SALESFORCE
action accept
set vpn 10 # Local breakout via VPN 10
nat use-vpn 0 # NAT the flow out a NAT-enabled VPN 0 transport interface
```
### 7\. Summary Checklist
* [ ] Define app lists in vManage (**Configuration \> Policies \> Custom Options \> App-Aware Routing**).
* [ ] Use `match app-list` in policies to steer traffic.
* [ ] Test with `show sdwan app-route stats` and `show sdwan app-fwd dpi flows`.
* [ ] Combine with SLA for dynamic failover.
### Key Takeaways
1. **`match app-list` enables application-aware routing** (not just IP/port-based).
2. **DPI visibility can be reduced by strong encryption** (e.g., TLS 1.3 with an encrypted server name), so keep IP-based matching available as a fallback.
3. **Policy order matters**: sequences are evaluated lowest number first, and the first match wins.
-----
## Front-Door VRF (FVRF) Explained (Using Cisco Gear)
**Front-Door VRF (FVRF)** is a Cisco design pattern that places an outside-facing interface in its own Virtual Routing and Forwarding (VRF) instance so that its routes stay out of the global routing table. The term originates in DMVPN/IPsec designs, where the tunnel source (WAN) interface lives in the front-door VRF. This section applies the same isolation to the **management plane**: the management interface (SSH, SNMP, HTTPS, etc.) sits in a dedicated VRF, often simply called a management VRF, separate from the **data plane**.
**Note:** While this document describes the general concept of Front-Door VRF in Cisco devices, in Cisco SD-WAN (Viptela-based) architectures:
* **VPN 0** is often referred to as the "Front-Door VRF" in the sense that it is the transport VRF carrying all overlay control and data tunnel traffic, and often in-band management.
* **VPN 512** is used for isolated *out-of-band* management, conceptually similar to the dedicated management VRF configured below.
### Why Use Front-Door VRF?
1. **Security:** Prevents unauthorized access to management interfaces via data-plane attacks.
2. **Isolation:** Ensures management traffic doesn't mix with production traffic.
3. **Multi-Tenancy:** Useful in service provider environments where management traffic must be segregated per customer.
4. **Simplified Routing:** Avoids route conflicts between management and data networks.
### How FVRF Works
* The **management interface (e.g., `GigabitEthernet0/0`)** is assigned to a dedicated VRF (e.g., `MGMT-VRF`).
* All management traffic (SSH, SNMP, etc.) must go through this VRF.
* The data plane (regular traffic) uses the **default global routing table** or other service VRFs.
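The isolation described above comes from each VRF keeping its own routing table: a lookup only ever consults the table of the VRF the packet belongs to. The sketch below models that with a longest-prefix match per VRF. `VrfRouter` is a hypothetical class, and the `global` table name and `203.0.113.1` next hop are illustrative assumptions.

```python
import ipaddress


class VrfRouter:
    """Toy model of per-VRF routing tables with longest-prefix matching."""

    def __init__(self):
        self.tables = {}  # VRF name -> list of (network, next hop)

    def add_route(self, vrf, prefix, next_hop):
        self.tables.setdefault(vrf, []).append(
            (ipaddress.ip_network(prefix), next_hop))

    def lookup(self, vrf, destination):
        """Return the next hop from this VRF's table only, or None."""
        addr = ipaddress.ip_address(destination)
        matches = [(net, nh) for net, nh in self.tables.get(vrf, [])
                   if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


router = VrfRouter()
router.add_route("MGMT-VRF", "192.168.1.0/24", "connected")
router.add_route("MGMT-VRF", "0.0.0.0/0", "192.168.1.254")
router.add_route("global", "0.0.0.0/0", "203.0.113.1")

if __name__ == "__main__":
    print(router.lookup("MGMT-VRF", "10.1.1.1"))  # Uses the management default route
    print(router.lookup("global", "10.1.1.1"))    # Same destination, different table
```

Both tables hold a default route without conflict, which is the "simplified routing" benefit listed earlier.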
### Configuration Example (Cisco IOS-XE / IOS)
#### 1\. Create the Management VRF
```bash
configure terminal
vrf definition MGMT-VRF
rd 100:1 ! Route Distinguisher (for uniqueness)
address-family ipv4
exit-address-family
exit
```
#### 2\. Assign the Management Interface to the VRF
```bash
interface GigabitEthernet0/0
description Management Interface
vrf forwarding MGMT-VRF
ip address 192.168.1.1 255.255.255.0
no shutdown
exit
```
#### 3\. Configure a Default Route for Management Traffic
```bash
ip route vrf MGMT-VRF 0.0.0.0 0.0.0.0 192.168.1.254
```
*(Where `192.168.1.254` is the gateway for management traffic.)*
#### 4\. Enable VRF-Aware Services
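```bash
ip http server
ip access-list standard MGMT-ACCESS
permit 192.168.1.0 0.0.0.255 ! Management subnet allowed to reach the VTY lines
exit
line vty 0 4
access-class MGMT-ACCESS in vrf-also ! Accept sessions that arrive through a VRF interface
transport input ssh ! SSH only
exit
ntp server vrf MGMT-VRF 192.168.1.10 ! Example VRF-aware client service; the NTP address is a placeholder
```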
### Verification
* Check VRF routing table:
```bash
show ip route vrf MGMT-VRF
```
* Verify interface assignment:
```bash
show vrf brief
```
* Test connectivity:
```bash
ping vrf MGMT-VRF 192.168.1.254
```
### Key Considerations
* **NTP & DNS:** If management relies on NTP/DNS, ensure they are reachable via the FVRF.
* **Backup Access:** Always maintain an alternative access method (console) in case of misconfiguration.
* **Compatibility:** Some older Cisco devices may not support all VRF-aware services.
### Conclusion
Isolating management interfaces in a dedicated VRF, the front-door pattern applied to management, is a best practice in Cisco environments. It reduces the attack surface and keeps management reachable only through its own routing table, which makes data-plane vulnerabilities harder to exploit for management access.
-----