medusa 48364f8970 Update tech_docs/networking/sdwan_primer.md

2025-07-29 13:24:34 -05:00


Your collection of documents on SD-WAN is exceptionally thorough and well-structured. I've compiled them into a single, comprehensive draft, incorporating the refinements and addressing the inconsistencies we've discussed, particularly around VPN 0, VPN 512, and the Front-Door VRF concept.

I've aimed to create a cohesive flow, starting with the "Top 1% Mindset" to set the stage, moving into the crash course for foundational understanding, then diving deep into TLOCs and the three planes, and finally, detailing key configurations and troubleshooting.

Here is the complete draft:

Mastering SD-WAN: From Fundamentals to the Top 1% Mindset

The Top 1% Mindset

You don’t just deploy SD-WAN—you orchestrate it. You think in abstractions (colors, TLOCs, VPNs) not hardware. You troubleshoot like a surgeon—control plane first, then data plane, then app logic.

Example: Problem: VoIP calls drop but O365 works. Top 1% Debug:

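Check control connections and BFD sessions (BFD runs inside the data-plane tunnels, so it reflects real path health).
Verify TLOC preferences (is LTE taking over incorrectly?).
Inspect the app-route policy (is VoIP pinned to MPLS while an SLA class moves it elsewhere?).
Drill into `show sdwan app-route stats` (is jitter spiking on broadband?).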

Final Thought Most SD-WAN "engineers" just click through vManage. The real pros know:

Transport independence isn’t automatic—it’s designed.
Policies aren’t rules—they’re a logic flow.
Troubleshooting isn’t guessing—it’s methodical dissection.

You’re asking the right questions. Now go break (then fix) some TLOCs. 🚀 (And yes, we both know Cisco’s docs don’t explain this stuff clearly—that’s why the top 1% reverse-engineer it.)

SD-WAN Crash Course: The 20% That Matters

Goal: Understand core SD-WAN concepts, how they differ from traditional WAN, and how they integrate with IPSec.

1. SD-WAN vs Traditional WAN

Feature	Traditional WAN (MPLS/VPN)	SD-WAN
Cost	Expensive (MPLS circuits)	Cheaper (uses Internet + broadband)
Agility	Manual config changes	Centralized, automated policies
Performance	Predictable but rigid	Dynamic path selection (jitter/loss-aware)
Security	Relies on IPSec/MPLS	Built-in encryption (IPSec, TLS)
Topology	Hub-and-spoke	Any-to-any, mesh

Key Takeaway:

SD-WAN decouples control plane from hardware, allowing dynamic traffic routing over any transport (MPLS, LTE, broadband).

2. SD-WAN Core Components

(1) Edge Devices (CPE)

e.g., Cisco vEdge, FortiGate, VeloCloud
Sit at branch offices, apply policies, and encrypt traffic.

(2) Orchestrator (Controller)

e.g., Cisco vManage, VMware Orchestrator
Centralized policy management (no CLI needed!).

(3) Overlay Tunnels

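Data-plane tunnels between edges (IPsec for encryption, GRE where the transport is already private); DTLS/TLS secures the control connections to the controllers.
Uses TLOC (Transport Locator) = System IP + Color (e.g., biz-internet, mpls) + Encapsulation.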

(4) Underlay Transport

Any WAN link: MPLS, Internet, LTE, 5G.

3. How SD-WAN Works (The 80% You Need)

(1) Path Selection

Dynamic multi-path steering: Chooses best path based on:
- Application SLA (e.g., VoIP → low latency).
- Real-time metrics (jitter, packet loss, latency).

Example Policy:

IF (Application == VoIP) AND (Latency > 50ms) → SWITCH to backup link
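
The rule above can be sketched as a small selection function. This is a minimal illustration rather than vendor logic: the `LinkMetrics` type, the `pick_path` function, and the sample latencies are hypothetical, and the 50 ms limit is simply taken from the example policy.

```python
from dataclasses import dataclass


@dataclass
class LinkMetrics:
    """Measured quality of one transport (hypothetical structure)."""
    name: str
    latency_ms: float


def pick_path(app: str, primary: LinkMetrics, backup: LinkMetrics,
              voip_latency_limit_ms: float = 50.0) -> str:
    """Return the link name to use, following the example policy."""
    # Only VoIP is latency-sensitive in this sketch; other apps stay on primary.
    if app == "voip" and primary.latency_ms > voip_latency_limit_ms:
        return backup.name
    return primary.name


mpls = LinkMetrics("mpls", latency_ms=72.0)
inet = LinkMetrics("biz-internet", latency_ms=31.0)
print(pick_path("voip", mpls, inet))  # VoIP leaves the slow primary link
print(pick_path("web", mpls, inet))   # other traffic stays on primary
```

A real edge would also weigh jitter and loss and would check that the backup link itself meets the SLA before switching.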

(2) Zero-Touch Provisioning (ZTP)

Plug in a device → auto-configures via orchestrator.

(3) Application-Aware Routing

DPI (Deep Packet Inspection) identifies apps (e.g., Teams, SAP).
- (Note: While effective, some advanced encryption like TLS 1.3 can limit DPI's visibility, requiring IP-based fallbacks.)
QoS prioritization (VoIP > YouTube).

(4) Security Integration

IPSec for all overlays (mandatory for Internet links).
Integrated or cloud-delivered security (e.g., FortiGate NGFW, Zscaler).

4. SD-WAN + IPSec Integration

SD-WAN uses IPSec for secure tunnels but adds:
- Automated key rotation (no manual PSK updates).
- Tunnel bonding (combines multiple links for throughput).

Key Difference:

Traditional IPSec VPN = static tunnels.
SD-WAN IPSec = dynamic, SLA-driven tunnels.

5. SD-WAN Troubleshooting (Top 5 Issues)

Issue	Debug Command	Fix
Tunnels not coming up	`show sdwan bfd sessions` (Cisco)	Check underlay reachability
Poor VoIP quality	`show sdwan app-route stats`	Adjust SLA thresholds
Orchestrator sync failure	`show sdwan control connections`	Verify certs/connectivity
Traffic taking wrong path	`show sdwan policy service-path`	Fix application-aware rules
Packet loss on backup link	`show sdwan tunnel statistics`	Enable FEC (Forward Error Correction)

6. SD-WAN vs. DMVPN (Common Interview Qs)

Q: When would you use SD-WAN over DMVPN?

SD-WAN: When you need application-aware routing + centralized management.
DMVPN: When you need scalable IPSec tunnels but don’t need SaaS optimization.

Q: Can SD-WAN replace IPSec?

No! SD-WAN uses IPSec for encryption but adds intelligence on top.

7. Lab Practice (Quick Wins)

Simulate link failure in GNS3/EVE-NG → Watch SD-WAN switch paths.
Prioritize VoIP traffic over YouTube.
Break the orchestrator → Observe fallback to local policies.

CLI Examples (Cisco Viptela):

show sdwan control connections   # Check controller connectivity
show sdwan app-route stats       # Verify per-tunnel loss, latency, and jitter
clear sdwan control connections  # Force control connections to re-establish

8. Interview Cheat Sheet

✅ SD-WAN = Automation + Application-Aware Routing + Multiple Underlays.
✅ IPSec is still used, but dynamically managed.
✅ Typical voice targets: Jitter (<30ms), One-way latency (<150ms), Packet Loss (<1%).
✅ Orchestrator is the brain; edges are the muscle.

The Three Planes of SD-WAN & Modern Networking

In modern networking, especially with overlay technologies like SD-WAN, we deal with three distinct planes, each serving a critical role.

1. Management Plane

Purpose: Controls device access and monitoring (SSH, SNMP, HTTPS, syslog, etc.). It's about how you interact with the device.

Key Components:

Component	Protocol	Port	Description
vManage	HTTPS (WebUI)	TCP/443	GUI/API for centralized control and configuration.
vBond	DTLS	UDP/12346	Orchestrator for device authentication and initial redirection to vManage and vSmart.
Zero-Touch Provisioning (ZTP)	DHCP/HTTPS	-	Auto-configures devices out-of-the-box.

Traffic Flow:
1. Onboarding: Device contacts vBond (DTLS) and is authenticated, then learns the vManage and vSmart addresses. It builds control connections to them and receives its configuration from vManage.
2. Ongoing Management: Devices send telemetry (metrics, logs) to vManage. Policies (security, routing) are pushed from vManage.
Security Considerations:
- ✔ Always use isolated VRFs for management traffic (e.g., a dedicated management VRF on traditional routers, or VPN 512 in SD-WAN for OOB management).
- ✔ Certificate-based mutual authentication on the DTLS/TLS connections between devices and vManage.
- ✔ Role-Based Access Control (RBAC) in vManage.

2. Control Plane

Purpose: Handles protocols that build network intelligence (BGP, OSPF, VXLAN EVPN, SD-WAN OMP, STP, LACP, etc.). It's about how the network learns its topology and reachability.

Key Protocols (SD-WAN Specific):

Protocol	Function	Port
OMP (Overlay Management Protocol)	Advertises routes, TLOCs, policies.	Inside the DTLS (UDP/12346 base) or TLS (TCP/23456 base) control connection to vSmart
BGP (optional)	Legacy WAN integration or underlay routing.	TCP/179
TLOC (Transport Locator)	Not a protocol: an OMP-advertised attribute that maps physical WAN links to logical tunnels for policy application.	-

How OMP Works:
1. vSmart controllers act as route reflectors for OMP.
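2. Edge devices (vEdges) send:
  - Routes (prefixes learned on the LAN side of each service VPN).
  - TLOCs (tunnel endpoints, e.g., system-IP:color:encapsulation).
  - Service routes (e.g., a firewall available at the site).
3. vSmart redistributes this info to all edges, filtered by centralized policy (e.g., "prefer MPLS for VoIP"), which originates on vSmart rather than on the edges.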
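
The route-reflector behaviour in the steps above can be sketched as follows. This is a toy model under stated assumptions: the `reflect` function and the edge names are hypothetical, and real vSmart controllers also apply policy and carry TLOC attributes with every route.

```python
def reflect(advertised: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each edge, return the prefixes it learns from all *other* edges."""
    learned = {}
    for edge in advertised:
        learned[edge] = sorted(
            prefix
            for other, prefixes in advertised.items()
            if other != edge          # an edge does not relearn its own routes
            for prefix in prefixes
        )
    return learned


advertised = {
    "edge-dallas": ["10.1.1.0/24"],
    "edge-ny": ["10.1.2.0/24"],
    "edge-frankfurt": ["10.2.1.0/24"],
}
learned = reflect(advertised)
print(learned["edge-ny"])
```

The point of the design is that edges peer only with the controller, so adding a site does not require touching any other edge.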

Example OMP Route Advertisement:

vEdge# show omp routes
RECEIVED ROUTES:
Prefix        TLOC IP         Color          Preference
10.1.1.0/24   203.0.113.1     mpls           100
10.1.1.0/24   198.51.100.1    biz-internet   50

(MPLS is preferred over Internet due to higher preference.)
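
The preference comparison in that output can be sketched like this. It models only the "highest preference wins" step; the `OmpRoute` tuple and `best_routes` function are hypothetical names, and actual OMP best-path selection has further tie-breakers that are not shown.

```python
from typing import NamedTuple


class OmpRoute(NamedTuple):
    prefix: str
    tloc_ip: str
    color: str
    preference: int


def best_routes(routes: list[OmpRoute]) -> dict[str, OmpRoute]:
    """Keep, per prefix, the route with the highest preference."""
    best: dict[str, OmpRoute] = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or route.preference > current.preference:
            best[route.prefix] = route
    return best


received = [
    OmpRoute("10.1.1.0/24", "203.0.113.1", "mpls", 100),
    OmpRoute("10.1.1.0/24", "198.51.100.1", "biz-internet", 50),
]
chosen = best_routes(received)["10.1.1.0/24"]
print(chosen.color, chosen.preference)
```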

Key Traits:
- Distributes reachability info (routes, tunnels, topology).
- Runs on the CPU (software-based) and is vulnerable to floods (e.g., BGP attacks).
- Can be placed in a separate VRF (but not in the management VRF, which should carry management traffic only).
Security Considerations:
- ✔ DTLS encryption for OMP (no cleartext control traffic!).
- ✔ Control-plane policing (CoPP) to prevent floods.
- ✔ Private WAN links (MPLS) for critical control traffic.

3. Data Plane (Forwarding Plane)

Purpose: Moves user traffic (packets/frames) as fast as the platform allows: in hardware where there are forwarding ASICs, in optimized software on virtual and many branch edges. It's about moving the actual data.

Key Technologies:

Technology	Role
IPsec/GRE	Encrypted tunnels between edges.
TLOC (Transport Locator)	Logical tunnel endpoint (e.g., `public-IP:color`).
Application-Aware Routing (AAR)	Dynamically switches paths based on SLA.

Data Flow Example:
1. Traffic arrives at vEdge:
  - Classified via DPI (Deep Packet Inspection).
  - Tagged with QoS markings (DSCP).
2. Path Selection:
  - Checks OMP-learned TLOCs and SLA metrics.
  - Chooses best path (e.g., MPLS for VoIP, Internet for web).
3. Encapsulation:
  - Wrapped in IPsec (ESP) or GRE.
  - Sent to peer vEdge via WAN (MPLS/Internet/5G).

Packet Walkthrough (Simplified):

Original Packet:

SRC: 10.1.1.100 (LAN) | DST: 8.8.8.8 (Internet)

After SD-WAN Processing:

[Outer IP][UDP][ESP][VPN label][Original Packet]
SRC: 203.0.113.1 (vEdge Public IP)
DST: 198.51.100.2 (Peer vEdge Public IP)

Key Traits:
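- ASIC/switch-chip driven on hardware platforms; software-forwarded (but still separate from the control processes) on virtual and many branch edges.
- Doesn’t compute routes or tunnels; it just forwards based on FIB/TCAM.
Security Considerations:
- ✔ IPsec (AES-256-GCM) for all tunnels; in Cisco SD-WAN the keys are distributed through vSmart over OMP rather than negotiated with IKE.
- ✔ Zone-Based Firewall on vEdges.
- ✔ SLA-based path avoidance (steer traffic away from links with high jitter or loss).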

Why This Separation Matters

Plane	Runs On	Isolation Needed?	Risks if Compromised
Management	CPU	Yes (Dedicated VRF/OOB)	Total device takeover
Control	CPU	Yes (VRF/CoPP)	Network meltdown (BGP hijacks, loops)
Data	ASIC	No (but ACLs help)	Performance drops (DDoS), but no config access

Common Misconceptions

"Control Plane = Management Plane" → No!
- Control Plane: BGP, OSPF, VXLAN EVPN.
- Management Plane: SSH, SNMP.
- (They’re both CPU-based but serve different purposes.)
"A management VRF can carry BGP/VXLAN" → No!
- A management VRF is only for management traffic, isolated from data/control. (Cisco's term "Front-Door VRF" strictly means the VRF that holds a tunnel's transport-facing interface, which is a different idea.)
- BGP/VXLAN go in normal VRFs or a dedicated control-plane VRF.
"Data Plane Needs a VRF" → Usually No.
- Data traffic follows the FIB (built by the control plane).
- VRFs for data are typically for tenant isolation (e.g., MPLS VPNs, multi-tenancy service VPNs in SD-WAN).

Real-World Use Cases

SD-WAN
- Management: vManage (HTTPS).
- Control: OMP (Overlay Management Protocol).
- Data: Encrypted tunnels (IPsec/GRE).
VXLAN EVPN
- Management: SSH to switches.
- Control: BGP EVPN (MAC/IP routing).
- Data: VXLAN-encapsulated traffic.
Service Provider MPLS
- Management: TACACS+ for routers.
- Control: LDP/RSVP (label distribution).
- Data: Label-switched packets.

Key Takeaways

Management Plane = Your remote admin access (dedicated VRF/OOB).
Control Plane = Protocols that build the network (BGP, EVPN, OSPF, OMP).
Data Plane = Raw packet forwarding (ASIC-driven, no intelligence).

Final Thought

The industry’s failure to physically separate all three planes (like servers do with iLO) is a security flaw. But until vendors fix it:

Isolate management traffic in dedicated VRFs (a management VRF on traditional gear, or SD-WAN's VPN 512 for OOB).
Use VRFs/CoPP for control-plane isolation and protection.
Trust ASICs for the data plane.

Deep Dive: TLOCs (Transport Locators) – The Spine of SD-WAN

TLOCs are the make-or-break abstraction in SD-WAN architectures (especially Cisco Viptela). They’re the glue between the underlay (physical links) and overlay (logical policies). But most engineers only think they understand them. Let’s fix that.

1. TLOCs: The Core Concept

A TLOC is a logical representation of a WAN edge router’s transport connection. It’s defined by three key attributes:

TLOC IP (the router's system IP, not a physical interface IP).
Color (e.g., mpls, biz-internet, lte).
Encapsulation (IPsec or GRE).

Why this matters:

TLOCs decouple policies from hardware. You can swap circuits (e.g., change ISP) without rewriting all your rules.
They enable transport-independent routing—policies reference colors, not IPs.
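
A minimal sketch of why color-based policies are transport-independent. The `Tloc` class, the `policy` dictionary, and the `resolve` function are hypothetical illustrations, not a Cisco data model; the addresses are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tloc:
    tloc_ip: str
    color: str
    encap: str


# The policy names colors only; no IP address appears in it.
policy = {"voip": "mpls", "default": "biz-internet"}


def resolve(app: str, tlocs: list[Tloc]) -> Tloc:
    """Return the first TLOC whose color matches the policy for this app."""
    wanted = policy.get(app, policy["default"])
    for tloc in tlocs:
        if tloc.color == wanted:
            return tloc
    raise LookupError(f"no TLOC with color {wanted!r}")


site_a = [Tloc("172.16.100.1", "mpls", "ipsec"),
          Tloc("172.16.100.1", "biz-internet", "ipsec")]
site_b = [Tloc("172.16.200.1", "biz-internet", "ipsec"),
          Tloc("172.16.200.1", "mpls", "ipsec")]

# The same policy works unchanged at both sites.
print(resolve("voip", site_a))
print(resolve("voip", site_b))
```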

2. TLOC Components – What’s Under the Hood

A. TLOC Extended Attributes

These are hidden knobs that influence path selection:

Preference (higher = better; note this is the opposite of administrative distance, where lower wins).
Weight (for load-balancing across equal paths).
Public/Private IP (for NAT traversal).
Site-ID (identifies the site; edges that share a site ID do not build tunnels to each other).

Example:

vpn 0
 interface ge0/0
  # Interface addressing omitted; the tunnel attributes below shape the TLOC
  tunnel-interface
   encapsulation ipsec preference 100  # Higher = more preferred
   color biz-internet

B. TLOC Groups

Primary/Backup Groups: Force deterministic failover (e.g., "Use LTE only if MPLS is down").
Geographic Groups: Steer traffic regionally (e.g., "EU branches prefer EU-based TLOCs").

Pro Tip: Misconfigured groups cause asymmetric routing, so always validate with `show sdwan omp tlocs`.

3. TLOC Lifecycle – How They’re Born, Live, and Die

A. TLOC Formation

Discovery: Router advertises its TLOCs via OMP (Overlay Management Protocol).
Validation: BFD (Bidirectional Forwarding Detection) confirms reachability.
Installation: TLOC enters the RIB (Routing Information Base) if valid.

Critical Check:

show sdwan omp tlocs # Verify TLOC advertisements
show sdwan bfd sessions # Confirm liveness

B. TLOC States

Up/Active: BFD is healthy, traffic can flow.
Down/Dead: BFD failed, TLOC is pulled from RIB.
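Partial: One direction works (asymmetric routing risk!). This is an informal label rather than a state the CLI prints; it usually shows up as a BFD session that keeps transitioning.

Debugging:

show sdwan bfd sessions | include down # Hunt for failed or flapping sessions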

4. TLOC Policies – The Real Power

A. Influencing Path Selection

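Route Policy: Steer applications onto preferred colors with an app-route policy (the list and SLA-class names below are placeholders defined elsewhere in the policy).

policy
 app-route-policy VOIP-AAR
  vpn-list VOICE-VPNS
   sequence 10
    match
     app-list VOICE-AND-VIDEO
    action
     sla-class VOICE-SLA preferred-color mpls  # Prefer MPLS for VoIP while it meets the SLA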

Smart TLOC Preemption: Fail back aggressively (or not).

B. TLOC Affinity

Sticky TLOCs: Pin flows to a TLOC (e.g., for SIP trunks).
Load-Balancing: Distribute across TLOCs with equal weight.

Gotcha: Affinity conflicts with SLA-driven path changes from application-aware routing, so tune carefully!
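
Sticky, weight-based distribution can be sketched with a flow hash. This shows the generic technique only; how a given router hashes flows is platform-specific, and the `pick_tloc` function and sample flow are hypothetical. Weights are assumed to be positive integers.

```python
import hashlib


def pick_tloc(flow: tuple, weighted_colors: dict[str, int]) -> str:
    """Map a flow to a color in proportion to the configured weights."""
    # Expand weights into buckets, e.g. {"mpls": 2, "lte": 1} gives 3 buckets.
    buckets = [color
               for color, weight in sorted(weighted_colors.items())
               for _ in range(weight)]
    # Hashing the flow tuple keeps every packet of a flow on the same TLOC.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(buckets)
    return buckets[index]


# (src, dst, protocol, src port, dst port)
flow = ("10.1.1.100", "10.2.2.50", 17, 16384, 16384)
weights = {"mpls": 1, "biz-internet": 1}
first = pick_tloc(flow, weights)
print(first == pick_tloc(flow, weights))  # the same flow always maps the same way
```

Because the choice depends only on the flow tuple and the weights, a flow stays pinned until the weights or the set of available TLOCs change.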

5. TLOC Troubleshooting – The Dark Arts

A. Common TLOC Failures

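BFD Flapping → TLOCs bounce.
- Fix: Tune the per-color BFD timers (`hello-interval` in ms and `multiplier` under `bfd color <color>`). For example, a 300 ms hello interval with a multiplier of 3 detects failure in about 900 ms but flaps easily on lossy links; relax it if the circuit is unstable.
Color Mismatch → Tunnels don’t form.
- Fix: Colors come from a predefined list (mpls, biz-internet, lte, ...). With `restrict` set, tunnels form only between TLOCs of the same color, so check both ends.
NAT Issues → Private IP leaks.
- Fix: Compare the private and public IP/port in `show sdwan control local-properties`; the public address is learned through vBond.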
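
For the BFD timers mentioned above, detection time is roughly the hello interval multiplied by the multiplier, since a session is declared down after that many consecutive hellos are missed. A quick sketch of the arithmetic (the function name is hypothetical):

```python
def bfd_detection_time_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate time without hellos before the session is declared down."""
    return hello_interval_ms * multiplier


print(bfd_detection_time_ms(300, 3))   # 900
print(bfd_detection_time_ms(1000, 7))  # 7000
```

Shorter timers fail over faster but tolerate fewer lost hellos, which is why aggressive values tend to flap on lossy links such as LTE.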

B. Advanced Debugging

debug sdwan omp tlocs # Watch TLOC advertisements in real-time
debug sdwan bfd events # Catch BFD failures
show sdwan bfd history # Track tunnel state changes over time

6. TLOC vs. The World

Concept	TLOC	Traditional WAN Addressing
Addressing	Logical (color-based)	Physical (IP-based)
Failover	Fast (BFD + OMP; detection time = hello interval × multiplier)	Slow (BGP convergence)
Policies	Transport-agnostic	Hardcoded to interfaces

Key Takeaway: TLOCs turn network plumbing into policy-driven intent.

Final Word Mastering TLOCs means:

✅ You never blame "the SD-WAN" for routing issues—you dissect TLOC states.
✅ You design for intent (colors, groups) instead of hacking interface configs.
✅ You troubleshoot like a surgeon—OMP → BFD → TLOC → Policy.

Now go forth and make TLOCs obey. 🚀 (And when Cisco TAC says "it’s a TLOC issue," you’ll know exactly where to look.)

SD-WAN Site ID + Color + Management Subnet Integration Guide

To build a scalable, intuitive, and operationally efficient SD-WAN fabric, we’ll combine:

Site IDs (Logical location identifiers)
Colors (Underlay transport identification)
Management Subnet (VRF for OOB/In-band management)

Here’s how to plan and implement them cohesively:

1. Hierarchy & Assignment Strategy

A. Site ID + Color + Management Subnet Relationship

Component	Purpose	Example Value	Design Tip
Site ID	Uniquely identifies a branch/DC	`100` (HQ), `200` (Branch)	Use geographic encoding (e.g., `1` = Americas).
Color	Identifies WAN transport types	`mpls`, `biz-internet`, `lte`	Colors come from a predefined list; map each carrier to one and document it (e.g., Verizon MPLS = `mpls`).
Mgmt Subnet	Dedicated subnet for OOB/In-band mgmt	`10.255.100.0/24` (VPN 0 or VPN 512)	Isolate from data VPNs (1-511).

B. Structured Numbering Example

Scenario: A multinational with:

Region 1 (Americas): MPLS + Internet
Region 2 (EMEA): MPLS + LTE

Site	Site ID	System IP	Colors (Transport)	Management Subnet
HQ (Dallas)	`100`	`172.16.100.1`	`mpls`, `biz-internet`	`10.255.100.0/24` (VPN 0)
Branch (NY)	`101`	`172.16.101.1`	`mpls`, `biz-internet`	`10.255.101.0/24` (VPN 0)
DC (Frankfurt)	`200`	`172.16.200.1`	`private1`, `lte`	`10.255.200.0/24` (VPN 0)

2. Color Planning Best Practices

A. Standardize Color Naming

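Cisco SD-WAN colors are picked from a predefined list (e.g., `mpls`, `private1`, `biz-internet`, `public-internet`, `lte`), so free-form names are not accepted on the device. Keep a consistent carrier-to-color mapping in your documentation:

<carrier>_<type> → color (e.g., `att_mpls` → `mpls`, `comcast_biz_internet` → `biz-internet`)

Avoid mapping by role such as primary, secondary (confusing at scale).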

B. Color Redundancy Rules

Assign at least 2 colors per site (e.g., mpls + internet).
Use BFD for fast failover between colors.

C. Color Mapping to TLOCs

Each color corresponds to a TLOC (Transport Locator).

Example TLOC config:

vEdge(config)# vpn 0 interface ge0/0
  tunnel-interface
    color mpls restrict  # Build tunnels only to other mpls TLOCs

3. Management Subnet Strategy

A. Key Requirements

Isolation: Management traffic should be isolated.
- In-band Management: Typically resides in VPN 0 (shares the transport VRF with control/data overlay traffic but is logically separate).
- Out-of-Band (OOB) Management: For dedicated management ports (e.g., `mgmt0` on a hardware vEdge), use VPN 512. Routes in VPN 512 are NOT advertised into the OMP overlay.
Subnet Size: /24 recommended (supports up to 254 devices).

B. Addressing Scheme Example

For In-band Management (VPN 0):

10.255.<Site ID>.0/24
Example:
- Site ID 100 → `10.255.100.0/24`
- Site ID 200 → `10.255.200.0/24`
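
The scheme above is easy to generate programmatically. A minimal sketch, assuming the site ID must fit in one octet and that the first host address is used as the gateway (both are assumptions of this example; `site_plan` is a hypothetical helper, and the system IP pattern follows the site table earlier in this guide):

```python
import ipaddress


def site_plan(site_id: int) -> dict[str, str]:
    """Derive per-site addressing from the 10.255.<Site ID>.0/24 scheme."""
    if not 1 <= site_id <= 255:
        # The scheme places the site ID in a single octet, so larger IDs do not fit.
        raise ValueError("site_id must be between 1 and 255 for this scheme")
    mgmt = ipaddress.ip_network(f"10.255.{site_id}.0/24")
    return {
        "mgmt_subnet": str(mgmt),
        "mgmt_gateway": str(mgmt.network_address + 1),  # assumed convention
        "system_ip": f"172.16.{site_id}.1",
    }


print(site_plan(100))
print(site_plan(200))
```

The range check makes the scheme's limit explicit: once site IDs exceed 255, the numbering needs a second octet or a lookup table.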

For Out-of-Band Management (VPN 512): Use a completely separate, non-overlapping management subnet, typically on a dedicated physical interface.

Benefits:

Predictable IPs (easy troubleshooting).
No overlaps with service VPNs.

C. vManage Integration

Define management subnets in vManage Templates:

vpn 0
 interface ge0/0
  ip address 10.255.100.1/24
  tunnel-interface
   color biz-internet restrict
  no shutdown

(For VPN 512, you'd configure a separate interface under vpn 512.)

4. Putting It All Together: Design Checklist

Site IDs: Geographic/role-based, unique, documented in IPAM.
Colors: Named after carriers, assigned to TLOCs, redundant.
Management Subnet:
- /24 in VPN 0 for in-band.
- /24 in VPN 512 for OOB (preferred for dedicated management ports).
System IPs: Align with Site ID (e.g., Site ID 100 → 172.16.100.1).
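
Parts of this checklist can be verified automatically before anything is pushed. A minimal sketch, assuming the site plan is kept as a list of dictionaries with the hypothetical keys `site_id`, `colors`, and `mgmt`:

```python
import ipaddress
from itertools import combinations


def check_sites(sites: list[dict]) -> list[str]:
    """Return human-readable problems found in a site plan."""
    problems = []
    ids = [s["site_id"] for s in sites]
    if len(ids) != len(set(ids)):
        problems.append("duplicate site IDs")
    for s in sites:
        if len(set(s["colors"])) < 2:
            problems.append(f"site {s['site_id']}: fewer than 2 colors")
    # Compare every pair of management subnets for overlap.
    for a, b in combinations(sites, 2):
        if ipaddress.ip_network(a["mgmt"]).overlaps(ipaddress.ip_network(b["mgmt"])):
            problems.append(
                f"sites {a['site_id']} and {b['site_id']}: mgmt subnets overlap")
    return problems


good = [
    {"site_id": 100, "colors": ["mpls", "biz-internet"], "mgmt": "10.255.100.0/24"},
    {"site_id": 101, "colors": ["mpls", "biz-internet"], "mgmt": "10.255.101.0/24"},
]
bad = [
    {"site_id": 100, "colors": ["mpls"], "mgmt": "10.255.100.0/24"},
    {"site_id": 101, "colors": ["mpls", "lte"], "mgmt": "10.255.100.0/24"},
]
print(check_sites(good))
print(check_sites(bad))
```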

5. Common Pitfalls

❌ Color Conflicts: Reusing mpls for two different MPLS carriers at one site (give the second one a different color, e.g., private1).
❌ Mgmt Overlaps: Sharing 10.255.100.0/24 across sites (always subnet per site).
❌ Unstructured Site IDs: Random numbers (hard to scale beyond 50 sites).
❌ Incorrect VPN for Internet Breakout: Using VPN 512 for DIA (it's for OOB management). DIA should be in a service VPN or VPN 0.

Final Topology Example

Site ID: 100 (Dallas HQ)
- System IP: 172.16.100.1
- Colors: mpls, biz-internet
- Mgmt Subnet: 10.255.100.0/24 (VPN 0 for in-band)
- Service VPNs: 10 (LAN), 20 (VoIP)

SD-WAN Fabric Bring-Up Essentials

To bring up an SD-WAN fabric, you need to configure key components correctly. Below is a concise, step-by-step breakdown of the essentials, along with critical design considerations.

1. Underlay Network (VPN 0 - Transport VRF / Front-Door VRF)

Purpose: Handles control-plane traffic (OMP, DTLS/TLS tunnels between devices) and encapsulated data-plane traffic. All physical WAN interfaces that connect to the underlay belong to VPN 0.
Key Configurations:
- Interfaces: Assign WAN interfaces (e.g., MPLS, Internet, LTE) to VPN 0.
- Routing:
  - Static routes (for simple setups).
  - BGP/OSPF (for dynamic underlay routing in larger deployments).
- Tunnel Interfaces: Set the color and encapsulation on each WAN interface; these define the TLOCs (public/private addresses are discovered through vBond when NAT is present).
Design Considerations:
- Dual Underlay: Use at least two transport types (e.g., MPLS + Internet) for redundancy.
- TLOC Preference: Prioritize the more reliable or unmetered links (e.g., MPLS over LTE).

2. Overlay Network (OMP Routing)

Purpose: Distributes routes and policies across the fabric.
Key Configurations:
- OMP (Overlay Management Protocol): Advertises routes, TLOCs, and policies between vSmart controllers and edges.
- Route Policies: Control which prefixes are shared (e.g., only corporate LAN routes).
Design Considerations:
- Route Aggregation: Minimize prefixes advertised to vSmart (e.g., summarize branch LANs).
- TLOC Redundancy: Assign multiple TLOCs per route for failover.

3. Service VPNs (VPN 1-511)

Purpose: Segments user/data traffic (e.g., corporate LAN, guest Wi-Fi, VoIP).
Key Configurations:
- VRF Creation: Define VPNs (e.g., vpn 10 for corporate LAN).
- Interface Assignment: Assign LAN interfaces to the correct VPN.
- Route Leaking: If needed, allow controlled traffic flow between VPNs (via centralized policies).
Design Considerations:
- QoS Tagging: Apply DSCP markings per VPN (e.g., EF for VoIP in vpn 20).
- Security Policies: Restrict inter-VPN communication (e.g., guest Wi-Fi in vpn 30 can’t reach vpn 10).

4. Internet Breakout

Purpose: Local internet access (DIA) from branches or centralized internet access from a datacenter.
Key Configurations:
- NAT & Firewall: Enable NAT overload (PAT) for private→public IP translation on the egress interface.
- Policy-Based Routing (PBR) or Application-Aware Routing: Steer specific traffic (e.g., SaaS apps, guest Wi-Fi) to the local internet path.
Design Considerations:
- Security: Apply ZTNA/Umbrella or other security services for secure internet access.
- Backup Path: If local DIA fails, fall back to centralized internet via the overlay.
- Note: This is typically configured in a service VPN (e.g., VPN 10, or a dedicated internet VPN like VPN 999), or by routing traffic directly out a VPN 0 interface with specific policies and NAT. VPN 512 is reserved for Out-of-Band Management, not Internet Breakout.

5. Management & Control Plane Connectivity

Purpose: Ensures vEdges can securely connect to controllers (vManage, vSmart, vBond).
Key Configurations:
- Controller IPs: Ensure vEdges can reach vManage/vSmart/vBond over VPN 0.
- Certificate Auth: Use device certificates for secure onboarding.
Design Considerations:
- Out-of-Band (OOB) Management (VPN 512): Use a separate OOB network with interfaces in VPN 512 for high availability and isolation of management traffic from the overlay.
- Geo-Redundancy: Deploy controllers in multiple regions.

6. Security Policies

Purpose: Enforce traffic rules (e.g., blocking, inspection).
Key Configurations:
- Zone-Based Firewall: Assign interfaces to zones (e.g., "inside," "outside").
- Application-Aware Policies: Block high-risk or non-business apps (e.g., Tor, Netflix).
Design Considerations:
- Default-Deny: Start with "deny all," then allow only needed traffic.
- IPS/IDS: Enable for internet-bound traffic.

7. High Availability (HA)

Design Considerations:
- Dual vSmarts: Avoid single points of failure for the control plane.
- Active/Standby Edges: Use VRRP/HSRP for LAN-side HA at critical sites.
- Cloud Gateway Redundancy: For cloud-onramp (e.g., AWS/Azure).

Summary Checklist

Step	Action	Critical Design Tip
1. Underlay	Configure VPN 0 interfaces & routing	Dual transports (MPLS + Internet)
2. Overlay	Set up OMP & route policies	Summarize routes to reduce overhead
3. Service VPNs	Define VPNs 1-511 & assign interfaces	Use QoS for VoIP/VC traffic
4. Internet	Configure DIA in a Service VPN or VPN 0	Add ZTNA/Umbrella for security
5. Management	Ensure controllers are reachable via VPN 0	OOB management (VPN 512) for resiliency
6. Security	Apply firewall/IPS policies	Default-deny approach
7. HA	Deploy redundant controllers/edges	Active/standby for critical sites

SD-WAN Application-Aware Routing (AAR) with `match app-list`

Control traffic flows based on applications using vManage policies.

1. What is `match app-list`?

Purpose: Identifies specific applications (e.g., Zoom, Netflix, VoIP) to steer traffic via policies.
Use Cases:
- Prioritize VoIP over MPLS.
- Block high-risk apps (e.g., Tor).
- Local internet breakout (DIA) for SaaS apps.

2. How It Works

Application Detection:
- Uses Deep Packet Inspection (DPI) to identify apps (even on non-standard ports; encrypted payloads reduce what it can see).
- Predefined app lists in vManage (e.g., VOICE-AND-VIDEO, BUSINESS-APPS).
Policy Matching:
- Policies reference app-list to trigger actions (e.g., change path, apply QoS).

3. Configuration Steps

3.1 Define an App List in vManage

Navigate to: Configuration > Policies > Custom Options > Lists > Application (menu names vary slightly between vManage releases)

Create a new app list:

Name: CORPORATE-APPS
Applications:
  - Microsoft-365
  - Webex-Teams
  - Zoom-Cloud

3.2 Create a Policy Using `match app-list`

Example: "Send CORPORATE-APPS traffic from service VPN 10 straight to the Internet (local breakout)" (Note: VPN 512 is for Out-of-Band Management, not Internet Breakout. DIA is done with NAT on a VPN 0 interface. The vpn-list name below is a placeholder.)

policy
 data-policy MICROSOFT-365-DIA
  vpn-list VPN-10
   sequence 10
    match
     app-list CORPORATE-APPS   # Match predefined apps
    action accept
     nat use-vpn 0             # Local internet breakout via NAT in VPN 0
     set
      dscp 46                  # Mark for QoS (EF)

3.3 Apply Policy to Sites

Attach policy to a Centralized Policy in vManage.
Push to target sites.

4. Best Practices

4.1 App List Design

Group logically:
- VOICE-AND-VIDEO: Zoom, Webex, MS-Teams.
- BUSINESS-CRITICAL: SAP, Oracle, Salesforce.
Avoid overly broad lists (e.g., "ALL-WEB") to prevent unintended matches.

4.2 Policy Ordering

Higher priority (lower number) policies evaluate first.

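# vpn-list and SLA-class names are placeholders
policy
 app-route-policy AAR-POLICY
  vpn-list ALL-SERVICE-VPNS
   sequence 10
    match
     app-list VOICE-AND-VIDEO
    action
     sla-class VOICE-SLA preferred-color mpls   # Prefer MPLS for voice
 data-policy BLOCK-STREAMING
  vpn-list ALL-SERVICE-VPNS
   sequence 20
    match
     app-list NETFLIX
    action drop                                 # Block Netflix (drop is a data-policy action)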
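
The ordering rule (lowest sequence number first, first match wins) can be sketched in a few lines. The `evaluate` function and the dictionary layout are hypothetical, and `default-action` is a placeholder for whatever default the policy defines.

```python
def evaluate(sequences: list[dict], app: str) -> str:
    """Return the action of the lowest-numbered sequence that matches the app."""
    for seq in sorted(sequences, key=lambda s: s["seq"]):
        if app in seq["apps"]:
            return seq["action"]  # first match wins; later sequences are skipped
    return "default-action"       # placeholder for the policy's configured default


# Deliberately listed out of order: evaluation sorts by sequence number.
policy = [
    {"seq": 20, "apps": {"netflix", "zoom"}, "action": "drop"},
    {"seq": 10, "apps": {"zoom", "webex"}, "action": "prefer-mpls"},
]
print(evaluate(policy, "zoom"))
print(evaluate(policy, "netflix"))
print(evaluate(policy, "sap"))
```

Note how `zoom` appears in both sequences but only sequence 10 takes effect, which is the usual cause of a "policy not applying" symptom.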

4.3 SLA-Based Fallback

Combine with an SLA class so the path switches if the SLA fails (names are placeholders):

policy
 sla-class WEBEX-SLA
  latency 100                  # ms
 # in the app-route-policy sequence that matches app-list WEBEX:
    action
     sla-class WEBEX-SLA preferred-color mpls

5. Verification & Troubleshooting

5.1 Key Commands

Command	Purpose
`show sdwan app-route stats`	Shows loss, latency, and jitter per tunnel.
`show sdwan policy app-route-policy-filter`	Checks policy hits.
`show sdwan app-fwd dpi flows`	Inspects DPI-classified flows.

5.2 Common Issues

Symptom	Likely Cause	Fix
App traffic not matching	Incorrect app-list definition	Verify app names in vManage.
Policy not applying	Wrong policy priority	Reorder policies (lower sequence = higher priority).
DPI not detecting apps	Encryption (TLS 1.3)	Use IP-based matching as fallback.

6. Advanced Use Cases

6.1 Custom DPI Signatures

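For proprietary apps, define a custom application in vManage (typically by server name, IP address, port, and L4 protocol). The lines below are pseudo-syntax to show the idea, not literal CLI:

app-list CUSTOM-APP
  signature TCP port 5000 protocol HTTP user-agent "MyApp*"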

6.2 Combining with QoS

Mark apps for prioritization:

match
 app-list VOICE
action accept
 set
  dscp 46             # Expedited Forwarding (VoIP)

6.3 Internet Breakout for Specific Apps

match
 app-list SALESFORCE
action accept
 nat use-vpn 0                # Local breakout: NAT out a VPN 0 interface

7. Summary Checklist

Define app lists in vManage (Configuration > Policies > Custom Options > Lists > Application).
Use match app-list in policies to steer traffic.
Test with show sdwan app-route stats.
Combine with SLA for dynamic failover.

Key Takeaways

match app-list enables application-aware routing (not just IP/port-based).
DPI visibility can be affected by strong encryption (e.g., TLS 1.3 with ESNI) → May need fallback to IP-based matching.
Policy order matters — Highest priority (lowest sequence) evaluates first.

Front-Door VRF (FVRF) Explained (Using Cisco Gear)

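Front-Door VRF (FVRF), as used in this section, means a dedicated management VRF: a Cisco design that enhances security by separating the management plane from the data plane in network devices (routers, switches, firewalls). It achieves this by placing the management interface (SSH, SNMP, HTTPS, etc.) in a separate Virtual Routing and Forwarding (VRF) instance, isolating it from the default global routing table. (In Cisco's DMVPN/FlexVPN documentation, "Front-Door VRF" more narrowly means the VRF that holds a tunnel's transport-facing interface.)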

Note: While this document describes the general concept of Front-Door VRF in Cisco devices, in Cisco SD-WAN (Viptela-based) architectures:

VPN 0 is often referred to as the "Front-Door VRF" in the sense that it is the transport VRF carrying all overlay control and data tunnel traffic, and often in-band management.
VPN 512 is used for isolated out-of-band management, conceptually similar to a traditional FVRF.

Why Use Front-Door VRF?

Security: Prevents unauthorized access to management interfaces via data-plane attacks.
Isolation: Ensures management traffic doesn’t mix with production traffic.
Multi-Tenancy: Useful in service provider environments where management traffic must be segregated per customer.
Simplified Routing: Avoids route conflicts between management and data networks.

How FVRF Works

The management interface (e.g., Mgmt0/0) is assigned to a dedicated VRF (e.g., MGMT-VRF).
All management traffic (SSH, SNMP, etc.) must go through this VRF.
The data plane (regular traffic) uses the default global routing table or other service VRFs.

Configuration Example (Cisco IOS-XE / IOS)

1. Create the Management VRF

configure terminal
vrf definition MGMT-VRF
 rd 100:1  ! Route Distinguisher (for uniqueness)
 address-family ipv4
 exit-address-family
exit

2. Assign the Management Interface to the VRF

interface GigabitEthernet0/0
 description Management Interface
 vrf forwarding MGMT-VRF
 ip address 192.168.1.1 255.255.255.0
 no shutdown
exit

3. Configure a Default Route for Management Traffic

ip route vrf MGMT-VRF 0.0.0.0 0.0.0.0 192.168.1.254

(Where 192.168.1.254 is the gateway for management traffic.)

4. Enable VRF-Aware Services

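! Inbound SSH works on a VRF interface; vrf-also lets a VTY ACL accept it
! (MGMT-ACL and 192.168.1.10 are example values)
ip access-list standard MGMT-ACL
 permit 192.168.1.0 0.0.0.255
line vty 0 4
 transport input ssh
 access-class MGMT-ACL in vrf-also
exit
ntp server vrf MGMT-VRF 192.168.1.10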

Verification

Check VRF routing table:
```
show ip route vrf MGMT-VRF
```
Verify interface assignment:
```
show vrf brief
```
Test connectivity:
```
ping vrf MGMT-VRF 192.168.1.254
```

Key Considerations

NTP & DNS: If management relies on NTP/DNS, ensure they are reachable via the FVRF.
Backup Access: Always maintain an alternative access method (console) in case of misconfiguration.
Compatibility: Some older Cisco devices may not support all VRF-aware services.

Conclusion

Front-Door VRF is a best practice for securing management traffic in Cisco environments. By isolating management interfaces in a separate VRF, you reduce attack surfaces and prevent unauthorized access through data-plane vulnerabilities.
