

Mastering SD-WAN: From Fundamentals to the Top 1% Mindset

The Top 1% Mindset

You don't just deploy SD-WAN; you orchestrate it. You think in abstractions (colors, TLOCs, VPNs), not hardware. You troubleshoot like a surgeon: control plane first, then data plane, then app logic.

Example: Problem: VoIP calls drop but O365 works. Top 1% Debug:

  • Check control connections and BFD sessions (is the tunnel itself stable?).
  • Verify TLOC preferences (is LTE taking over incorrectly?).
  • Inspect app-route policy (is VoIP pinned to MPLS but being overridden by SLA-based path selection?).
  • Drill into show sdwan app-route stats (is jitter spiking on broadband?).

Final Thought: Most SD-WAN "engineers" just click through vManage. The real pros know:

  • Transport independence isn't automatic; it's designed.
  • Policies aren't rules; they're a logic flow.
  • Troubleshooting isn't guessing; it's methodical dissection.

You're asking the right questions. Now go break (then fix) some TLOCs. 🚀 (And yes, we both know Cisco's docs don't explain this stuff clearly. That's why the top 1% reverse-engineer it.)


SD-WAN Crash Course: The 20% That Matters

Goal: Understand core SD-WAN concepts, how they differ from traditional WAN, and how they integrate with IPSec.

1. SD-WAN vs Traditional WAN

| Feature | Traditional WAN (MPLS/VPN) | SD-WAN |
| --- | --- | --- |
| Cost | Expensive (MPLS circuits) | Cheaper (uses Internet + broadband) |
| Agility | Manual config changes | Centralized, automated policies |
| Performance | Predictable but rigid | Dynamic path selection (jitter/loss-aware) |
| Security | Relies on IPSec/MPLS | Built-in encryption (IPSec, TLS) |
| Topology | Hub-and-spoke | Any-to-any, mesh |

Key Takeaway:

  • SD-WAN decouples control plane from hardware, allowing dynamic traffic routing over any transport (MPLS, LTE, broadband).

2. SD-WAN Core Components

(1) Edge Devices (CPE)

  • e.g., Cisco vEdge, FortiGate, VeloCloud
  • Sit at branch offices, apply policies, and encrypt traffic.

(2) Orchestrator (Controller)

  • e.g., Cisco vManage, VMware Orchestrator
  • Centralized policy management (no CLI needed!).

(3) Overlay Tunnels

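  • Tunnels between edges: IPSec (encrypted) or GRE (unencrypted, for trusted transports). DTLS/TLS secures the control connections to the controllers, not the data tunnels.
  • Each tunnel endpoint is a TLOC (Transport Locator). In Cisco SD-WAN a TLOC is the router's system IP plus a color (e.g., biz-internet, mpls) and an encapsulation.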

(4) Underlay Transport

  • Any WAN link: MPLS, Internet, LTE, 5G.

3. How SD-WAN Works (The 80% You Need)

(1) Path Selection

  • Dynamic multi-path steering: Chooses best path based on:
    • Application SLA (e.g., VoIP → low latency).
    • Real-time metrics (jitter, packet loss, latency).

Example Policy:

IF (Application == VoIP) AND (Latency > 50ms) → SWITCH to backup link
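
The example policy above can be sketched in code. This is a minimal illustration rather than vendor syntax; the function name `select_link`, the link names, and the default threshold argument are assumptions chosen for the example.

```python
def select_link(application: str, latency_ms: float,
                primary: str = "mpls", backup: str = "lte",
                threshold_ms: float = 50.0) -> str:
    """Return the link a flow should use under the example policy."""
    # IF (Application == VoIP) AND (Latency > 50ms) -> switch to backup link
    if application.lower() == "voip" and latency_ms > threshold_ms:
        return backup
    # Every other case stays on the primary link.
    return primary
```

Only VoIP flows react to the latency threshold here; other applications stay on the primary link regardless of latency, which mirrors the single-rule policy shown above.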

(2) Zero-Touch Provisioning (ZTP)

  • Plug in a device → auto-configures via orchestrator.

(3) Application-Aware Routing

  • DPI (Deep Packet Inspection) identifies apps (e.g., Teams, SAP).
    • (Note: While effective, some advanced encryption like TLS 1.3 can limit DPI's visibility, requiring IP-based fallbacks.)
  • QoS prioritization (VoIP > YouTube).

(4) Security Integration

  • IPSec for all overlays (mandatory for Internet links).
  • Cloud-based firewalls (e.g., FortiGate, Zscaler).

4. SD-WAN + IPSec Integration

  • SD-WAN uses IPSec for secure tunnels but adds:
    • Automated key rotation (no manual PSK updates).
    • Tunnel bonding (combines multiple links for throughput).

Key Difference:

  • Traditional IPSec VPN = static tunnels.
  • SD-WAN IPSec = dynamic, SLA-driven tunnels.

5. SD-WAN Troubleshooting (Top 5 Issues)

| Issue | Debug Command | Fix |
| --- | --- | --- |
| Tunnels not coming up | show sdwan bfd sessions (Cisco) | Check underlay reachability |
| Poor VoIP quality | show sdwan app-route stats | Adjust SLA thresholds |
| Orchestrator sync failure | show sdwan control connections | Verify certs/connectivity |
| Traffic taking wrong path | show sdwan policy service-path | Fix application-aware rules |
| High latency on backup | show sdwan interface | Enable FEC (Forward Error Correction) |

6. SD-WAN vs. DMVPN (Common Interview Qs)

Q: When would you use SD-WAN over DMVPN?

  • SD-WAN: When you need application-aware routing + centralized management.
  • DMVPN: When you need scalable IPSec tunnels but don't need SaaS optimization.

Q: Can SD-WAN replace IPSec?

  • No! SD-WAN uses IPSec for encryption but adds intelligence on top.

7. Lab Practice (Quick Wins)

  1. Simulate link failure in GNS3/EVE-NG → Watch SD-WAN switch paths.
  2. Prioritize VoIP traffic over YouTube.
  3. Break the orchestrator → Observe fallback to local policies.

CLI Examples (Cisco Viptela):

show sdwan control connections  # Check orchestrator status
show sdwan app-route stats      # Verify path selection
clear sdwan tunnel              # Force tunnel re-establishment

8. Interview Cheat Sheet

  • SD-WAN = Automation + Application-Aware Routing + Multiple Underlays.
  • IPSec is still used, but dynamically managed.
  • Key metrics: Jitter (<30ms), Latency (<150ms), Packet Loss (<1%).
  • Orchestrator is the brain; edges are the muscle.
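
The key metrics in the cheat sheet can be turned into a simple compliance check. The sketch below is an assumption-laden illustration: the `PathMetrics` type and `meets_voice_sla` function are hypothetical names, and the thresholds are the ones quoted above (jitter < 30 ms, latency < 150 ms, loss < 1%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    jitter_ms: float
    latency_ms: float
    loss_pct: float


def meets_voice_sla(m: PathMetrics) -> bool:
    """True only when all three cheat-sheet thresholds are satisfied."""
    return m.jitter_ms < 30 and m.latency_ms < 150 and m.loss_pct < 1
```

A path that fails any single threshold is treated as out of SLA, which is the usual reason a voice flow gets moved to another transport.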

The Three Planes of SD-WAN & Modern Networking

In modern networking, especially with overlay technologies like SD-WAN, we deal with three distinct planes, each serving a critical role.

1. Management Plane

  • Purpose: Controls device access and monitoring (SSH, SNMP, HTTPS, syslog, etc.). It's about how you interact with the device.
  • Key Components:
    | Component | Protocol | Port | Description |
    | --- | --- | --- | --- |
    | vManage | HTTPS (WebUI) | TCP/443 | GUI/API for centralized control and configuration. |
    | vBond | DTLS | UDP/12346 | Orchestrator for device authentication and initial redirection to vManage. |
    | Zero-Touch Provisioning (ZTP) | DHCP/HTTPS | - | Auto-configures devices out-of-the-box. |
  • Traffic Flow:
    1. Onboarding: Device contacts vBond (DTLS) → gets redirected to vManage. Downloads config/CSR via HTTPS.
    2. Ongoing Management: Devices send telemetry (metrics, logs) to vManage. Policies (security, routing) are pushed from vManage.
  • Security Considerations:
    • Always use isolated VRFs for management traffic (e.g., traditional FVRF, or VPN 512 in SD-WAN for OOB management).
    • Mutual TLS (mTLS) for device-vManage communication.
    • Role-Based Access Control (RBAC) in vManage.

2. Control Plane

  • Purpose: Handles protocols that build network intelligence (BGP, OSPF, VXLAN EVPN, SD-WAN OMP, STP, LACP, etc.). It's about how the network learns its topology and reachability.
  • Key Protocols (SD-WAN Specific):
    | Protocol | Function | Port |
    | --- | --- | --- |
    | OMP (Overlay Management Protocol) | Advertises routes, TLOCs, policies. | Runs inside the DTLS/TLS control connection (default DTLS base port UDP/12346) |
    | BGP (optional) | Legacy WAN integration or underlay routing. | TCP/179 |
    | TLOC (Transport Locator) | Maps physical WAN links to logical tunnels for policy application. | - |
  • How OMP Works:
    1. vSmart controllers act as route reflectors for OMP.
    2. Edge devices (vEdges) send:
      • Routes (prefixes learned from LAN/WAN).
      • TLOCs (tunnel endpoints, e.g., public-IP:color).
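      • Service routes (e.g., a firewall or IPS reachable at the site).
    3. vSmart reflects this info to the other edges and applies centralized policy (e.g., "prefer MPLS for VoIP") along the way.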
  • Example OMP Route Advertisement:
    vEdge# show omp routes
    RECEIVED ROUTES:
    Prefix        TLOC IP         Color          Preference
    10.1.1.0/24   203.0.113.1     mpls           100
    10.1.1.0/24   198.51.100.1    biz-internet   50
    
    (MPLS is preferred over Internet due to higher preference.)
  • Key Traits:
    • Distributes reachability info (routes, tunnels, topology).
    • Runs on the CPU (software-based) and is vulnerable to floods (e.g., BGP attacks).
    • Can be placed in a separate VRF (but not a traditional FVRF which is management-only).
  • Security Considerations:
    • DTLS encryption for OMP (no cleartext control traffic!).
    • Control-plane policing (CoPP) to prevent floods.
    • Private WAN links (MPLS) for critical control traffic.
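
The preference comparison in the example OMP route advertisement above can be sketched as follows. This is an illustration of "highest preference wins" only, not the full OMP best-path algorithm; `ROUTES` and `best_route` are hypothetical names, and the values are copied from the sample output.

```python
ROUTES = [
    {"prefix": "10.1.1.0/24", "tloc_ip": "203.0.113.1",
     "color": "mpls", "preference": 100},
    {"prefix": "10.1.1.0/24", "tloc_ip": "198.51.100.1",
     "color": "biz-internet", "preference": 50},
]


def best_route(routes, prefix):
    """Return the route for `prefix` with the highest preference, or None."""
    candidates = [r for r in routes if r["prefix"] == prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["preference"])
```

For the sample table, `best_route(ROUTES, "10.1.1.0/24")` selects the mpls entry because 100 is greater than 50.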

3. Data Plane (Forwarding Plane)

  • Purpose: Moves user traffic (packets/frames) at line rate (hardware-accelerated). It's about moving the actual data.
  • Key Technologies:
    | Technology | Role |
    | --- | --- |
    | IPsec/GRE | Encrypted tunnels between edges. |
    | TLOC (Transport Locator) | Logical tunnel endpoint (e.g., public-IP:color). |
    | Application-Aware Routing (AAR) | Dynamically switches paths based on SLA. |
  • Data Flow Example:
    1. Traffic arrives at vEdge:
      • Classified via DPI (Deep Packet Inspection).
      • Tagged with QoS markings (DSCP).
    2. Path Selection:
      • Checks OMP-learned TLOCs and SLA metrics.
      • Chooses best path (e.g., MPLS for VoIP, Internet for web).
    3. Encapsulation:
      • Wrapped in IPsec (ESP/AH) or GRE.
      • Sent to peer vEdge via WAN (MPLS/Internet/5G).
  • Packet Walkthrough (Simplified):
    1. Original Packet:
      SRC: 10.1.1.100 (LAN) | DST: 8.8.8.8 (Internet)
      
    2. After SD-WAN Processing:
      [Outer IP][UDP][ESP][VPN label][Original Packet]
      SRC: 203.0.113.1 (vEdge Public IP)
      DST: 198.51.100.2 (Peer vEdge Public IP)
      
  • Key Traits:
    • ASIC/switch-chip driven (not CPU).
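    • Doesn't care about routing protocols or tunnel negotiation; it just forwards based on FIB/TCAM.
  • Security Considerations:
    • IPsec (AES-256-GCM) for all overlay tunnels. In Cisco SD-WAN, keys are distributed through the vSmart control plane rather than negotiated with IKE.
    • Zone-Based Firewall on vEdges.
    • SLA-based path avoidance (steer traffic away from links with high jitter or loss).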

Why This Separation Matters

| Plane | Runs On | Isolation Needed? | Risks if Compromised |
| --- | --- | --- | --- |
| Management | CPU | Yes (Dedicated VRF/OOB) | Total device takeover |
| Control | CPU | Yes (VRF/CoPP) | Network meltdown (BGP hijacks, loops) |
| Data | ASIC | No (but ACLs help) | Performance drops (DDoS), but no config access |

Common Misconceptions

  1. "Control Plane = Management Plane": No!
    • Control Plane: BGP, OSPF, VXLAN EVPN.
    • Management Plane: SSH, SNMP.
    • (They're both CPU-based but serve different purposes.)
  2. "A traditional FVRF can carry BGP/VXLAN": No!
    • Traditional FVRF (Front-Door VRF) is only for management traffic, isolated from data/control.
    • BGP/VXLAN go in normal VRFs or a dedicated control-plane VRF.
  3. "Data Plane Needs a VRF": Usually No.
    • Data traffic follows the FIB (built by the control plane).
    • VRFs for data are typically for tenant isolation (e.g., MPLS VPNs, multi-tenancy service VPNs in SD-WAN).

Real-World Use Cases

  1. SD-WAN
    • Management: vManage (HTTPS).
    • Control: OMP (Overlay Management Protocol).
    • Data: Encrypted tunnels (IPsec/GRE).
  2. VXLAN EVPN
    • Management: SSH to switches.
    • Control: BGP EVPN (MAC/IP routing).
    • Data: VXLAN-encapsulated traffic.
  3. Service Provider MPLS
    • Management: TACACS+ for routers.
    • Control: LDP/RSVP (label distribution).
    • Data: Label-switched packets.

Key Takeaways

  1. Management Plane = Your remote admin access (dedicated VRF/OOB).
  2. Control Plane = Protocols that build the network (BGP, EVPN, OSPF, OMP).
  3. Data Plane = Raw packet forwarding (ASIC-driven, no intelligence).

Final Thought

The industry's failure to physically separate all three planes (like servers do with iLO) is a security flaw. But until vendors fix it:

  • Isolate management traffic in dedicated VRFs (like a traditional FVRF or SD-WAN's VPN 512 for OOB).
  • Use VRFs/CoPP for control-plane isolation and protection.
  • Trust ASICs for the data plane.

Deep Dive: TLOCs (Transport Locators), the Spine of SD-WAN

TLOCs are the make-or-break abstraction in SD-WAN architectures (especially Cisco Viptela). They're the glue between the underlay (physical links) and overlay (logical policies). But most engineers only think they understand them. Let's fix that.

1. TLOCs: The Core Concept

A TLOC is a logical representation of a WAN edge router's transport connection. It's defined by three key attributes:

  • System IP (the router's overlay identity; the interface's private and public IPs travel with the TLOC as attributes).
  • Color (e.g., mpls, biz-internet, lte).
  • Encapsulation (IPsec or GRE).

Why this matters:

  • TLOCs decouple policies from hardware. You can swap circuits (e.g., change ISP) without rewriting all your rules.
  • They enable transport-independent routing—policies reference colors, not IPs.
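
To make the decoupling concrete, here is a small sketch in which the policy refers only to colors, so swapping a circuit changes the TLOC data but not the policy. The `Tloc` type, the `transport_ip` field, the `POLICY` mapping, and `resolve` are all hypothetical names used for illustration, and the addresses are documentation-range examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tloc:
    system_ip: str     # router identity in the overlay
    color: str         # transport label that policies reference
    encap: str         # tunnel encapsulation
    transport_ip: str  # address on the physical circuit (changes with the ISP)


# The policy names colors only; no interface or ISP address appears here.
POLICY = {"voip": "mpls", "web": "biz-internet"}


def resolve(app, tlocs, policy=POLICY):
    """Return the TLOCs whose color matches the policy for `app`."""
    color = policy.get(app)
    return [t for t in tlocs if t.color == color]
```

Replacing the MPLS circuit means building a new `Tloc` with a different `transport_ip`; `POLICY` itself is untouched, which is the point the bullets above make.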

2. TLOC Components: What's Under the Hood

A. TLOC Extended Attributes

These are hidden knobs that influence path selection:

  • Preference (unlike administrative distance, higher = better).
  • Weight (for load-balancing across equal paths).
  • Public/Private IP (for NAT traversal).
  • Site-ID (prevents misrouting in multi-tenant setups).

Example (illustrative pseudo-config, not literal CLI syntax):

tloc-extension {
  ip    = 203.0.113.1
  color = biz-internet
  encap = ipsec
  preference = 100  # Higher = more preferred
}
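
How preference and weight interact can be sketched as a two-step selection: keep only the TLOCs with the highest preference, then split traffic among them in proportion to weight. This is a simplified model under that assumption; `traffic_shares` is a hypothetical helper, and weights are assumed to be positive integers.

```python
def traffic_shares(tlocs):
    """Map color -> fraction of traffic, using preference then weight."""
    if not tlocs:
        return {}
    # Step 1: only the highest-preference TLOCs are eligible.
    top = max(t["preference"] for t in tlocs)
    best = [t for t in tlocs if t["preference"] == top]
    # Step 2: split traffic across the survivors in proportion to weight.
    total = sum(t["weight"] for t in best)
    return {t["color"]: t["weight"] / total for t in best}
```

With two TLOCs at preference 100 (weights 3 and 1) and an LTE TLOC at preference 50, LTE carries nothing until both preferred TLOCs are gone.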

B. TLOC Groups

  • Primary/Backup Groups: Force deterministic failover (e.g., "Use LTE only if MPLS is down").
  • Geographic Groups: Steer traffic regionally (e.g., "EU branches prefer EU-based TLOCs").

Pro Tip: Misconfigured groups cause asymmetric routing, so always validate with show sdwan omp tlocs.

3. TLOC Lifecycle: How They're Born, Live, and Die

A. TLOC Formation

  • Discovery: Router advertises its TLOCs via OMP (Overlay Management Protocol).
  • Validation: BFD (Bidirectional Forwarding Detection) confirms reachability.
  • Installation: TLOC enters the RIB (Routing Information Base) if valid.

Critical Check:

  • show sdwan omp tlocs # Verify TLOC advertisements
  • show sdwan bfd sessions # Confirm liveness

B. TLOC States

  • Up/Active: BFD is healthy, traffic can flow.
  • Down/Dead: BFD failed, TLOC is pulled from RIB.
  • Partial: One direction works (asymmetric routing risk!).

Debugging:

  • show sdwan tloc | include Partial # Hunt for flapping TLOCs
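
The three states listed above can be modeled as a function of reachability in each direction. This is a conceptual sketch only; `tloc_state` and its two boolean inputs are assumptions for illustration, not fields from any CLI output.

```python
def tloc_state(tx_ok: bool, rx_ok: bool) -> str:
    """Classify a TLOC from per-direction reachability."""
    if tx_ok and rx_ok:
        return "up"        # BFD healthy both ways, traffic can flow
    if tx_ok or rx_ok:
        return "partial"   # one direction works: asymmetric routing risk
    return "down"          # BFD failed, TLOC is withdrawn
```

Treating "partial" as its own state, rather than folding it into "up", is what makes asymmetric paths visible during troubleshooting.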

4. TLOC Policies: The Real Power

A. Influencing Path Selection

  • Route Policy: Modify TLOC preferences per-application (illustrative pseudo-config):
    apply-policy {
      app-route voip {
        tloc = mpls preference 200  # Always prefer MPLS for VoIP
      }
    }
    
  • Smart TLOC Preemption: Fail back aggressively (or not).

B. TLOC Affinity

  • Sticky TLOCs: Pin flows to a TLOC (e.g., for SIP trunks).
  • Load-Balancing: Distribute across TLOCs with equal weight.

Gotcha: Affinity conflicts with Performance Routing (PfR)—tune carefully!

5. TLOC Troubleshooting: The Dark Arts

A. Common TLOC Failures

  • BFD Flapping → TLOCs bounce.
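    • Fix: Tune the BFD hello interval and multiplier for the affected color (e.g., a 300 ms hello with a multiplier of 3 gives roughly 900 ms detection); exact syntax varies by platform and release.
  • Color Mismatch → TLOCs don't form.
    • Fix: Ensure colors match exactly (case-sensitive!).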
  • NAT Issues → Private IP leaks.
    • Fix: Use tloc-extension public-ip.
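
The BFD figures in the fix above are related by simple arithmetic: a session is declared down after `multiplier` consecutive hellos go unanswered, so detection time is roughly hello interval × multiplier. The helper below is a hypothetical illustration of that relationship.

```python
def bfd_detection_time_ms(hello_interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a BFD session down."""
    if hello_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("hello interval and multiplier must be positive")
    return hello_interval_ms * multiplier
```

A 300 ms hello with a multiplier of 3 yields about 900 ms. Shorter timers detect failures faster but are more likely to flap on lossy links, which is why timer tuning appears as a fix for BFD flapping.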

B. Advanced Debugging

  • debug sdwan omp tlocs # Watch TLOC advertisements in real-time
  • debug sdwan bfd events # Catch BFD failures
  • show sdwan tloc-history # Track TLOC changes over time

6. TLOC vs. The World

| Concept | TLOC | Traditional WAN Addressing |
| --- | --- | --- |
| Addressing | Logical (color-based) | Physical (IP-based) |
| Failover | Sub-second (BFD + OMP) | Slow (BGP convergence) |
| Policies | Transport-agnostic | Hardcoded to interfaces |

Key Takeaway: TLOCs turn network plumbing into policy-driven intent.

Final Word: Mastering TLOCs means:

  • You never blame "the SD-WAN" for routing issues; you dissect TLOC states.
  • You design for intent (colors, groups) instead of hacking interface configs.
  • You troubleshoot like a surgeon: OMP → BFD → TLOC → Policy.

Now go forth and make TLOCs obey. 🚀 (And when Cisco TAC says "it's a TLOC issue," you'll know exactly where to look.)


SD-WAN Site ID + Color + Management Subnet Integration Guide

To build a scalable, intuitive, and operationally efficient SD-WAN fabric, we'll combine:

  1. Site IDs (Logical location identifiers)
  2. Colors (Underlay transport identification)
  3. Management Subnet (VRF for OOB/In-band management)

Here's how to plan and implement them cohesively:

1. Hierarchy & Assignment Strategy

A. Site ID + Color + Management Subnet Relationship

| Component | Purpose | Example Value | Design Tip |
| --- | --- | --- | --- |
| Site ID | Uniquely identifies a branch/DC | 100 (HQ), 200 (Branch) | Use geographic encoding (e.g., 1 = Americas). |
| Color | Identifies WAN transport types | mpls, internet, lte | Match colors to ISP/underlay (e.g., verizon_mpls). |
| Mgmt Subnet | Dedicated subnet for OOB/In-band mgmt | 10.255.100.0/24 (VPN 0 or VPN 512) | Isolate from data VPNs (1-511). |

B. Structured Numbering Example

Scenario: A multinational with:

  • Region 1 (Americas): MPLS + Internet
  • Region 2 (EMEA): MPLS + LTE

| Site | Site ID | System IP | Colors (Transport) | Management Subnet |
| --- | --- | --- | --- | --- |
| HQ (Dallas) | 100 | 172.16.100.1 | mpls_blue, biz_internet | 10.255.100.0/24 (VPN 0) |
| Branch (NY) | 101 | 172.16.101.1 | mpls_blue, biz_internet | 10.255.101.0/24 (VPN 0) |
| DC (Frankfurt) | 200 | 172.16.200.1 | europe_mpls, lte_backup | 10.255.200.0/24 (VPN 0) |

2. Color Planning Best Practices

A. Standardize Color Naming

  • Use descriptive, consistent names:
    <carrier>_<type> (e.g., `att_mpls`, `comcast_biz_internet`)
    
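  • Avoid generic names like primary, secondary (confusing at scale).
  • Note: Cisco SD-WAN only accepts colors from a predefined set (e.g., mpls, biz-internet, public-internet, lte, private1, custom1), so on that platform treat carrier-style names as a documented mapping onto predefined colors rather than literal color values.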

B. Color Redundancy Rules

  • Assign at least 2 colors per site (e.g., mpls + internet).
  • Use BFD for fast failover between colors.

C. Color Mapping to TLOCs

  • Each color corresponds to a TLOC (Transport Locator).
  • Example TLOC config:
    vEdge(config)# vpn 0 interface ge0/0
      tunnel-interface
        color mpls restrict  # Restrict to MPLS underlay
    

3. Management Subnet Strategy

A. Key Requirements

  • Isolation: Management traffic should be isolated.
    • In-band Management: Typically resides in VPN 0 (shares the transport VRF with control/data overlay traffic but is logically separate).
    • Out-of-Band (OOB) Management: For dedicated management ports (e.g., GigabitEthernet0/0 on a vEdge), use VPN 512. Routes in VPN 512 are NOT advertised into the OMP overlay.
  • Subnet Size: /24 recommended (supports up to 254 devices).

B. Addressing Scheme Example

For In-band Management (VPN 0):

10.255.<Site ID>.0/24
Example:
- Site ID 100 → `10.255.100.0/24`
- Site ID 200 → `10.255.200.0/24`
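
The scheme above is mechanical enough to generate. The sketch below derives the in-band management subnet and the system IP from a site ID using Python's standard `ipaddress` module; `site_addressing` is a hypothetical helper, and it assumes site IDs fit in a single octet (0 to 255), which the 10.255.<Site ID>.0/24 pattern implies.

```python
import ipaddress


def site_addressing(site_id: int):
    """Return (management subnet, system IP) for a site ID."""
    if not 0 <= site_id <= 255:
        raise ValueError("site ID must fit in one octet for this scheme")
    mgmt_subnet = ipaddress.ip_network(f"10.255.{site_id}.0/24")
    system_ip = ipaddress.ip_address(f"172.16.{site_id}.1")
    return mgmt_subnet, system_ip
```

One design consequence worth noting: because the site ID occupies one octet, this particular scheme tops out at 256 sites; larger fabrics would need a wider encoding.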

For Out-of-Band Management (VPN 512): Use a completely separate, non-overlapping management subnet, typically on a dedicated physical interface.

Benefits:

  • Predictable IPs (easy troubleshooting).
  • No overlaps with service VPNs.

C. vManage Integration

  • Define management subnets in vManage Templates:
    device vpn 0
      interface eth0
        ip address 10.255.100.1/24
        tunnel-interface
          color biz_internet restrict
    
    (For VPN 512, you'd configure a separate interface under device vpn 512).

4. Putting It All Together: Design Checklist

  1. Site IDs: Geographic/role-based, unique, documented in IPAM.
  2. Colors: Named after carriers, assigned to TLOCs, redundant.
  3. Management Subnet:
    • /24 in VPN 0 for in-band.
    • /24 in VPN 512 for OOB (preferred for dedicated management ports).
  4. System IPs: Align with Site ID (e.g., Site ID 100 → 172.16.100.1).

5. Common Pitfalls

  • Color Conflicts: Reusing mpls for different ISPs (use att_mpls, verizon_mpls).
  • Mgmt Overlaps: Sharing 10.255.100.0/24 across sites (always subnet per site).
  • Unstructured Site IDs: Random numbers (hard to scale beyond 50 sites).
  • Incorrect VPN for Internet Breakout: Using VPN 512 for DIA (it's for OOB management). DIA should be in a service VPN or VPN 0.
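
The management-overlap pitfall is easy to check automatically before a rollout. The following sketch uses the standard `ipaddress` module to report every pair of sites whose subnets overlap; `find_overlaps` and the site names in the usage note are hypothetical.

```python
import ipaddress
from itertools import combinations


def find_overlaps(site_subnets):
    """Return sorted (site_a, site_b) pairs whose subnets overlap."""
    nets = {site: ipaddress.ip_network(cidr)
            for site, cidr in site_subnets.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]
```

Feeding it a plan where two sites both claim 10.255.100.0/24 returns that pair, while a clean per-site plan returns an empty list.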

Final Topology Example

Site ID: 100 (Dallas HQ)
- System IP: 172.16.100.1
- Colors: mpls_blue, biz_internet
- Mgmt Subnet: 10.255.100.0/24 (VPN 0 for in-band)
- Service VPNs: 10 (LAN), 20 (VoIP)

SD-WAN Fabric Bring-Up Essentials

To bring up an SD-WAN fabric, you need to configure key components correctly. Below is a concise, step-by-step breakdown of the essentials, along with critical design considerations.

1. Underlay Network (VPN 0 - Transport VRF / Front-Door VRF)

  • Purpose: Handles control-plane traffic (OMP, DTLS/TLS tunnels between devices) and encapsulated data-plane traffic. All physical WAN interfaces that connect to the underlay belong to VPN 0.
  • Key Configurations:
    • Interfaces: Assign WAN interfaces (e.g., MPLS, Internet, LTE) to VPN 0.
    • Routing:
      • Static routes (for simple setups).
      • BGP/OSPF (for dynamic underlay routing in larger deployments).
    • TLOC Extensions: Define public/private IPs for tunnel endpoints, along with colors.
  • Design Considerations:
    • Dual Underlay: Use at least two transport types (e.g., MPLS + Internet) for redundancy.
    • TLOC Preference: Prioritize the more reliable or lower-latency links (e.g., MPLS over LTE).

2. Overlay Network (OMP Routing)

  • Purpose: Distributes routes and policies across the fabric.
  • Key Configurations:
    • OMP (Overlay Management Protocol): Advertises routes, TLOCs, and policies between vSmart controllers and edges.
    • Route Policies: Control which prefixes are shared (e.g., only corporate LAN routes).
  • Design Considerations:
    • Route Aggregation: Minimize prefixes advertised to vSmart (e.g., summarize branch LANs).
    • TLOC Redundancy: Assign multiple TLOCs per route for failover.

3. Service VPNs (VPN 1-511)

  • Purpose: Segments user/data traffic (e.g., corporate LAN, guest Wi-Fi, VoIP).
  • Key Configurations:
    • VRF Creation: Define VPNs (e.g., vpn 10 for corporate LAN).
    • Interface Assignment: Assign LAN interfaces to the correct VPN.
    • Route Leaking: If needed, allow controlled traffic flow between VPNs (via centralized policies).
  • Design Considerations:
    • QoS Tagging: Apply DSCP markings per VPN (e.g., EF for VoIP in vpn 20).
    • Security Policies: Restrict inter-VPN communication (e.g., guest Wi-Fi in vpn 30 can't reach vpn 10).
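
The VPN numbering used throughout this guide can be summarized as a small classifier. This sketch covers only the ranges named here (VPN 0, VPN 1-511, VPN 512); `classify_vpn` is a hypothetical helper, and higher-numbered service VPNs, whose limits are platform-dependent, are deliberately left unmodeled.

```python
def classify_vpn(vpn_id: int) -> str:
    """Map a VPN number to its role as described in this guide."""
    if vpn_id == 0:
        return "transport"    # underlay-facing WAN interfaces
    if vpn_id == 512:
        return "management"   # out-of-band management
    if 1 <= vpn_id <= 511:
        return "service"      # user/data segments (LAN, VoIP, guest)
    return "other"            # outside the ranges this guide names
```

A check like this is handy in template validation, for example to reject a DIA design that places user traffic in VPN 512.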

4. Internet Breakout

  • Purpose: Local internet access (DIA) from branches or centralized internet access from a datacenter.
  • Key Configurations:
    • NAT & Firewall: Enable NAT overload (PAT) for private→public IP translation on the egress interface.
    • Policy-Based Routing (PBR) or Application-Aware Routing: Steer specific traffic (e.g., SaaS apps, guest Wi-Fi) to the local internet path.
  • Design Considerations:
    • Security: Apply ZTNA/Umbrella or other security services for secure internet access.
    • Backup Path: If local DIA fails, fall back to centralized internet via the overlay.
    • Note: This is typically configured in a service VPN (e.g., VPN 10, or a dedicated internet VPN like VPN 999), or by routing traffic directly out a VPN 0 interface with specific policies and NAT. VPN 512 is reserved for Out-of-Band Management, not Internet Breakout.

5. Management & Control Plane Connectivity

  • Purpose: Ensures vEdges can securely connect to controllers (vManage, vSmart, vBond).
  • Key Configurations:
    • Controller IPs: Ensure vEdges can reach vManage/vSmart/vBond over VPN 0.
    • Certificate Auth: Use device certificates for secure onboarding.
  • Design Considerations:
    • Out-of-Band (OOB) Management (VPN 512): Use a separate OOB network with interfaces in VPN 512 for high availability and isolation of management traffic from the overlay.
    • Geo-Redundancy: Deploy controllers in multiple regions.

6. Security Policies

  • Purpose: Enforce traffic rules (e.g., blocking, inspection).
  • Key Configurations:
    • Zone-Based Firewall: Assign interfaces to zones (e.g., "inside," "outside").
    • Application-Aware Policies: Block high-risk or non-business apps (e.g., Tor, Netflix).
  • Design Considerations:
    • Default-Deny: Start with "deny all," then allow only needed traffic.
    • IPS/IDS: Enable for internet-bound traffic.

7. High Availability (HA)

  • Design Considerations:
    • Dual vSmarts: Avoid single points of failure for the control plane.
    • Active/Standby Edges: Use VRRP/HSRP for LAN-side HA at critical sites.
    • Cloud Gateway Redundancy: For cloud-onramp (e.g., AWS/Azure).

Summary Checklist

| Step | Action | Critical Design Tip |
| --- | --- | --- |
| 1. Underlay | Configure VPN 0 interfaces & routing | Dual transports (MPLS + Internet) |
| 2. Overlay | Set up OMP & route policies | Summarize routes to reduce overhead |
| 3. Service VPNs | Define VPNs 1-511 & assign interfaces | Use QoS for VoIP/VC traffic |
| 4. Internet | Configure DIA in a Service VPN or VPN 0 | Add ZTNA/Umbrella for security |
| 5. Management | Ensure controllers are reachable via VPN 0 | OOB management (VPN 512) for resiliency |
| 6. Security | Apply firewall/IPS policies | Default-deny approach |
| 7. HA | Deploy redundant controllers/edges | Active/standby for critical sites |

SD-WAN Application-Aware Routing (AAR) with match app-list

Control traffic flows based on applications using vManage policies.

1. What is match app-list?

  • Purpose: Identifies specific applications (e.g., Zoom, Netflix, VoIP) to steer traffic via policies.
  • Use Cases:
    • Prioritize VoIP over MPLS.
    • Block high-risk apps (e.g., Tor).
    • Local internet breakout (DIA) for SaaS apps.

2. How It Works

  1. Application Detection:
    • Uses Deep Packet Inspection (DPI) to identify apps (even when they use non-standard or dynamic ports).
    • Predefined app lists in vManage (e.g., VOICE-AND-VIDEO, BUSINESS-APPS).
  2. Policy Matching:
    • Policies reference app-list to trigger actions (e.g., change path, apply QoS).

3. Configuration Steps

3.1 Define an App List in vManage

  1. Navigate to: Configuration > Policies > Custom Options > App-Aware Routing
  2. Create a new app list:
    Name: CORPORATE-APPS
    Applications:
      - Microsoft-365
      - Webex-Teams
      - Zoom-Cloud
    

3.2 Create a Policy Using match app-list

Example: "Route Microsoft-365 traffic via VPN 10 (local internet breakout)" (Note: VPN 512 is for Out-of-Band Management, not Internet Breakout. Use a service VPN like VPN 10 or route out VPN 0 for DIA.)

policy-rule MICROSOFT-365-DIA
  match app-list CORPORATE-APPS  # Match predefined apps
  action accept
  set vpn 10                      # Force local internet breakout via VPN 10
  set dscp 46                     # Mark for QoS (EF)

3.3 Apply Policy to Sites

  1. Attach policy to a Centralized Policy in vManage.
  2. Push to target sites.

4. Best Practices

4.1 App List Design

  • Group logically:
    • VOICE-AND-VIDEO: Zoom, Webex, MS-Teams.
    • BUSINESS-CRITICAL: SAP, Oracle, Salesforce.
  • Avoid overly broad lists (e.g., "ALL-WEB") to prevent unintended matches.

4.2 Policy Ordering

  • Higher priority (lower number) policies evaluate first.
    policy-list AAR-POLICY
      sequence 10
        match app-list VOICE-AND-VIDEO
        action accept
        set color mpls        # Force MPLS for voice
      sequence 20
        match app-list NETFLIX
        action drop           # Block Netflix
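
The ordering rule above amounts to first-match evaluation in ascending sequence order. The sketch below models that logic only; the `APP_LISTS` contents, the `SEQUENCES` structure, and the `"default"` fall-through result are assumptions for illustration, not vManage behavior.

```python
APP_LISTS = {
    "VOICE-AND-VIDEO": {"zoom", "webex", "ms-teams"},
    "NETFLIX": {"netflix"},
}

# Deliberately stored out of order to show that sequence number decides.
SEQUENCES = [
    {"sequence": 20, "app_list": "NETFLIX", "action": "drop", "color": None},
    {"sequence": 10, "app_list": "VOICE-AND-VIDEO", "action": "accept", "color": "mpls"},
]


def evaluate(app, sequences=SEQUENCES, app_lists=APP_LISTS):
    """Return (action, color) from the lowest-numbered matching sequence."""
    for seq in sorted(sequences, key=lambda s: s["sequence"]):
        if app in app_lists[seq["app_list"]]:
            return seq["action"], seq["color"]
    return "default", None  # no sequence matched (assumed fall-through)
```

Because evaluation stops at the first match, a broad list placed at a low sequence number can shadow more specific sequences that follow it.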
    

4.3 SLA-Based Fallback

  • Combine with an SLA class so the path switches if the SLA fails:
    match app-list WEBEX
    action accept
    set sla preferred-color mpls latency 100ms
    

5. Verification & Troubleshooting

5.1 Key Commands

| Command | Purpose |
| --- | --- |
| show sdwan app-route stats | Lists per-path loss, latency, and jitter used for app-aware routing. |
| show sdwan policy service-statistics | Checks policy hits. |
| show sdwan app-fwd dpi flows | Inspects DPI-classified flows. |

5.2 Common Issues

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| App traffic not matching | Incorrect app-list definition | Verify app names in vManage. |
| Policy not applying | Wrong policy priority | Reorder policies (lower sequence = higher priority). |
| DPI not detecting apps | Encryption (TLS 1.3) | Use IP-based matching as fallback. |
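
The IP-based fallback mentioned in the last row can be sketched as a two-stage classifier: try the hostname signal first, then fall back to destination prefixes when the hostname is hidden. Everything here is hypothetical (the `SNI_APPS` and `PREFIX_APPS` tables, the example hostname, and the documentation-range prefix); a real deployment would use vendor-maintained signatures.

```python
import ipaddress

SNI_APPS = {"app.example.com": "custom-app"}
PREFIX_APPS = {ipaddress.ip_network("203.0.113.0/24"): "custom-app"}


def classify(sni, dst_ip):
    """Identify an app by hostname if visible, else by destination prefix."""
    if sni and sni in SNI_APPS:
        return SNI_APPS[sni]
    # Hostname hidden or unknown: fall back to IP-based matching.
    addr = ipaddress.ip_address(dst_ip)
    for network, app in PREFIX_APPS.items():
        if addr in network:
            return app
    return "unknown"
```

The fallback is coarser than DPI, since shared hosting can put unrelated apps behind one prefix, so it works best for apps with well-known, dedicated address ranges.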

6. Advanced Use Cases

6.1 Custom DPI Signatures

  • For proprietary apps, add custom signatures:
    app-list CUSTOM-APP
      signature TCP port 5000 protocol HTTP user-agent "MyApp*"
    

6.2 Combining with QoS

  • Mark apps for prioritization:
    match app-list VOICE
    action accept
    set dscp ef           # Expedited Forwarding (VoIP)
    

6.3 Internet Breakout for Specific Apps

match app-list SALESFORCE
action accept
set vpn 10                    # Local breakout via VPN 10
set nat use-vpn 0             # Use VPN 0's NAT pool (if VPN 0 is internet-facing)

7. Summary Checklist

  • Define app lists in vManage (Configuration > Policies > App-Aware Routing).
  • Use match app-list in policies to steer traffic.
  • Test with show sdwan app-route stats.
  • Combine with SLA for dynamic failover.

Key Takeaways

  1. match app-list enables application-aware routing (not just IP/port-based).
  2. DPI visibility can be affected by strong encryption (e.g., TLS 1.3 with ESNI) → May need fallback to IP-based matching.
  3. Policy order matters — Highest priority (lowest sequence) evaluates first.

Front-Door VRF (FVRF) Explained (Using Cisco Gear)

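Front-Door VRF (FVRF), as the term is used in this document, is a Cisco design pattern that enhances security by separating the management plane from the data plane in network devices (routers, switches, firewalls). It places the management interface (SSH, SNMP, HTTPS, etc.) in a separate Virtual Routing and Forwarding (VRF) instance, isolating it from the default global routing table. (Cisco's own documentation more commonly uses "front-door VRF" for the VRF that holds a tunnel's transport-facing interface, as in DMVPN designs, and calls the pattern described here a management VRF.)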

Note: While this document describes the general concept of Front-Door VRF in Cisco devices, in Cisco SD-WAN (Viptela-based) architectures:

  • VPN 0 is often referred to as the "Front-Door VRF" in the sense that it is the transport VRF carrying all overlay control and data tunnel traffic, and often in-band management.
  • VPN 512 is used for isolated out-of-band management, conceptually similar to a traditional FVRF.

Why Use Front-Door VRF?

  1. Security: Prevents unauthorized access to management interfaces via data-plane attacks.
  2. Isolation: Ensures management traffic doesn't mix with production traffic.
  3. Multi-Tenancy: Useful in service provider environments where management traffic must be segregated per customer.
  4. Simplified Routing: Avoids route conflicts between management and data networks.

How FVRF Works

  • The management interface (e.g., Mgmt0/0) is assigned to a dedicated VRF (e.g., MGMT-VRF).
  • All management traffic (SSH, SNMP, etc.) must go through this VRF.
  • The data plane (regular traffic) uses the default global routing table or other service VRFs.
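
The isolation described above comes from each VRF having its own routing table, so a lookup in one VRF never consults another. The sketch below models that with a per-VRF longest-prefix match; `VRF_ROUTES` and `lookup` are hypothetical, the management default route uses the gateway from the configuration example in this section, and the global route is invented for contrast.

```python
import ipaddress

VRF_ROUTES = {
    "MGMT-VRF": {"0.0.0.0/0": "192.168.1.254"},
    "global": {"10.0.0.0/8": "10.0.0.1"},  # hypothetical data-plane route
}


def lookup(vrf, destination):
    """Longest-prefix match restricted to a single VRF's table."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in VRF_ROUTES[vrf].items():
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, next_hop))
    if not matches:
        return None  # no route in this VRF, even if another VRF has one
    return max(matches)[1]
```

A destination reachable through the management default route is unreachable from the global table in this model, and vice versa, which is the separation the bullets describe.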

Configuration Example (Cisco IOS-XE / IOS)

1. Create the Management VRF

configure terminal
vrf definition MGMT-VRF
 ! Route distinguisher (keeps this VRF's routes unique)
 rd 100:1
 address-family ipv4
 exit-address-family
exit

2. Assign the Management Interface to the VRF

interface GigabitEthernet0/0
 description Management Interface
 vrf forwarding MGMT-VRF
 ip address 192.168.1.1 255.255.255.0
 no shutdown
exit

3. Configure a Default Route for Management Traffic

ip route vrf MGMT-VRF 0.0.0.0 0.0.0.0 192.168.1.254

(Where 192.168.1.254 is the gateway for management traffic.)

4. Enable VRF-Aware Services

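ip http server
! Intended to bind HTTP/HTTPS to MGMT-VRF; availability varies by platform and release
ip http vrf MGMT-VRF
! ACL name is illustrative; it limits VTY access to the management subnet
ip access-list standard MGMT-ACL
 permit 192.168.1.0 0.0.0.255
line vty 0 4
 transport input ssh
 access-class MGMT-ACL in vrf-also
exit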

Verification

  • Check VRF routing table:
    show ip route vrf MGMT-VRF
    
  • Verify interface assignment:
    show vrf brief
    
  • Test connectivity:
    ping vrf MGMT-VRF 192.168.1.254
    

Key Considerations

  • NTP & DNS: If management relies on NTP/DNS, ensure they are reachable via the FVRF.
  • Backup Access: Always maintain an alternative access method (console) in case of misconfiguration.
  • Compatibility: Some older Cisco devices may not support all VRF-aware services.

Conclusion

Front-Door VRF is a best practice for securing management traffic in Cisco environments. By isolating management interfaces in a separate VRF, you reduce attack surfaces and prevent unauthorized access through data-plane vulnerabilities.