Update tech_docs/networking/sdwan_extras.md
@@ -1,3 +1,193 @@
## Cisco SD-WAN: A Next-Generation VPN Architecture

This document outlines the limitations of traditional VPN architectures and presents Cisco SD-WAN as a modern solution, highlighting its key features, architectural shifts, components, deployment models, and traffic engineering capabilities.

### 1. Challenges with Current VPN Architectures

Traditional VPN solutions, primarily **point-to-point VPNs, DMVPN, and GETVPN**, while functional, present significant operational challenges:

* **Manual and Time-Consuming Configurations:** Extensive manual configuration is required on each device, leading to slow deployments and increased potential for human error.
* **Lack of Integrated Automation:** Automation, if present, is typically an afterthought ("bolt-on") rather than an intrinsic part of the solution.
* **Cumbersome Policy Deployment:** Implementing and managing network policies is difficult, requiring individual deployment on every network node.
* **Difficulty with VRF Stretching:** Extending Layer 3 segmentation (VRFs) across the WAN is complex, especially for multiple VRFs like employee, guest, or IoT.
* **Key Distribution Inefficiency (GETVPN):** Despite GETVPN's aim to improve key distribution, its adoption was limited, and many solutions still rely on IKE for IPsec tunnel setup.

### 2. Desired Features for a Next-Generation Architecture

A next-generation VPN architecture should prioritize the following capabilities:

* **Integrated Automation:** Automation must be a fundamental, built-in component.
* **Open APIs:** Support for open APIs is essential to facilitate broader enterprise-wide automation, extending beyond just network automation.
* **Enhanced Scalability:** The architecture must support a significantly larger number of devices and connections.
* **Robust Policy Management:** More sophisticated, flexible, and centralized policy enforcement capabilities are crucial.
* **Abstracted Configuration:** The network should be configured and managed in terms of desired outcomes (e.g., "prefer this traffic over that") rather than granular, platform-specific CLI commands, abstracting away code version and platform differences.

### 3. Key Architectural Shifts in Cisco SD-WAN

Cisco SD-WAN is built upon two fundamental architectural shifts:

* **Separation of Control and Data Plane:**
    * This is a core paradigm shift that centralizes control plane functions (e.g., key exchange, routing information, reachability, VPN membership).
    * The data plane, conversely, is streamlined to focus solely on forwarding encrypted packets.
    * This centralization significantly enhances scalability and simplifies network management, similar in concept to BGP route reflectors but more comprehensive.
* **Ubiquitous IP-based Transport with Tagging:**
    * Leveraging lessons from MPLS, the new architecture uses ubiquitous IP (IPv4/IPv6) as the underlying transport.
    * Instead of MPLS frames, the solution encrypts the inner payload, includes tagging within this payload, and encapsulates it in a new IP packet. This allows it to seamlessly traverse any IP-based underlay network (e.g., Internet, MPLS).

### 4. Cisco SD-WAN Terminology and Components

#### 4.1. Terminology:

* **Transport Side (VPN 0):** The VPN containing the interfaces on WAN Edge devices and controllers that connect to the underlying transport network (Internet, MPLS). This is equivalent to the global routing table.
* **Service Side VPNs (VPN 1-511, 513-65530):** User-defined VPNs, analogous to VRFs, used for different services (e.g., employee, guest, IoT). VPN 512 is reserved for out-of-band management.
* **TLOC (Transport Locator):** Identifies the point where a WAN Edge attaches to a transport within the overlay. It includes attributes such as system IP, encapsulation type (IPsec/GRE), encryption key, and "color" (a label for the transport link that also distinguishes public from private transports).
    * **Private TLOC:** IP address and port before NAT.
    * **Public TLOC:** IP address and port after NAT (the outside NAT address), or the routable IP when no NAT is present.
* **Overlay Routing (Service-Side Routing):** Routes learned on the service side that are then distributed across the SD-WAN overlay.
* **OMP (Overlay Management Protocol):** A dynamic, extensible control protocol responsible for distributing overlay routing information, data plane encryption keys, and centralized data policies.
* **Site ID:** A 32-bit integer uniquely identifying a site or location within the overlay, extensively used in policy definitions.
* **System IP:** An IPv4 address (not necessarily routable) that logically identifies a WAN Edge router within the overlay, typically configured on the VPN 0 loopback interface.
* **Organizational Name:** A unique identifier for the entire SD-WAN overlay domain, used for authentication.
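
The transport locator (TLOC) attributes listed above can be pictured as a small record. The sketch below is only an illustration: the class name, field names, and sample addresses are made up for this example, and it assumes that a TLOC is distinguished by the combination of system IP, color, and encapsulation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, UDP port)


@dataclass(frozen=True)
class Tloc:
    """Illustrative model of one WAN attachment point (not a Cisco API)."""
    system_ip: str                          # logical identity of the WAN Edge
    color: str                              # transport label, e.g. "mpls"
    encap: str                              # "ipsec" or "gre"
    private_addr: Endpoint                  # address and port before NAT
    public_addr: Optional[Endpoint] = None  # address and port after NAT, if learned

    def key(self) -> Tuple[str, str, str]:
        # Assumption: this tuple is what distinguishes one TLOC from another.
        return (self.system_ip, self.color, self.encap)

    def behind_nat(self) -> bool:
        # A device is behind NAT when its public and private endpoints differ.
        return self.public_addr is not None and self.public_addr != self.private_addr


# Hypothetical values (documentation address ranges, made-up ports).
branch = Tloc("10.255.0.1", "biz-internet", "ipsec",
              private_addr=("192.168.1.10", 12346),
              public_addr=("203.0.113.7", 5062))
print(branch.key(), branch.behind_nat())
```

Keeping the private and public endpoints side by side mirrors the Private/Public distinction in the list: a peer compares the two to tell whether NAT sits in the path.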

#### 4.2. Components:

The Cisco SD-WAN solution comprises controller elements and WAN Edge routers:

* **Cisco SD-WAN Controller Elements:** These are virtual machines deployable on-prem or in the cloud.
    * **vManage NMS:** The management plane. It handles configuration (via NETCONF), telemetry collection, and API integration. It supports Role-Based Access Control (RBAC) and SAML SSO.
    * **vSmart Controller:** The control plane. It distributes overlay routing, data plane security keys, and data policies using OMP. It is responsible for implementing control plane policies.
    * **vBond Orchestrator:** The orchestration plane. It acts as the initial point of authentication (PKI), orchestrates connectivity between WAN Edges and other controllers, and functions as a STUN server for NAT traversal.
* **WAN Edge Routers (Data Endpoints):** These are the data plane devices.
    * Available as physical appliances (ISR 1K/4K, ASR 1000, Catalyst 8000 series) or virtual instances (CSR 1000v, Catalyst 8000V).
    * Automatically establish full-mesh IPsec tunnels based on control plane information received from vSmart.
    * Implement data plane policies and export performance statistics to vManage.
    * Support robust security features, including control plane policing and selective inbound connection acceptance (e.g., DTLS/TLS from authenticated sources, SD-WAN IPsec/GRE from trusted WAN Edges, third-party IPsec/GRE, integration with cloud security services like Cisco Umbrella).
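
To make the "API integration" role of vManage concrete, here is a minimal sketch that only builds (and never sends) two HTTP requests. It assumes the commonly documented vManage REST conventions: a form login at `/j_security_check`, a `JSESSIONID` session cookie, an `X-XSRF-TOKEN` header, and the `/dataservice/device` inventory resource. Verify these against the API documentation for your release; the host name, credentials, and function names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Optional


def login_request(host: str, username: str, password: str) -> urllib.request.Request:
    """Build the form-based login request (assumed endpoint: /j_security_check)."""
    body = urllib.parse.urlencode(
        {"j_username": username, "j_password": password}
    ).encode()
    return urllib.request.Request(
        f"https://{host}/j_security_check",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def device_list_request(host: str, session_id: str,
                        xsrf_token: Optional[str] = None) -> urllib.request.Request:
    """Build the device inventory request (assumed endpoint: /dataservice/device)."""
    headers = {"Cookie": f"JSESSIONID={session_id}"}
    if xsrf_token:
        # Assumption: releases that enforce XSRF protection expect this header.
        headers["X-XSRF-TOKEN"] = xsrf_token
    return urllib.request.Request(
        f"https://{host}/dataservice/device", headers=headers, method="GET"
    )


# Nothing is sent here; passing these objects to urllib.request.urlopen()
# against a reachable vManage would perform the actual calls.
req = device_list_request("vmanage.example.com", session_id="abc123")
print(req.get_method(), req.full_url)
```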

### 5. Cisco SD-WAN Deployment and Redundancy

#### 5.1. Deployment Models:

* **Controller Deployment:**
    * **Cisco Hosted:** Cisco manages the controllers; customers retain full administrative control.
    * **MSP Hosted:** A Managed Service Provider hosts the controllers, potentially with shared visibility.
    * **Do-It-Yourself:** Customers deploy controllers on-premise or in a private cloud, maintaining full infrastructure and administrative control.
* **WAN Edge Deployment:**
    * **Transport Side (VPN 0):** Connects to the underlay transport via physical or logical interfaces. Uses "color" to identify WAN attachment points (TLOCs). Supports static routing, BGP, and OSPF for underlay routing.
    * **Out-of-Band Management VPN (VPN 512):** A dedicated routing domain for management traffic, with prefixes not carried across the overlay.
    * **Service Side VPNs:** Learn and distribute LAN-side routing information via OMP. Support connected interfaces, static routing, BGP, OSPF, and EIGRP.
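
The VPN roles on a WAN Edge can be summarized as a simple lookup. This is a sketch under stated assumptions: the function and table names are invented for illustration, and the upper bound on service VPN IDs is a parameter because the exact limit should be checked for your platform and release.

```python
# Routing sources per VPN role, taken from the list above.
ROUTING_SOURCES = {
    "transport": {"static", "bgp", "ospf"},
    "management": set(),  # prefixes are not carried across the overlay
    "service": {"connected", "static", "bgp", "ospf", "eigrp"},
}


def classify_vpn(vpn_id: int, max_service_vpn: int = 65530) -> str:
    """Return the role of a VPN ID: transport, management, or service.

    max_service_vpn is an assumption; confirm the limit for your release.
    """
    if vpn_id == 0:
        return "transport"
    if vpn_id == 512:
        return "management"
    if 1 <= vpn_id <= max_service_vpn:
        return "service"
    raise ValueError(f"VPN ID {vpn_id} is outside the supported range")


def protocol_allowed(vpn_id: int, protocol: str) -> bool:
    """Check whether a routing source fits the VPN's role."""
    return protocol.lower() in ROUTING_SOURCES[classify_vpn(vpn_id)]


print(classify_vpn(0), classify_vpn(10), classify_vpn(512))
print(protocol_allowed(0, "eigrp"), protocol_allowed(10, "eigrp"))
```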

#### 5.2. Redundancy and High Availability:

Cisco SD-WAN provides comprehensive redundancy at various levels:

* **WAN Edge Device Redundancy:** Multiple WAN Edges at a single location can provide gateway redundancy using VRRP on a Layer 2 LAN, or routing protocols (BGP, OSPF, EIGRP) on a Layer 3 LAN.
* **Transport Redundancy:** Supports up to eight active-active transport interfaces, allowing for load sharing based on session or weighted session, application pinning for logical topologies (active/standby), and application-aware routing for performance-based traffic steering with SLAs.
* **Transport Connectivity Models:**
    * **Full Mesh Transport:** Recommended for data centers or hub sites.
    * **TLOC Extension:** Allows extending transport from one WAN Edge to another, useful for branches where a full mesh is not feasible.
* **Controller Redundancy:**
    * Multiple vSmart Controllers can be deployed for failover.
    * **vManage Scale:** Up to 2,000 devices per node, clusterable up to six nodes.
    * **vSmart Scale:** Up to 5,400 concurrent connections, supporting up to 20 vSmart controllers per overlay.
    * **vBond Scale:** Up to 1,500 concurrent connections, supporting up to eight vBond orchestrators per overlay.

#### 5.3. Control Plane Connectivity:

* **WAN Edge to vBond:** A transient DTLS connection is established for initial authentication and orchestration.
* **WAN Edge to vManage:** A single permanent connection per WAN Edge for configuration (NETCONF) and telemetry.
* **WAN Edge to vSmart:** One permanent DTLS/TLS control connection, carrying OMP, per vSmart per transport (e.g., two transports and two vSmarts would result in four connections).
* Controllers (vManage, vSmart, vBond) maintain full mesh control connections with each other.
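
The connection counts above reduce to simple arithmetic. The following sketch (function name chosen for illustration) counts only the permanent connections of one WAN Edge and leaves out the transient vBond session.

```python
def permanent_control_connections(num_transports: int, num_vsmarts: int) -> dict:
    """Count the permanent control connections held by one WAN Edge.

    Follows the rules in the list above: one connection to vManage in total,
    and one connection per vSmart per transport. The transient vBond
    connection is deliberately not counted.
    """
    if num_transports < 1 or num_vsmarts < 1:
        raise ValueError("need at least one transport and one vSmart")
    vsmart = num_transports * num_vsmarts
    vmanage = 1
    return {"vsmart": vsmart, "vmanage": vmanage, "total": vsmart + vmanage}


# The example from the text: two transports and two vSmarts.
print(permanent_control_connections(num_transports=2, num_vsmarts=2))
```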

### 6. Cisco SD-WAN Overlay Bring-Up Process

The automated bring-up process for the SD-WAN overlay involves the following steps:

1. **Initial Connection:** The WAN Edge establishes a temporary DTLS connection to the vBond orchestrator for authentication and initial coordination.
2. **Permanent Control Connections:** After successful authentication, permanent DTLS/TLS connections are established:
    * To vManage for ongoing configuration and telemetry exchange.
    * To vSmart for receiving control plane information (routing, data plane security keys, and policy).
3. **Data Plane Tunnel Establishment:** Using the information received from vSmart, WAN Edges automatically establish a full mesh of IPsec tunnels for data forwarding. This design ensures strict separation between the control and data planes, preventing data traffic from inadvertently "leaking" into the control plane.
4. **Logical Topologies:** Centralized policies can then be applied to create specific logical topologies, such as partial mesh or hub-and-spoke.
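
To see why steps 3 and 4 matter at scale, it helps to compare how many site pairs need tunnels in a full mesh versus a hub-and-spoke topology. The sketch below is a simplification: it assumes one tunnel per site pair and a single transport, whereas a real overlay builds tunnels between transport attachment points. The site IDs are illustrative.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def full_mesh(sites: Iterable[int]) -> Set[Pair]:
    """Every site pairs with every other site: n * (n - 1) / 2 pairs."""
    return set(combinations(sorted(set(sites)), 2))


def hub_and_spoke(hubs: Iterable[int], spokes: Iterable[int]) -> Set[Pair]:
    """Spokes pair only with hubs; hubs stay meshed with each other."""
    hubs, spokes = set(hubs), set(spokes)
    pairs = set(combinations(sorted(hubs), 2))
    pairs |= {tuple(sorted((h, s))) for h in hubs for s in spokes}
    return pairs


hubs = [100, 200]                # illustrative site IDs
spokes = [400, 500, 600, 700]
print(len(full_mesh(hubs + spokes)))      # pairs in a full mesh
print(len(hub_and_spoke(hubs, spokes)))   # pairs after restricting spokes to hubs
```

With two hubs and four spokes, the full mesh has 15 site pairs while hub-and-spoke needs 9, and the gap widens quickly as branch count grows.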

### 7. Cisco SD-WAN Hardware and Software

#### 7.1. Hardware Platforms:

Cisco offers a diverse range of SD-WAN platforms tailored for various deployment scenarios:

* **Branches/Small Office/Home Office (SOHO):** ISR 1000 series, ISR 4000 series.
* **Aggregation Points (Data Centers/Hub Sites):** ASR 1000 series, Catalyst 8000 series.
* **Cloud Service Providers (Virtual Form Factor):** CSR 1000v, Catalyst 8000V.

Cisco continues to evolve its SD-WAN platform, offering purpose-built **Catalyst 8200 and 8300 series** for branch deployments and the **Catalyst 8500 series** for aggregation points. For cloud environments, the **Catalyst 8000V** provides virtualized functionality. While legacy Viptela vEdge devices are still supported, they are being phased out. For virtualized deployments, Cisco also offers platforms like the **ENCS** and **CSP 5000**.

#### 7.2. Software Evolution:

A significant software change occurred with **release 17.2/20.1**, where the traditional IOS XE and IOS XE SD-WAN images were merged into a **single universal image**. This universal image can operate in either **autonomous (traditional CLI)** mode or **controller (SD-WAN)** mode.

Furthermore, with **release 17.3/20.3**, the version numbering was synchronized, meaning **17.x** WAN Edge releases now correspond directly with **20.x** controller releases (e.g., 17.10/20.10). Cisco typically releases three images per year (around March/April, July/August, and November/December).

### 8. Cisco Validated Framework Lab and Topology

Cisco's validated framework team maintains a robust, real-world production lab environment for validating SD-WAN and SASE use cases. This lab uses **real equipment and production-shipping software**, providing a comprehensive testbed for various features and integrations.

#### 8.1. Lab Resources:

* A **knowledge article** (link to be provided) offers detailed information, including:
    * A **site table** with site IDs, IPs, names, and descriptions of site types and topologies.
    * A downloadable **PDF of the main master topology diagram**.
* It is highly recommended to have these resources available for future sessions.

#### 8.2. Conceptual Lab Diagram Overview:

* The lab features **six sites**:
    * **Two Main Sites (New York City and Newark, NJ - Site IDs 100 and 200):** Representing large data centers and campuses, these are "well-connected sites" with multiple redundant WAN Edges connected to all transports. They include dedicated internet connections for local campus users. A Layer 3 TLS link connects these main sites outside the WAN overlay, facilitating interesting routing scenarios (each advertising local networks, a default route to the internet, and a backup route to the other main site). The second octet of the IP address (e.g., 10.100.x.x, 10.200.x.x) directly corresponds to the site ID for easy identification. While BGP runs on the LAN side, the "magic" of reachability and crypto keying is primarily handled by **OMP (Overlay Management Protocol)** in the overlay.
    * **Four Branch Sites (Chicago, San Diego, Boston, Philadelphia - Site IDs 400, 500, 600, 700):** Configured with slightly different topologies and a mix of ISR 1K and 4K hardware (with Catalyst 8K devices planned). The hardware type is less critical for functionality beyond interface count, throughput, and scale, as the vManage UI abstracts individual configurations.
* **Transports:** The lab utilizes **real internet connectivity with routable IPs** and **MPLS**, enabling:
    * **Direct Internet Access (DIA):** Branches can directly access the internet, optionally sending traffic to cloud security providers like Cisco Umbrella for full Secure Internet Gateway (SIG) capabilities.
    * **Cloud Service Provider Connectivity:** Evaluation of connections to AWS, Azure, GCP, and middle-mile providers (e.g., Megaport, Equinix) for SDCI (Software-Defined Cloud Interconnect).
    * **Advanced SaaS Functionality:** Cloud OnRamp for SaaS dynamically routes application traffic based on real-time link performance.

#### 8.3. Detailed Lab Diagram (Visio - Overview):

* **Controllers:** Deployed in a hypervisor environment (VMware) but topologically configured as cloud-based with publicly reachable IPs. Includes one vBond orchestrator, one vManage (for configuration and telemetry), and vSmarts (the "brain" for learning and redistributing reachability and crypto keys). WAN Edges establish lightweight DTLS control plane sessions to vSmarts (e.g., four sessions for a dual-transport, dual-vSmart setup) to exchange information, allowing WAN Edges to establish direct UDP/ESP data plane tunnels to each other.
* **Boston/Philadelphia Branch Example:** A single-router, dual-transport topology (ISR 4K), featuring a single LAN-facing interface connected to Catalyst 9300 switches and configured as an 802.1Q trunk. This breaks out into logical 802.1Q sub-interfaces for multiple service-side VPNs (e.g., Guest in green, Employee).
* **Dual Router, Single Transport Site Example:** Illustrates two routers, each connected to one transport, providing diversity and high availability. It includes **TLOC extension** technology, enabling each WAN Edge to act as if it were connected to both transports even though each router has only one physical transport connection. It also shows a Layer 2 LAN side with two service-side VPNs, utilizing **VRRP** for high availability on the WAN Edge's Layer 3 IP address acting as the default gateway.
* **Other Capabilities:** The lab also evaluates deployments in AWS, Azure, and GCP, as well as legacy site integration (e.g., migrating a DMVPN site to SD-WAN).

### 9. Traffic Engineering and Load Balancing in SD-WAN

Cisco SD-WAN offers an integrated and automated approach to traffic steering, significantly simplifying complex traditional methods:

* **Organic Load Balancing:** By default, the system automatically load balances and leverages all viable links to a destination.
* **BFD Probes:** BFD probes are automatically spun up within data plane sessions to continuously monitor link viability and performance metrics (loss, latency, jitter).
* **Session-Level Load Distribution:** Traffic is distributed across available links at the session level, similar to EtherChannel distribution.
* **Centralized Policy for Sophisticated Steering:**
    * **Application-Aware Routing:** Define specific SLAs (loss, latency, jitter) for applications. Traffic is then dynamically steered to links that meet these SLAs, with configurable fallback options if a link degrades or fails.
    * **Application Pinning:** Specific applications can be "pinned" to a preferred link or set of links.
* **Abstracted Configuration:** All traffic engineering is configured via **centralized policies in the vManage UI**, eliminating the need for complex CLI commands. The system intelligently renders the correct configuration based on the platform type and code version.
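
The steering behavior described above can be sketched in two stages: filter links by an SLA class, then spread sessions across the survivors by hashing the flow. This is a conceptual model only, not how the product implements it; the class names, thresholds, hash choice, and fallback flag are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Flow = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


@dataclass(frozen=True)
class LinkStats:
    color: str          # transport label
    loss_pct: float     # measured by probes
    latency_ms: float
    jitter_ms: float


@dataclass(frozen=True)
class SlaClass:
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def compliant_links(links: List[LinkStats], sla: SlaClass) -> List[LinkStats]:
    """Keep only the links whose measured metrics meet the SLA."""
    return [
        link for link in links
        if link.loss_pct <= sla.max_loss_pct
        and link.latency_ms <= sla.max_latency_ms
        and link.jitter_ms <= sla.max_jitter_ms
    ]


def pick_link(flow: Flow, links: List[LinkStats], sla: SlaClass,
              fallback_to_any: bool = True) -> Optional[LinkStats]:
    """Choose a link for a session: SLA filter first, then a per-flow hash."""
    candidates = compliant_links(links, sla)
    if not candidates and fallback_to_any:
        candidates = list(links)  # hypothetical fallback: use any available link
    if not candidates:
        return None
    # Hashing the flow keeps all packets of one session on the same link.
    digest = hashlib.sha256(repr(flow).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(candidates)
    return sorted(candidates, key=lambda link: link.color)[index]


links = [LinkStats("mpls", 0.1, 30.0, 2.0), LinkStats("biz-internet", 3.0, 80.0, 15.0)]
voice = SlaClass(max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=10.0)
flow = ("10.100.1.10", "10.200.1.20", 16384, 16384, "udp")
print(pick_link(flow, links, voice).color)
```

In this example only the MPLS link meets the voice SLA, so the flow lands there regardless of the hash; with several compliant links, different sessions would spread across them.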

### 10. Encapsulation and Routing Protocols

#### 10.1. Encapsulation Protocols:

* **GRE Encapsulation:** Supported but not widely used. Suitable for private WANs where security is less critical and avoiding IPsec MTU overhead is a priority. GRE and IPsec cannot be mixed on the same transport.
* **IPsec Encapsulation:** The default and recommended encapsulation for secure communication over untrusted transports like the Internet. The system automatically builds full-mesh IPsec tunnels and efficiently handles key distribution without relying on IKEv2, as keying information is learned and redistributed by the vSmarts as part of reachability information.
* **vBond and vEdge Image:** The vBond orchestrator shares the same software image as the vEdge Cloud router; its specific function is determined by its bootstrap configuration.

#### 10.2. Routing Protocols:

* **Underlay Transport (VPN 0):** Supports BGP and OSPF. EIGRP is not supported here as it is a Cisco proprietary protocol and typically not used by service providers for underlay networks.
* **Service Side (LAN-Connected Interfaces):** Supports BGP, OSPF, and EIGRP for LAN-side routing.

---

### **Deep Dive: The vBond Orchestrator in Cisco SD-WAN**

The **vBond** is the **gatekeeper** and **orchestration brain** of Cisco SD-WAN (Viptela). It’s often misunderstood as "just another controller," but its role is critical for:

1. **Initial authentication** (who gets into the overlay).