Add work/den_job_prep.md

This commit is contained in:
2024-10-04 20:07:58 +00:00
parent b4b7d3010e
commit 6d26446cc4

452
work/den_job_prep.md Normal file

@@ -0,0 +1,452 @@
1. ACI shifts the focus from network-centric to application-centric configurations:
- Traditional networking focuses on configuring individual network devices (switches, routers) and protocols.
- ACI instead focuses on the applications and their requirements, abstracting away much of the underlying network complexity.
- This shift allows network administrators to think in terms of application needs rather than network topology.
2. Network policies are defined based on application requirements:
- In ACI, you define what an application needs in terms of connectivity, security, and performance.
- These requirements are translated into network policies automatically.
- For example, you might specify that a web server needs to communicate with a database server on a specific port, and ACI will configure the necessary network settings.
3. Applications are grouped into "End Point Groups" (EPGs):
- An EPG is a logical grouping of endpoints that require similar network policies.
- Endpoints can be physical servers, virtual machines, containers, or even individual IP addresses.
- EPGs abstract away the physical and logical topology, focusing instead on the application function.
4. EPGs are collections of endpoints that share common policy requirements:
- All endpoints in an EPG are treated the same from a policy perspective.
- This simplifies policy management - instead of configuring policies for each individual endpoint, you configure them once for the EPG.
- For example, all web servers might be in one EPG, while all database servers are in another.
5. Contracts define how EPGs communicate with each other:
- Contracts are the ACI equivalent of Access Control Lists (ACLs) in traditional networking.
- They specify which EPGs can communicate with each other and how.
- Contracts can define allowed protocols, ports, and even quality of service (QoS) settings.
- They follow a provider-consumer model: one EPG provides a contract, and another EPG consumes it.
Example scenario:
Imagine a three-tier web application with web, application, and database layers. In ACI:
- You'd create three EPGs: Web-EPG, App-EPG, and DB-EPG.
- You'd then create contracts:
1. Web-to-App contract (allows HTTP/HTTPS traffic)
2. App-to-DB contract (allows specific database port traffic)
- The Web-EPG would consume the Web-to-App contract, and the App-EPG would provide it.
- The App-EPG would consume the App-to-DB contract, and the DB-EPG would provide it.
This approach allows for intuitive, application-focused network design and management, with built-in security and scalability.
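
The provider-consumer relationships in this scenario can be sketched as a small data model. This is only an illustration of the default-deny idea, not how the APIC represents policy: the `Contract` and `Epg` classes, the `is_allowed` helper, and the port numbers (80/443 for web-to-app, 3306 for app-to-db) are all assumptions made for the example. Traffic is permitted only when the consumer and provider share a contract that lists the destination port.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contract:
    """A named contract that permits a set of TCP destination ports."""
    name: str
    tcp_ports: frozenset


@dataclass
class Epg:
    """An endpoint group with the contract names it provides and consumes."""
    name: str
    provides: set = field(default_factory=set)
    consumes: set = field(default_factory=set)


def is_allowed(consumer: Epg, provider: Epg, port: int, contracts: dict) -> bool:
    """Return True if a contract consumed by `consumer` and provided by
    `provider` permits TCP traffic to `port`. Anything else is denied."""
    shared = consumer.consumes & provider.provides
    return any(port in contracts[name].tcp_ports for name in shared)


# Hypothetical port choices for the three-tier example.
contracts = {
    "Web-to-App": Contract("Web-to-App", frozenset({80, 443})),
    "App-to-DB": Contract("App-to-DB", frozenset({3306})),
}
web = Epg("Web-EPG", consumes={"Web-to-App"})
app = Epg("App-EPG", provides={"Web-to-App"}, consumes={"App-to-DB"})
db = Epg("DB-EPG", provides={"App-to-DB"})

print(is_allowed(web, app, 443, contracts))  # web reaches app over HTTPS
print(is_allowed(web, db, 3306, contracts))  # web and db share no contract
```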
---
# Cisco ACI: Network-Centric Guide
## 1. Physical Topology: Leaf-Spine Architecture
ACI uses a leaf-spine architecture:
- Leaf switches: Connect to end devices (servers, firewalls, load balancers)
- Spine switches: Interconnect all leaf switches
- Every leaf connects to every spine (leaves do not connect to other leaves, and spines do not connect to other spines), so any two leaves are always two hops apart
Benefits:
- Predictable latency
- High bandwidth
- No spanning tree protocol needed
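
A quick way to see why latency is predictable is to count links and paths. The sketch below assumes a single pod in which every leaf has one link to every spine; the function names and switch names are illustrative only.

```python
from itertools import product


def build_fabric(leaves, spines):
    """Return the set of (leaf, spine) links when every leaf connects to every spine."""
    return set(product(leaves, spines))


def leaf_to_leaf_paths(links, src, dst):
    """List the two-hop paths src -> spine -> dst (one per spine both leaves share)."""
    src_spines = {spine for leaf, spine in links if leaf == src}
    dst_spines = {spine for leaf, spine in links if leaf == dst}
    return sorted((src, spine, dst) for spine in src_spines & dst_spines)


leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]
spines = ["spine1", "spine2"]
links = build_fabric(leaves, spines)

print(len(links))                                   # 4 leaves x 2 spines
print(leaf_to_leaf_paths(links, "leaf1", "leaf4"))  # one equal-cost path per spine
```

Adding a spine adds one more equal-cost path between every pair of leaves, which is how the design scales bandwidth without changing the hop count.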
## 2. APIC (Application Policy Infrastructure Controller)
- Centralized management and policy controller; it is not in the data path, so traffic keeps flowing if the APICs are unreachable
- Cluster of 3 or more controllers for high availability
- Manages all aspects of the ACI fabric
## 3. Underlay Network: IS-IS and VXLAN
- IS-IS (Intermediate System to Intermediate System) routing protocol used internally
- VXLAN (Virtual Extensible LAN) for network virtualization
- Allows layer 2 segments to extend across the layer 3 fabric
- 24-bit VNID (VXLAN Network Identifier) for segment identification
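
To make the 24-bit identifier concrete, the sketch below packs and unpacks the standard 8-byte VXLAN header as laid out in RFC 7348. It is an illustration under that assumption: the encapsulation ACI uses inside the fabric carries additional policy information that is not modelled here, and the function names are hypothetical.

```python
import struct

VNI_BITS = 24
MAX_VNI = (1 << VNI_BITS) - 1  # largest value a 24-bit identifier can hold


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte RFC 7348 VXLAN header for the given identifier."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI must fit in {VNI_BITS} bits")
    flags_word = 0x08 << 24  # I flag set, remaining 24 bits reserved (zero)
    vni_word = vni << 8      # identifier in the upper 24 bits, low 8 bits reserved
    return struct.pack("!II", flags_word, vni_word)


def unpack_vni(header: bytes) -> int:
    """Extract the identifier from an 8-byte VXLAN header."""
    _, vni_word = struct.unpack("!II", header)
    return vni_word >> 8


header = pack_vxlan_header(5000)
print(header.hex())
print(unpack_vni(header))
print(MAX_VNI + 1)  # number of distinct segment identifiers
```

Compared with the 12-bit VLAN ID, the 24-bit field raises the number of possible segments from roughly four thousand to roughly sixteen million.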
## 4. Tenant Network Virtualization
- Tenants: Logical containers for policies, services, and network segments
- VRF (Virtual Routing and Forwarding): Provides IP address space isolation
- Bridge Domains: Layer 2 forwarding domains, similar to VLANs
- Subnets: IP address ranges associated with Bridge Domains
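
The containment hierarchy above maps onto the APIC object model. The sketch below builds a JSON body of the kind that could be posted to the APIC REST API (typically to `/api/mo/uni.json`); it only constructs the payload and makes no network call. The tenant, VRF, bridge domain, and subnet values are placeholders, and the helper name is hypothetical.

```python
import json


def tenant_payload(tenant: str, vrf: str, bd: str, gateway_cidr: str) -> dict:
    """Nest a VRF (fvCtx) and a bridge domain (fvBD) with one subnet under a tenant."""
    return {
        "fvTenant": {
            "attributes": {"name": tenant},
            "children": [
                {"fvCtx": {"attributes": {"name": vrf}}},
                {
                    "fvBD": {
                        "attributes": {"name": bd},
                        "children": [
                            # Tie the bridge domain to its VRF.
                            {"fvRsCtx": {"attributes": {"tnFvCtxName": vrf}}},
                            # Gateway address and mask for the subnet.
                            {"fvSubnet": {"attributes": {"ip": gateway_cidr}}},
                        ],
                    }
                },
            ],
        }
    }


payload = tenant_payload("Demo", "Demo-VRF", "Web-BD", "10.0.1.1/24")
print(json.dumps(payload, indent=2))
```

Because the bridge domain references its VRF by name, the whole tree can be created in one request or one object at a time.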
## 5. External Connectivity
- L3Out: Connects ACI fabric to external layer 3 networks
- Supports BGP, OSPF, EIGRP, and static routing
- L2Out: Connects ACI fabric to external layer 2 networks
## 6. Packet Flow
1. Ingress leaf switch performs VXLAN encapsulation
2. Spine switches route based on VXLAN outer header
3. Egress leaf switch performs VXLAN decapsulation
4. Policy enforcement occurs at ingress and egress leaf switches
## 7. Hardware Components
- Nexus 9000 series switches
- 9300 platform: fixed-configuration switches, used mainly as leaf switches (some models can act as spines)
- 9500 platform: modular chassis, used as spine switches
- APIC appliances or virtual machines
## 8. Key Protocols and Technologies
- LLDP (Link Layer Discovery Protocol): Neighbor discovery
- CDP (Cisco Discovery Protocol): Cisco-specific neighbor discovery
- COOP (Council of Oracles Protocol): Endpoint location distribution
- MP-BGP EVPN: For multi-site deployments
## 9. Multicast
- Multi-destination traffic (broadcast, unknown unicast, and multicast) is forwarded inside the fabric along FTAG (forwarding tag) trees rooted at the spine switches
- Optimized for the leaf-spine architecture
## 10. Quality of Service (QoS)
- Implemented through QoS classes (user-configurable levels) that control queuing and scheduling in the fabric
- Policies can be applied at various levels (EPG, contract, etc.)
Understanding these network-centric aspects is crucial for effectively designing, implementing, and troubleshooting an ACI fabric.
---
Here's an outline of all the topics we've discussed during our conversation:
---
### **1. Cisco Nexus APIC Controllers Overview**
- **Basic Concepts**: Introduction to **Cisco Nexus APIC (Application Policy Infrastructure Controller)** and its role in Cisco ACI.
- **APIC Architecture**: Overview of **tenants**, **EPGs (Endpoint Groups)**, and **contracts** in Cisco ACI.
- **Network Abstraction and Centralized Policy Management**: How APIC abstracts the network and applies policies across the fabric.
---
### **2. Endpoint Groups (EPGs)**
- **Definition and Purpose**: Logical grouping of endpoints (servers, VMs, containers) that share the same policies.
- **EPG Example**: Example of an EPG for a three-tier web application (Web, App, and Database EPGs).
- **Communication Between EPGs**: Using contracts to control traffic between EPGs and enforce policies.
---
### **3. Tenants in Cisco ACI**
- **Tenant Overview**: Explanation of tenants as logical containers that provide isolation between different network segments.
- **Types of Tenants**:
- **Common Tenant**: Shared services across the entire fabric.
- **Infrastructure Tenant**: Used for fabric-level configurations.
- **User Tenants**: Representing departments, applications, or business units.
- **Example of Tenant Usage**: Different departments with isolated network and security policies.
---
### **4. Contracts in Cisco ACI**
- **Purpose**: Contracts define rules for communication between EPGs.
- **Step-by-Step Guide for Creating Contracts**:
- How to create a contract, subjects, and filters.
- Attaching contracts to EPGs (providing and consuming contracts).
- **Example of Contract**: Setting up HTTP traffic between Web and App EPGs.
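
The contract, subject, and filter pieces can be sketched as REST payload fragments. The code below only builds dictionaries that would sit under a tenant object; it does not contact an APIC. The object names (`Web-to-App`, `http`, `http-subject`, `tcp-80`) and helper functions are placeholders chosen for the HTTP example. The provider EPG would carry the `fvRsProv` relation and the consumer EPG the `fvRsCons` relation.

```python
def http_contract_children(contract: str = "Web-to-App", filter_name: str = "http") -> list:
    """Return a filter matching TCP/80 and a contract whose subject uses it."""
    http_filter = {
        "vzFilter": {
            "attributes": {"name": filter_name},
            "children": [
                {"vzEntry": {"attributes": {
                    "name": "tcp-80", "etherT": "ip", "prot": "tcp",
                    "dFromPort": "80", "dToPort": "80",
                }}},
            ],
        }
    }
    http_contract = {
        "vzBrCP": {
            "attributes": {"name": contract},
            "children": [
                {"vzSubj": {
                    "attributes": {"name": "http-subject"},
                    "children": [
                        {"vzRsSubjFiltAtt": {"attributes": {"tnVzFilterName": filter_name}}},
                    ],
                }},
            ],
        }
    }
    return [http_filter, http_contract]


def epg_contract_relation(role: str, contract: str) -> dict:
    """Return the child object that attaches a contract to an EPG in the given role."""
    relation_class = {"provider": "fvRsProv", "consumer": "fvRsCons"}[role]
    return {relation_class: {"attributes": {"tnVzBrCPName": contract}}}


print(epg_contract_relation("provider", "Web-to-App"))  # goes under the App EPG
print(epg_contract_relation("consumer", "Web-to-App"))  # goes under the Web EPG
```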
---
### **5. Monitoring Contracts**
- **APIC GUI Monitoring**: Monitoring contracts in the APIC GUI and tracking communication between EPGs.
- **CLI Monitoring**: Using CLI commands to check contract usage, traffic, and faults.
- **REST API for Monitoring**: Programmatically monitor contract stats using the ACI REST API.
- **SNMP and Syslog**: Configuring SNMP traps and Syslog for external monitoring and logging.
---
### **6. Viewing and Exporting Faults**
- **Viewing Faults**:
- APIC GUI: Viewing active and historical faults in the APIC interface.
- CLI: Checking fault details using CLI commands.
- REST API: Retrieving faults via API for automation and integration.
- **Exporting Fault Logs**: Steps to export fault logs to **CSV** or **JSON** formats for analysis and sharing.
- **Using Syslog and SNMP**: Sending faults to a Syslog server or SNMP traps for centralized monitoring.
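
Retrieving faults through the REST API and exporting them to CSV can be sketched as follows. The code builds a class-level query path for `faultInst` objects and flattens a response body into CSV text; it performs no authentication or HTTP request. The sample response, including the fault code `F0000` and its description, is invented for the example, and the column selection is an assumption.

```python
import csv
import io
from urllib.parse import quote


def fault_query_path(severity: str) -> str:
    """Build the REST path that selects faults of one severity."""
    query_filter = f'eq(faultInst.severity,"{severity}")'
    return "/api/node/class/faultInst.json?query-target-filter=" + quote(query_filter, safe="(),")


def faults_to_csv(response: dict, fields=("severity", "code", "dn", "descr")) -> str:
    """Flatten the `imdata` list of a fault query response into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(fields)
    for item in response.get("imdata", []):
        attributes = item["faultInst"]["attributes"]
        writer.writerow([attributes.get(name, "") for name in fields])
    return buffer.getvalue()


# Invented sample in the shape the API returns: a count plus a list of objects.
sample = {
    "totalCount": "1",
    "imdata": [{"faultInst": {"attributes": {
        "severity": "major",
        "code": "F0000",
        "dn": "topology/pod-1/node-101/fault-F0000",
        "descr": "example fault",
    }}}],
}

print(fault_query_path("major"))
print(faults_to_csv(sample))
```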
---
### **7. Resolving Faults**
- **Identifying Faults**: How to analyze fault details like severity, cause, and affected object.
- **Common Faults and Resolutions**:
- **Interface Down or Flapping**: Troubleshooting physical and configuration issues.
- **VPC Peer-Link Issues**: Fixing peer-link failures and keepalive issues.
- **Contract Denied or Misconfigured**: Resolving issues with blocked traffic due to incorrect contracts.
- **Node Unreachable**: Rebooting or reconfiguring unreachable fabric nodes.
- **Configuration Out of Sync**: Re-syncing fabric configurations with the APIC.
---
### **8. Clearing Faults**
- **Clearing Faults via APIC GUI**: Acknowledge or clear faults from the APIC interface.
- **Clearing Faults via CLI**: Manually clearing faults using CLI commands.
- **REST API for Clearing Faults**: Using the API to programmatically clear faults.
- **Best Practices for Clearing Faults**: Ensure issues are resolved before clearing faults.
---
### **9. Major Faults in Cisco ACI**
- **Common Causes of Major Faults**:
- **Misconfigured Contracts and Filters**: Issues with denied traffic.
- **Interface or Port Issues**: Speed/duplex mismatches, down interfaces.
- **VPC Misconfiguration**: Peer-link or keepalive failures.
- **Misconfigured Fabric Policies**: Problems with access policies or QoS settings.
- **Node Resource Utilization**: High CPU or memory utilization.
- **Configuration Out of Sync**: Mismatch between APIC and fabric node configurations.
- **Reachability Issues**: APIC or fabric node connectivity problems.
- **Firmware Bugs**: Issues introduced by software bugs.
---
### **10. Updating Cisco ACI Firmware**
- **Step-by-Step Firmware Update Process**:
- **Pre-Upgrade**: Download firmware, back up configuration, verify compatibility.
- **Upgrade APIC Controllers**: Perform a rolling upgrade of APIC controllers.
- **Upgrade Fabric Nodes**: Upgrade leaf and spine switches in staggered groups so that redundant paths stay up and downtime is minimized.
- **Post-Upgrade**: Verify versions, check health, and resolve faults.
---
### **11. Common Issues During ACI Firmware Upgrade**
- **Fabric Nodes Failing to Upgrade**: Causes include insufficient disk space, corrupted firmware, or connectivity issues.
- **APIC Cluster Quorum Loss**: Loss of connectivity or sync issues during the APIC upgrade.
- **VPC Inconsistencies**: VPC peer-link or configuration mismatches after the upgrade.
- **Connectivity Issues Post-Upgrade**: Traffic loss due to policy enforcement problems or stale ARP/MAC entries.
- **Node Reboot Loops**: Continuous reboot cycles caused by firmware or hardware failures.
---
### **12. Best Practices for Cisco ACI Firmware Upgrades**
- **Pre-Upgrade**:
- Validate compatibility and upgrade path.
- Back up the configuration and schedule a maintenance window.
- Test in a lab environment.
- **During the Upgrade**:
- Upgrade APIC controllers first.
- Upgrade Nexus switches in staggered groups rather than relying on **ISSU**.
- Monitor system health and logs.
- **Post-Upgrade**:
- Verify all nodes are upgraded.
- Monitor health and faults.
- Re-check connectivity and policies.
- Take a post-upgrade configuration backup.
---
Re-syncing nodes in Cisco ACI ensures that any configuration discrepancies between the **APIC controller** and the fabric nodes (leaf or spine switches) are corrected. A **re-sync** forces the APIC to re-push the configuration to a node to ensure that the fabric nodes are in sync with the intended policies, contracts, and other configurations.
Re-syncing nodes can be necessary when there are **configuration out-of-sync faults**, **node reachability issues**, or **after performing firmware upgrades** to ensure that all configurations have been applied properly.
Here's how to re-sync nodes in Cisco ACI using both the **APIC GUI** and **CLI**.
---
### **1. Re-sync Nodes via APIC GUI**
#### Step-by-Step Process:
1. **Log into the APIC GUI**:
- Open your browser and log into the **APIC** using your credentials.
2. **Navigate to the Fabric Membership**:
- On the left-hand menu, navigate to **Fabric** > **Inventory** > **Fabric Membership**.
- This page displays all the fabric nodes (both leaf and spine switches) and their status.
3. **Check for Out-of-Sync Nodes**:
- Look for any **faults** related to configuration out-of-sync issues.
- You may notice specific **out-of-sync faults** for nodes that need to be re-synchronized with the APIC.
4. **Select the Node to Re-sync**:
- In the **Fabric Membership** page, locate the node (leaf or spine) you want to re-sync.
- Right-click on the node or click on the node's options menu (three dots) next to the node's name.
5. **Re-sync the Node**:
- Select **Re-sync Config** from the dropdown menu.
- This will force the APIC to re-apply the current configuration to the selected node.
6. **Monitor the Re-sync Process**:
- After initiating the re-sync, you can monitor the status of the process.
- Check for any faults or issues that may arise during the re-sync.
- Once complete, verify that the node is healthy and synchronized by checking its health score and ensuring that there are no out-of-sync configuration errors.
---
### **2. Re-sync Nodes via CLI**
#### Step-by-Step Process:
1. **Access the APIC CLI**:
- SSH into your APIC controller using the following command:
```bash
ssh admin@<APIC-IP>
```
2. **List the Fabric Nodes**:
- To see the current fabric nodes (leaf and spine switches) and their IDs, run:
```bash
show fabric membership
```
- This will list all the nodes in the fabric and their **Node ID**.
3. **Re-sync a Specific Node**:
- To re-sync a specific node, use the following command:
```bash
fabric re-sync node <node-id>
```
- Replace `<node-id>` with the actual ID of the node you want to re-sync, which you obtained from the previous step.
4. **Monitor the Re-sync Process**:
- After issuing the command, the APIC will push the configuration to the node and attempt to bring it in sync.
- Use the following command to monitor the node's synchronization status and check for faults:
```bash
show fault
```
5. **Verify Node Sync Status**:
- Once the re-sync is complete, ensure the node is healthy and that there are no out-of-sync errors by running:
```bash
show fabric membership
```
---
### **3. When to Re-sync Nodes?**
Re-syncing nodes is commonly needed in the following situations:
- **Configuration Out-of-Sync Faults**: When the configuration on the node does not match the configuration on the APIC, re-syncing can resolve the mismatch.
- **Node Not Responding to Policy Changes**: If policy changes or new configurations are not reflected on a node, re-syncing forces the node to apply them.
- **After Firmware Upgrades**: After upgrading the firmware of APIC controllers or fabric nodes, re-syncing ensures that all configuration updates are properly applied.
- **After Node Reboots**: If a node was rebooted or brought back online, re-syncing ensures it fully re-joins the fabric with the correct configuration.
---
### **4. Best Practices for Re-syncing Nodes**
- **Monitor Health and Faults**: Always monitor the node's health and check for **faults** both before and after the re-sync. This will help identify potential issues that could prevent the node from re-syncing properly.
- **Re-sync During Maintenance Windows**: If possible, perform re-syncs during a **maintenance window**, especially if you're working with critical nodes, to minimize the impact on network performance.
- **Avoid Re-syncing Multiple Nodes at Once**: Re-sync nodes one at a time to avoid overwhelming the APIC and fabric with multiple configuration pushes at once.
---
It sounds like you're ready to consolidate all the detailed information we've covered into a single, comprehensive artifact. Here's a suggested outline to summarize everything about **Cisco ACI firmware upgrades**, focusing on **best practices**, **common issues**, and the **step-by-step process**.
### **Artifact: Cisco ACI Firmware Upgrade Guide**
---
#### **1. Introduction**
- **Purpose**: This guide outlines the recommended practices and step-by-step instructions for upgrading the firmware of Cisco ACI, including APIC controllers and Nexus leaf/spine switches. It also covers common issues, troubleshooting tips, and how to ensure a smooth and successful upgrade.
- **Audience**: Network engineers, administrators, and IT professionals responsible for managing and upgrading Cisco ACI infrastructure.
---
#### **2. Pre-Upgrade Planning**
**Key Preparations Before Upgrading:**
- **Backup the Configuration**: Always back up the ACI fabric configuration before starting the upgrade. Navigate to **Admin > Import/Export > Config Export** in the APIC GUI.
- **Understand Compatibility**: Review the **ACI compatibility matrix** and release notes to ensure that the APIC and Nexus switches can be upgraded to the target version.
- **Review the Upgrade Path**: Ensure you're following the correct upgrade path, especially when moving between major versions. Some versions may require intermediate upgrades.
- **Check Disk Space**: Confirm that the APIC controllers and Nexus switches have enough free disk space for the upgrade files; on switches, run `show system internal flash`.
- **Test in a Lab Environment**: If possible, simulate the upgrade in a test environment to identify potential issues.
- **Schedule a Maintenance Window**: Plan for downtime, notify stakeholders, and ensure that the upgrade is performed during a low-traffic period.
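
Part of reviewing the upgrade path is simply comparing version strings. The sketch below assumes APIC release strings of the form `major.minor(maintenance + patch letters)`, such as `5.2(7f)`, and the helper names are hypothetical. It only tells you whether a target is newer than the current release; it does not know which jumps Cisco supports, so the compatibility matrix and release notes remain the authority on intermediate versions.

```python
import re

_VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_aci_version(text: str) -> tuple:
    """Split a release string like '5.2(7f)' into a comparable tuple."""
    match = _VERSION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maintenance, patch = match.groups()
    return int(major), int(minor), int(maintenance), patch


def is_upgrade(current: str, target: str) -> bool:
    """Return True when `target` is strictly newer than `current`."""
    return parse_aci_version(target) > parse_aci_version(current)


print(parse_aci_version("5.2(7f)"))
print(is_upgrade("4.2(7f)", "5.2(4d)"))
print(is_upgrade("5.2(7f)", "5.2(4d)"))
```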
---
#### **3. Step-by-Step Upgrade Process**
**a. Download Firmware**:
- Download the firmware packages for **APIC controllers** and **Nexus switches (leaf and spine)** from the [Cisco Software Download Portal](https://software.cisco.com).
- Upload the firmware to the APIC by navigating to **Admin > Firmware > Firmware Repository**.
**b. APIC Controller Upgrade**:
1. Navigate to **Admin > Firmware > Infrastructure Firmware**.
2. Start a rolling upgrade by selecting **Upgrade Now** or scheduling the upgrade.
3. Upgrade the APICs one by one to maintain cluster quorum.
4. Monitor the upgrade process and verify the firmware version after each APIC has been upgraded.
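
The reason for upgrading one APIC at a time can be shown with simple majority arithmetic. This is a simplified model that treats the cluster as needing more than half of its members online; the APIC's actual data replication is more nuanced, and the function names are illustrative.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of controllers that forms a majority."""
    return cluster_size // 2 + 1


def keeps_quorum(cluster_size: int, offline: int) -> bool:
    """Return True if a majority remains with `offline` controllers down."""
    return cluster_size - offline >= quorum_size(cluster_size)


print(quorum_size(3))      # majority of a three-node cluster
print(keeps_quorum(3, 1))  # one APIC rebooting for its upgrade
print(keeps_quorum(3, 2))  # two APICs down at the same time
```

With three controllers, taking one offline still leaves a majority, while taking two offline does not.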
**c. Nexus Leaf and Spine Switch Upgrade**:
1. Start by upgrading **spine nodes** first, then **leaf nodes**.
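2. Switches reload during the upgrade, so minimize downtime by upgrading redundant switches (for example, the two peers of a vPC pair) in separate groups rather than relying on **In-Service Software Upgrade (ISSU)**.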
3. Monitor the upgrade progress in **Admin > Firmware > Infrastructure Firmware**.
4. Verify that all fabric nodes are running the correct firmware version after the upgrade using `show version`.
---
#### **4. Post-Upgrade Actions**
**a. Verify the Firmware Versions**:
- Use the APIC GUI or `show version` on switches to ensure all nodes are running the correct firmware.
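
Once the running versions have been collected, the comparison itself is easy to script. The sketch below assumes you already have a mapping of node name to running version (gathered from the GUI, CLI, or API); the node names and version strings are hypothetical.

```python
def nodes_off_target(inventory: dict, target: str) -> list:
    """Return the names of nodes whose running version differs from the target."""
    return sorted(name for name, version in inventory.items() if version != target)


# Hypothetical inventory gathered after the upgrade.
inventory = {
    "leaf-101": "5.2(7f)",
    "leaf-102": "5.2(4d)",
    "spine-201": "5.2(7f)",
}

print(nodes_off_target(inventory, "5.2(7f)"))
```

An empty list means every node in the inventory reports the target version.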
**b. Health Checks**:
- Monitor the overall health of the fabric in **Fabric > Inventory > Fabric Membership**.
- Check for new **faults** under **Monitoring > Faults** and resolve any major or critical issues.
**c. Policy and Connectivity Validation**:
- Test critical applications and network policies to ensure EPGs and contracts are working as expected. Use connectivity tests (e.g., ping, traceroute) between endpoints.
**d. Post-Upgrade Backup**:
- After verifying the upgrade, create a new backup of the ACI configuration using **Admin > Import/Export > Config Export**.
---
#### **5. Common Upgrade Issues and Resolutions**
**a. Fabric Nodes Failing to Upgrade**:
- **Symptoms**: Leaf or spine switches remain on the old firmware version.
- **Resolution**: Check for insufficient disk space or upload the firmware again. Ensure the correct upgrade path is followed.
**b. APIC Cluster Quorum Loss**:
- **Symptoms**: One or more APIC controllers fail to rejoin the cluster.
- **Resolution**: Ensure that out-of-band management is properly configured. Reboot APICs or re-sync the database.
**c. VPC Inconsistencies**:
- **Symptoms**: Virtual Port Channels stop functioning after the upgrade.
- **Resolution**: Review the VPC configuration and ensure peer links are up. Re-apply or reconfigure VPC settings if necessary.
**d. Connectivity Issues**:
- **Symptoms**: Endpoints lose connectivity after the upgrade.
- **Resolution**: Check for stale ARP/MAC entries, clear them if necessary, and verify contract and policy enforcement.
**e. Fabric Node Reboot Loops**:
- **Symptoms**: Nodes repeatedly reboot after the upgrade.
- **Resolution**: Reload firmware manually or replace faulty hardware if needed.
---
#### **6. Best Practices for ACI Firmware Upgrades**
**a. Upgrade APIC Controllers First**: Always upgrade the APICs before fabric nodes, maintaining cluster quorum.
**b. Stagger Switch Upgrades**: Switches reload when upgraded, so rather than relying on **In-Service Software Upgrade (ISSU)**, upgrade redundant switches in separate groups to minimize disruption.
**c. Upgrade in Phases**: For large environments, upgrade nodes in small batches.
**d. Monitor System Health**: Continuously monitor the health of the system during and after the upgrade, watching for critical faults or performance degradation.
**e. Review Release Notes and Known Bugs**: Stay informed about potential issues with the firmware version by reviewing Cisco's release notes and bug tracker.
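
Upgrading in phases can be planned ahead of time. The sketch below is one possible greedy approach, not a Cisco-prescribed algorithm: it fills batches up to a size limit while never placing both members of a redundant pair (for example, vPC peers) in the same batch. The node IDs, pairings, and function name are hypothetical.

```python
def plan_batches(nodes: list, redundant_pairs: list, batch_size: int) -> list:
    """Assign each node to the first batch that has room and does not
    already contain its redundant peer; open a new batch otherwise."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    peer = {}
    for first, second in redundant_pairs:
        peer[first] = second
        peer[second] = first
    batches = []
    for node in nodes:
        for batch in batches:
            if len(batch) < batch_size and peer.get(node) not in batch:
                batch.append(node)
                break
        else:
            batches.append([node])
    return batches


nodes = [101, 102, 103, 104]
pairs = [(101, 102), (103, 104)]
print(plan_batches(nodes, pairs, batch_size=2))
```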
---
#### **7. Conclusion**
- **Summary**: Upgrading Cisco ACI firmware is essential for ensuring a secure and stable infrastructure. Following best practices and carefully monitoring the process helps mitigate risks, reduce downtime, and maintain network continuity.
- **Further Support**: Always consult Cisco's technical documentation and reach out to Cisco TAC for assistance if any issues arise during the upgrade process.
---
### **Appendices**
- **ACI Firmware Compatibility Matrix**: (Insert link or reference to the Cisco matrix)
- **Useful CLI Commands**:
- Check current firmware version:
```bash
show version
```
- Verify cluster health:
```bash
acidiag avread
```
- Check disk space on switches:
```bash
show system internal flash
```
- Re-sync fabric configuration:
```bash
fabric re-sync node <node-id>
```
---