Add tech_docs/networking/network_iac_framework.md

2025-08-02 10:34:47 -05:00
parent e02ba4dd2c
commit deabd0d296

Here's the distilled meta-framework for designing a configuration management system that balances pragmatism with pride-of-ownership:
### **The Pillars of Sustainable Network Configuration Design**
#### **1. The Iron Triangle of Configuration Systems**
```mermaid
graph TD
A[Human Understandability] --> B[Automation Compatibility]
B --> C[Audit Compliance]
C --> A
```
**Key Insight**: Optimize for the center where all three overlap.
#### **2. Decision Filters for Every Component**
Ask of each element:
1. **"Will this still make sense in 5 years?"**
(Avoid tools/languages with short support lifecycles or frequent breaking changes)
2. **"Does this reduce cognitive load?"**
(Favor structures that document themselves)
3. **"Can we exit this gracefully?"**
(No dead-end dependencies)
#### **3. The Template Hierarchy of Needs**
```text
        [ Reliability ]
         /           \
[Consistency]   [Reproducibility]
         \           /
        [Documentation]
```
**Implementation Rule**: Satisfy each layer before moving up.
#### **4. Interface-Oriented Design**
Define clear boundaries between:
- **Data** (Variables/Secrets)
- **Templates** (Rendering Rules)
- **Delivery** (CLI/API/SSH)
- **Validation** (Pre/Post Checks)
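One way to picture these boundaries is a small pipeline object in which each concern is a separate, swappable callable. This is only a sketch: the `Pipeline` class, the `$hostname` variable, and the hostname check are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable, Mapping


@dataclass(frozen=True)
class Pipeline:
    """Wires the concerns together; each one can be replaced on its own."""
    render: Callable[[str, Mapping[str, str]], str]   # Templates
    validate: Callable[[str], list[str]]              # Validation
    deliver: Callable[[str], None]                    # Delivery

    def run(self, template: str, data: Mapping[str, str]) -> list[str]:
        config = self.render(template, data)   # the only place Data meets Templates
        problems = self.validate(config)
        if not problems:                       # deliver only a clean config
            self.deliver(config)
        return problems


def render(template: str, data: Mapping[str, str]) -> str:
    # string.Template raises KeyError if a variable is missing from the data
    return Template(template).substitute(data)


def validate(config: str) -> list[str]:
    # Hypothetical pre-check: the config must open with a hostname line
    return [] if config.startswith("hostname ") else ["missing hostname line"]


delivered: list[str] = []   # stand-in for a CLI/API/SSH delivery target
pipeline = Pipeline(render=render, validate=validate, deliver=delivered.append)
problems = pipeline.run("hostname $hostname\n", {"hostname": "SWITCH-01"})
print(problems, delivered)
```

Swapping `deliver` for an SSH or API call later would not touch the rendering or validation code, which is the point of keeping the interfaces separate.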
#### **5. The Versioning Covenant**
```text
v1.0.0
 │ │ └─ Patch (Template fixes)
 │ └─── Minor (New profiles)
 └───── Major (Structure changes)
```
**Golden Rule**: Never break backward template compatibility.
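The mapping from change type to version component can be sketched in a few lines. The function names `required_bump` and `bump` are hypothetical, and the sketch assumes versions always follow the `vMAJOR.MINOR.PATCH` form shown above.

```python
def required_bump(structure_changed: bool, profiles_added: bool) -> str:
    """Map a change to the version component it must increment."""
    if structure_changed:
        return "major"
    if profiles_added:
        return "minor"
    return "patch"   # template fixes only


def bump(version: str, part: str) -> str:
    """Increment one component and reset the components to its right."""
    major, minor, patch = (int(n) for n in version.lstrip("v").split("."))
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"


part = required_bump(structure_changed=False, profiles_added=True)
print(bump("v1.0.0", part))   # v1.1.0
```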
#### **6. The Compliance Sandwich**
```python
def deploy(config):
    pre_checks(config)   # Template validations
    apply(config)        # Actual deployment
    post_checks(config)  # Agent verification
```
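A slightly fuller sketch of the same sandwich, assuming each check returns a list of problems and that a failed pre-check must stop the deployment before anything is applied. The stand-in checks and the in-memory `device` list are illustrative only.

```python
from typing import Callable

Check = Callable[[str], list[str]]   # returns problems; an empty list means pass


def deploy(config: str, pre_checks: Check, apply: Callable[[str], None],
           post_checks: Check) -> str:
    problems = pre_checks(config)
    if problems:
        return "rejected: " + "; ".join(problems)   # nothing was applied
    apply(config)
    problems = post_checks(config)
    if problems:
        return "applied, verification failed: " + "; ".join(problems)
    return "ok"


# Stand-in checks and a fake device, for illustration only
device: list[str] = []
no_placeholders = lambda cfg: ["unfilled placeholder"] if "{{" in cfg else []
was_applied = lambda cfg: [] if cfg in device else ["config not found on device"]

print(deploy("hostname SWITCH-01", no_placeholders, device.append, was_applied))
print(deploy("hostname {{ HOSTNAME }}", no_placeholders, device.append, was_applied))
```

Returning a readable status string rather than raising keeps the result usable as a human-readable artifact at every stage.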
#### **7. The Escalation Ladder**
Design for progressive enhancement:
1. **Manual Stage**: `cat template.txt > device`
2. **Assisted Stage**: `./validate.py | ssh device`
3. **Automated Stage**: CI/CD pipeline
**Critical Feature**: Each stage produces human-readable artifacts.
#### **8. The Pride Metrics**
Measure system quality by:
- **MTTD** (Mean Time To Document) - How long to explain any config
- **RFR** (Ready-for-Rollback) - Seconds to revert any change
- **CIA** (Config-Impact-Awareness) - % of team who can trace configs to requirements
### **Practical Implementation Checklist**
1. **Start Here**:
```bash
mkdir config_system
touch config_system/design_principles.md # Document your meta-rules first
```
2. **First Templates**:
```text
switches/
├── base/
│   └── core.txt        # Immutable base configs
└── overlays/
    ├── finance/        # Compliance additions
    └── branch/         # Site-specific tweaks
3. **Validation Hook**:
```python
# hooks/validate.py
def main():
    check(files_changed <= 3)  # Prevent bulk undocumented changes
    check(has_comments(config))
    check(no_secrets_in_clear())
```
4. **The "Proud" Test**:
Ask quarterly: "Would we open-source this system today?"
(If not, refine until the answer is yes)
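The validation hook in step 3 is pseudocode. A runnable sketch might look like the following, assuming the hook receives a mapping of changed file paths to their new contents. The cleartext pattern is a deliberately simple assumption (it only looks for IOS type-0 `password`/`secret`/`key` values) and would need tuning for a real environment.

```python
import re

MAX_FILES_CHANGED = 3
# Assumption: "password 0 ...", "secret 0 ..." and "key 0 ..." mark cleartext values
CLEARTEXT = re.compile(r"\b(password|secret|key) 0 \S+")


def has_comments(config: str) -> bool:
    """True if at least one line is a '!' comment that carries text."""
    return any(line.startswith("!") and line.strip("! \t")
               for line in config.splitlines())


def no_secrets_in_clear(config: str) -> bool:
    return CLEARTEXT.search(config) is None


def validate(changed: dict[str, str]) -> list[str]:
    """changed maps file path -> new content; returns failure messages."""
    failures = []
    if len(changed) > MAX_FILES_CHANGED:
        failures.append(f"{len(changed)} files changed (limit {MAX_FILES_CHANGED})")
    for path, config in changed.items():
        if not has_comments(config):
            failures.append(f"{path}: no comments")
        if not no_secrets_in_clear(config):
            failures.append(f"{path}: cleartext secret")
    return failures


print(validate({"base/core.txt": "! Core AAA\nenable secret 0 hunter2\n"}))
```

A Git pre-commit hook could call `validate` and exit non-zero when the returned list is non-empty.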
### **When to Stop Designing**
The system is "complete" when:
- New engineers can safely make config changes in their first week
- Any config can be traced to a business requirement in ≤3 clicks
- 95% of changes use existing patterns (not one-offs)
This framework gives you a foundation that's:
- **Engineer-Respectful**: Values human time over tool whims
- **Future-Proof**: Ready for automation but not dependent on it
- **Institutionally Durable**: Survives team turnover
Would you like to pressure-test this against your specific organizational constraints?
---
Here's a pragmatic hybrid approach that blends simplicity with modern requirements:
### **The Hybrid Configuration Framework**
#### **1. Core Architecture**
```mermaid
graph TD
A[Human-Readable Templates] --> B[Git-Versioned Sources]
B --> C[Agent Validation]
B --> D[Manual CLI Deployment]
B --> E[Automated Deployment]
```
#### **2. Implementation Layers**
**Layer 1: Static Templates (Foundation)**
- Keep your existing template structure
- Add validation markers for agents:
```text
! COMPLIANCE-TAG: PCI-DSS-4.1
aaa authentication login default group tacacs+ local
! VALIDATION: show aaa sessions
```
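To show how an agent could consume these markers, here is a minimal parsing sketch. The `KNOWN_MARKERS` set and the `extract_markers` function are assumptions for illustration; the marker names themselves come from the template examples in this document.

```python
KNOWN_MARKERS = {"COMPLIANCE-TAG", "COMPLIANCE-REQ", "VALIDATION"}


def extract_markers(template: str) -> dict[str, list[str]]:
    """Collect '! KEY: value' comment lines, grouped by KEY."""
    markers: dict[str, list[str]] = {}
    for line in template.splitlines():
        if not (line.startswith("!") and ":" in line):
            continue
        key, value = line.lstrip("! ").split(":", 1)
        if key.strip() in KNOWN_MARKERS:   # ignore ordinary comments
            markers.setdefault(key.strip(), []).append(value.strip())
    return markers


template = """\
! COMPLIANCE-TAG: PCI-DSS-4.1
aaa authentication login default group tacacs+ local
! VALIDATION: show aaa sessions
"""
print(extract_markers(template))
```

The `VALIDATION` entries double as a post-deployment checklist of show commands for an engineer or an agent to run.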
**Layer 2: Agent Hooks**
```python
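# Basic validation agent (example)
def validate_config(config, required_tags=("PCI-DSS", "NIST-800")):
    """Return the required standards that no COMPLIANCE-TAG line covers."""
    found = []
    for line in config.splitlines():
        if "! COMPLIANCE-TAG:" in line:
            # "! COMPLIANCE-TAG: PCI-DSS-4.1" -> "PCI-DSS-4.1"
            found.append(line.split(":", 1)[1].strip())
    missing = [std for std in required_tags
               if not any(tag.startswith(std) for tag in found)]
    for std in missing:
        print(f"Missing compliance tag for {std}")
    return missing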
```
**Layer 3: Deployment Bridge**
```text
# Sample CI pipeline, as pseudocode (runs on config changes)
1. Render the templates and check that the required tags are present
2. Run the validation agent at the PCI level
3. On pass: copy the config to the device (e.g. with scp)
4. On fail: create a JIRA ticket
```
#### **3. File Structure Upgrade**
```text
configs/
├── templates/          # Your existing templates
├── agents/
│   ├── validator.py    # Lightweight compliance checks
│   └── tags.db         # Required compliance markers
└── hooks/
    ├── pre-deploy.sh   # Signature verification
    └── post-deploy.sh  # Compliance snapshot
```
#### **4. Workflow Integration**
**Manual Mode:**
```bash
# Engineer workflow
vim templates/BASE/02_system.txt
./hooks/pre-deploy.sh # Verifies templates
# Deploy manually once the hook passes
```
**Automated Mode:**
```yaml
# GitLab CI example
deploy_config:
  only:
    - /^v\d+\.\d+/  # Tagged releases
  script:
    - ./agents/validator.py -t PCI
    - ansible-playbook deploy.yml
```
#### **5. Compliance Bridge**
**Template Markers → Agent Food**
```text
! COMPLIANCE-REQ: Ensure TACACS timeout < 10s
tacacs-server timeout 9
```
Validated by:
```python
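# Agent checks
import re

# Read the value from the timeout line itself, not the last token of the config
match = re.search(r"^tacacs-server timeout (\d+)", config, re.MULTILINE)
if match:
    timeout = int(match.group(1))
    assert timeout < 10, "TACACS timeout violation"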
```
#### **6. Key Advantages**
1. **Progressive Compliance**:
- Start with manual template reviews
- Gradually automate checks
2. **Tool Agnostic**:
- Validators work with Ansible, Terraform, or manual deploys
3. **Audit Trail**:
```text
2023-11-25 14:22: [AGENT] PASSED PCI-4.1 (template: 02_system.txt)
```
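An audit line in this shape is easy to generate consistently. The sketch below assumes a `FAILED` status as the counterpart to `PASSED` and uses a hypothetical `audit_line` helper.

```python
from datetime import datetime


def audit_line(passed: bool, control: str, template: str, when: datetime) -> str:
    """Format one audit-trail entry in the layout shown above."""
    status = "PASSED" if passed else "FAILED"   # FAILED is an assumed counterpart
    return f"{when:%Y-%m-%d %H:%M}: [AGENT] {status} {control} (template: {template})"


print(audit_line(True, "PCI-4.1", "02_system.txt", datetime(2023, 11, 25, 14, 22)))
```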
#### **7. Reality Checks**
**When This Fits:**
- Teams with 1-2 engineers managing 50-300 devices
- Mixed vendor environments
- Gradually maturing compliance programs
**When to Avoid:**
- Fully automated cloud-native networks
- Teams already using Terraform Enterprise/SDN
#### **8. Starter Kit**
```bash
# Initial setup
git clone https://example.com/network-templates
cd network-templates
pip install -r agents/requirements.txt # Only dependency: PyYAML
# First validation
./agents/validator.py -t NIST-800-53
```
This gives you:
- Immediate compliance visibility
- Zero automation lock-in
- Clear path from templates to full IaC
- All while keeping your original template system intact
Would you like me to develop a specific component further? Particularly:
1. The agent validation logic
2. Git hook integration
3. Compliance tagging standards
---
Understood. Here's a lean, focused `README.md` that strictly covers template management without venturing into LCM or automation:
```markdown
# Cisco 4500 Series Configuration Templates
## Purpose
Standardized configuration templates for:
- Cisco 4507R+E
- Cisco 4510R+E
(Layer 3 Access Switch role)
## File Structure
```text
templates/
├── BASE/                   # Core building blocks
│   ├── 01_license.txt      # License/config-register
│   ├── 02_system.txt       # Hostname/services/NTP
│   ├── 03_aaa.txt          # Authentication
│   └── 04_vlans.txt        # VLAN definitions
├── PORT_PROFILES/          # Interface templates
│   ├── access_data.txt     # Data+Voice ports
│   ├── access_voice.txt    # Voice-only ports
│   └── trunk_uplink.txt    # Uplink ports
└── POLICIES/               # Reusable policy blocks
    ├── qos_marking.txt     # QoS class-maps
    └── acl_preauth.txt     # Port ACLs
```
## Usage Instructions
### 1. Build a Configuration
```bash
# Combine templates in order (run from the templates/ directory)
cat BASE/01_license.txt BASE/02_system.txt > switch_config.txt
# Apply port profiles (example); the g flag replaces every occurrence on a line
sed 's/{{ VLAN_ID }}/210/g' PORT_PROFILES/access_data.txt >> switch_config.txt
```
### 2. Required Customizations
Each template contains explicit replacement markers:
```text
! REQUIRED CUSTOMIZATIONS (search for these):
{{ HOSTNAME }} # Device hostname
{{ MGMT_IP }} # Management interface IP
{{ VLAN_2xx }} # Data VLAN ID (2xx range)
```
### 3. Validation Checklist
After customization, verify:
1. No remaining `{{ }}` placeholders
2. VLAN IDs match port assignments
3. Unique IPs for all interfaces
## Template Versioning
| Version | Date | Changes |
|---------|------------|------------------------------|
| v1.0 | 2023-11-20 | Initial release |
| v1.1 | 2023-11-25 | Updated RADIUS server list |
## Contribution Rules
1. **Modify templates only via numbered changes**:
```bash
cp BASE/02_system.txt BASE/02_system_v1.1.txt
```
2. **Document changes in headers**:
```text
! HISTORY:
! 2023-11-25 - Updated NTP servers (jsmith)
! 2023-11-20 - Initial version (network-team)
```
## Policy Exceptions
For non-standard configurations:
1. Create new profile in `PORT_PROFILES/custom_*.txt`
2. Add exception note:
```text
! EXCEPTION: IDF-SW23 requires special QoS
! Approved by: jdoe (2023-11-15)
! Ticket: NET-1245
```
## Best Practices
1. **Search before creating** - Reuse existing profiles when possible
2. **Keep customizations visible** - Never delete placeholder markers
3. **Test incrementally** - Apply changes in sections
```
This version:
1. **Eliminates all automation/LCM references** - Pure configuration templates
2. **Focuses on manual operations** - `cat` and `sed` as primary tools
3. **Maintains auditability** - Clear versioning and change tracking
4. **Prevents scope creep** - No CI/CD, Ansible, or deployment processes
The template system stands alone as a documentation and standardization tool, decoupled from any specific deployment methodology.
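The first item in the README's validation checklist, no remaining `{{ }}` placeholders, is simple to check mechanically even in a manual workflow. The sketch below is one possible approach; the `fill` and `remaining_placeholders` helpers and the sample template text are assumptions, while the placeholder names come from the README.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")


def fill(template: str, values: dict[str, str]) -> str:
    """Replace every '{{ NAME }}' whose NAME is in values; leave the rest untouched."""
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(0).strip("{} "), m.group(0)), template)


def remaining_placeholders(config: str) -> list[str]:
    """List the placeholders that still need a value."""
    return PLACEHOLDER.findall(config)


config = fill("hostname {{ HOSTNAME }}\ninterface Vlan{{ VLAN_2xx }}\n",
              {"HOSTNAME": "IDF-SW23"})
print(remaining_placeholders(config))   # ['{{ VLAN_2xx }}']
```

Leaving unknown placeholders in place, rather than deleting them, matches the README's advice to keep customizations visible.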
---
Here's a comprehensive `README.md` for your configuration template system:
```markdown
# Network Configuration Template System
## Overview
Modular, version-controlled templates for Cisco 4507R+E/4510R+E switch configurations. Designed for both manual operations and automation readiness.
## Directory Structure
```text
config_components/
├── core_settings/              # Fundamental device configuration
│   ├── 00_licensing.j2
│   ├── 10_system_settings.j2
│   └── 20_aaa.j2
├── network_services/           # Network-wide services
│   ├── 30_vlans.j2
│   └── 40_routing.j2
├── interfaces/                 # Interface configurations
│   ├── 50_port_profiles/
│   │   ├── access_port.j2
│   │   └── trunk_port.j2
│   └── 60_interface_assignments.j2
└── policies/                   # Policy definitions
    ├── 70_qos.j2
    └── 80_access_lists.j2
```
## Key Features
- **Numbered Load Order**: Files processed sequentially (00_ → 90_)
- **Port Profile System**: Reusable interface configurations
- **Policy/Service Separation**: Clean abstraction boundaries
- **Version Embedded**: Each template contains its own changelog
## Usage Guide
### 1. Preparation
```bash
# Clone repository
git clone https://example.com/network-templates.git
cd network-templates
# Create site variables file
cp examples/site_vars.yml site/nyc_floor3.yml
```
### 2. Configuration Generation
#### Manual Method (Quick Start)
```bash
# Using sed for simple replacements
sed "s/<HOSTNAME>/SWITCH-01/g" base_config.j2 > output.txt
# Using Jinja2 CLI (requires Python; the yaml extra adds YAML input support)
pip install "jinja2-cli[yaml]"
jinja2 base_config.j2 site/nyc_floor3.yml --format=yaml > deployed_config.txt
```
#### Automated Method (Ansible)
```yaml
- name: Deploy switch config
  hosts: switches
  tasks:
    - template:
        src: "{{ role_path }}/templates/base_config.j2"
        dest: "/tmp/{{ inventory_hostname }}.cfg"
```
### 3. Deployment
```bash
# Manual deployment
ssh admin@switch < deployed_config.txt
# Automated validation
ansible-playbook validate.yml -e @site/nyc_floor3.yml
```
## Template Development
### Adding New Components
1. Create numbered template file:
```bash
touch config_components/policies/90_ntp.j2
```
2. Add metadata header:
```jinja2
{# META:
Version: 1.0
Dependencies: 10_system_settings.j2
Validated: 2023-11-20
#}
```
### Version Control
```bash
# Standard workflow
git checkout -b feature/new_vlan_profile
# Edit appropriate .j2 files, then stage them
git add config_components/
git commit -m "feat: Add new voice VLAN profile"
```
## Validation Framework
Each generated config includes:
```text
! VALIDATION MARKS
! [REQUIRED] Verify VLAN assignments: show vlan brief
! [RECOMMENDED] Check interface status: show int status
! [OPTIONAL] Test QoS: test policy-map <name>
```
## Variable Hierarchy
1. `defaults.yml` - Organization-wide standards
2. `site/<location>.yml` - Site-specific overrides
3. `device/<hostname>.yml` - Device-level exceptions
## Maintenance
```text
# Changelog Format
## [Version] YYYY-MM-DD
- [CHANGE] Description (Author)
- [FIX] Bug description (Author)
Example:
## [1.1] 2023-11-20
- [CHANGE] Updated RADIUS servers (jsmith)
- [FIX] Corrected VLAN numbering (adoe)
```
## FAQ
**Q: How to handle one-off exceptions?**
A: Create device-specific vars in `device/` or use conditional blocks:
```jinja2
{% if inventory_hostname == 'switch23' %}
! Special configuration
{% endif %}
```
**Q: Can I use this without Jinja2?**
A: Yes - templates work as:
1. Manual copy-paste docs
2. sed/awk processing sources
3. Full Jinja2 automation
**Q: How to test changes safely?**
```bash
# Dry-run generation
jinja2 base_config.j2 test_vars.yml --format=yaml
```
## License
This template system is [MIT Licensed](LICENSE).
```
This README provides:
1. **Progressive Disclosure** - Simple to advanced usage
2. **Multi-Modal Support** - Manual and automated paths
3. **Built-In Governance** - Versioning and validation
4. **Team Onboarding** - Clear contribution guidelines
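The README's variable hierarchy (`defaults.yml`, then `site/<location>.yml`, then `device/<hostname>.yml`) implies a layered merge in which later layers win. A minimal sketch follows, using plain dicts in place of parsed YAML. The recursive merge of nested keys is a design assumption (some tools replace whole nested values instead), and the sample values are made up.

```python
def merge_vars(*layers: dict) -> dict:
    """Merge left to right; later layers override earlier ones, recursing into dicts."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_vars(merged[key], value)
            else:
                merged[key] = value
    return merged


defaults = {"ntp_servers": ["10.0.0.1"], "vlans": {"data": 210, "voice": 310}}
site = {"vlans": {"data": 220}}            # site-specific override
device = {"hostname": "switch23"}          # device-level exception
print(merge_vars(defaults, site, device))
```

With this order, a site file only needs to list what differs from the organization-wide defaults.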
Would you like me to add any specific:
- Security considerations?
- Disaster recovery procedures?
- Integration with existing CMDB systems?