From 947b14ec5faa81cc90ac58ce52b27ac159f27442 Mon Sep 17 00:00:00 2001
From: medusa
Date: Sun, 20 Jul 2025 21:07:56 -0500
Subject: [PATCH] Update tech_docs/cloud/aws_notes.md

---
 tech_docs/cloud/aws_notes.md | 167 +++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)

diff --git a/tech_docs/cloud/aws_notes.md b/tech_docs/cloud/aws_notes.md
index 3d4aa31..b751a30 100644
--- a/tech_docs/cloud/aws_notes.md
+++ b/tech_docs/cloud/aws_notes.md
@@ -1,3 +1,170 @@
+A **Cloud Network SME** operates at the same level of mastery as a traditional network engineer but with a cloud-native lens. Here’s what they have **top of mind**, structured like the OSI model for clarity:
+
+---
+
+### **1. Addressing & Segmentation (Cloud’s "Layer 3")**
+#### **Top of Mind:**
+- **RFC 1918 in the Cloud**:
+  - Knows `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16` but also:
+    - **Special-Purpose Ranges**: `169.254.0.0/16` (link-local), `100.64.0.0/10` (carrier-grade NAT shared space, RFC 6598)
+    - **Avoids Overlaps**: Never peers `10.0.0.0/16` with another `10.0.0.0/16` (VPC peering requests fail when CIDRs overlap).
+
+- **Subnetting at Scale**:
+  - **/28 Minimum in AWS** (16 addresses, 5 of them reserved by AWS in every subnet).
+  - **AZ-Aware Design**:
+    ```text
+    # Example: 10.0.0.0/16 → /20 per AZ (a common layout)
+    us-east-1a: 10.0.0.0/20
+    us-east-1b: 10.0.16.0/20
+    ```
+
+#### **CLI Command They Use Daily:**
+```bash
+aws ec2 describe-subnets --query 'Subnets[*].{AZ:AvailabilityZone,CIDR:CidrBlock,Name:Tags[?Key==`Name`].Value|[0]}' --output table
+```
+
+---
+
+### **2. Cloud "Layer 4" Mastery (Transport Layer)**
+#### **Top of Mind:**
+- **Stateful vs. Stateless**:
+  - **Security Groups (Stateful)**: Return traffic auto-allowed.
+  - **NACLs (Stateless)**: Must allow ephemeral ports in both directions (`32768-60999` is the Linux default; `1024-65535` covers NAT Gateways and other clients).
+
+- **Port Knowledge**:
+  - **Not Just 80/443**:
+    - `179` (BGP over Direct Connect)
+    - `6081` (Geneve for Gateway Load Balancer; VPC Traffic Mirroring uses VXLAN on `4789`)
+    - `53` (DNS over UDP and TCP, needed to resolve PrivateLink endpoint names)
+
+#### **War Story:**
+*"Why is my NAT Gateway not working?"*
+→ Forgot to allow inbound return traffic on `1024-65535` in the private subnet’s NACL.
+
+#### **CLI Command They Use Daily:**
+```bash
+# Check ephemeral port range on Linux instances
+cat /proc/sys/net/ipv4/ip_local_port_range
+```
+
+---
+
+### **3. Cloud "Layer 7" (Application Layer)**
+#### **Top of Mind:**
+- **Load Balancer Types**:
+  | Type | Use Case | Key Detail |
+  |------|----------|------------|
+  | ALB | HTTP/HTTPS | Supports path-based routing (`/api/*`) |
+  | NLB | TCP/UDP/TLS at ultra-low latency | Preserves client source IP at Layer 4 (no `X-Forwarded-For` header) |
+  | GWLB | Threat inspection | Chains traffic through firewall appliances (Palo Alto, Fortinet) |
+
+- **PrivateLink**:
+  - Knows the `com.amazonaws.vpce.{region}.vpce-svc-xxxx` endpoint service name format.
+  - **Gotcha**: Route 53 Private Hosted Zones are not shared automatically; each consumer VPC must be associated explicitly.
+
+#### **CLI Command They Use Daily:**
+```bash
+aws ec2 describe-vpc-endpoint-services --query 'ServiceDetails[?ServiceType[0].ServiceType==`Interface`].ServiceName'
+```
+
+---
+
+### **4. Cloud-Specific Protocols**
+#### **Top of Mind:**
+- **Geneve (UDP 6081)**:
+  - Encapsulation protocol between a Gateway Load Balancer and its appliances.
+- **BGP over Direct Connect**:
+  - Treats a router-default `keepalive=60s` as too slow for failure detection and lowers it to `10s`.
+- **VXLAN (UDP 4789)**:
+  - Knows VPC Traffic Mirroring wraps mirrored packets in VXLAN headers; Transit Gateway Connect attachments use GRE instead.
+
+#### **War Story:**
+*"Why is my Direct Connect so slow to fail over?"*
+→ BGP hold timer was left at the router default (`180s`) instead of being tuned down.
+
+#### **CLI Command They Use Daily:**
+```bash
+aws directconnect describe-virtual-interfaces --query 'virtualInterfaces[*].[virtualInterfaceId,bgpPeers[0].bgpStatus]'
+```
+
+---
+
+### **5. Troubleshooting Tools (Like `tcpdump` for Cloud)**
+#### **Top of Mind:**
+- **Flow Logs**:
+  - Query with CloudWatch Logs Insights:
+    ```sql
+    fields @timestamp, srcAddr, dstAddr, action | filter action="REJECT" | sort @timestamp desc
+    ```
+- **VPC Traffic Mirroring**:
+  - Copies traffic to an analysis instance (like SPAN in traditional networks).
+- **Reachability Analyzer**:
+  - Pre-checks paths before making changes.
+
+#### **CLI Command They Use Daily:**
+```bash
+aws ec2 create-network-insights-path --source "$SOURCE_ENI_ID" --destination "$DEST_ENI_ID" --destination-port 443 --protocol tcp
+```
+
+---
+
+### **6. Cloud Network Limits (Like MTU in Trad Nets)**
+#### **Top of Mind:**
+- **AWS MTU**: **9001** (jumbo frames) within a VPC, **1500** across an internet gateway or VPN; Direct Connect carries jumbo frames only when they are enabled on the virtual interface.
+- **NAT Gateway Throughput**:
+  - Starts at 5 Gbps and scales automatically up to **100 Gbps**, but a single flow stays limited to about 5 Gbps.
+- **Security Group Limits**:
+  - Default quotas: 60 inbound and 60 outbound rules per SG, 5 SGs per ENI (both adjustable).
+
+#### **War Story:**
+*"Why is my throughput capped at 5 Gbps?"*
+→ A single TCP flow hitting the per-flow bandwidth limit through the NAT Gateway.
+
+#### **CLI Command They Use Daily:**
+```bash
+aws ec2 describe-account-attributes --query 'AccountAttributes[?AttributeName==`max-instances`].AttributeValues'
+```
+
+---
+
+### **7. Automation Mindset (Like Config Templates)**
+#### **Top of Mind:**
+- **Infrastructure as Code (IaC)**:
+  - Terraform snippets for zero-downtime SG updates (required rule arguments omitted):
+    ```hcl
+    resource "aws_security_group_rule" "temp_rule" {
+      lifecycle { create_before_destroy = true }
+    }
+    ```
+- **AWS APIs**:
+  - Prefers API calls such as `modify-network-interface-attribute` over console clicks.
+
+#### **CLI Command They Use Daily:**
+```bash
+aws ec2 modify-instance-metadata-options --instance-id i-123abc --http-put-response-hop-limit 2
+```
+
+---
+
+### **The Cloud Network SME’s Cheat Sheet**
+| **Traditional**       | **Cloud Equivalent**               |
+|-----------------------|------------------------------------|
+| Subnetting            | VPC CIDR design + AZ distribution  |
+| BGP                   | Direct Connect BGP timers          |
+| SPAN port             | VPC Traffic Mirroring              |
+| Firewall rules        | Security Groups + NACLs            |
+| tcpdump               | Flow Logs + Athena SQL             |
+
+**Final Tip:** A true cloud SME doesn’t just *know* these; they automate them. For example:
+```bash
+# Auto-remediate an overly permissive SG by revoking its allow-all egress rule
+aws ec2 revoke-security-group-egress --group-id sg-123 --ip-permissions 'IpProtocol=-1,FromPort=-1,ToPort=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
+```
+
+Would you like a **hands-on lab** for any of these scenarios?
+
+---
+
 # **Deep Dive: Mastering AWS Flow Logs for Advanced Troubleshooting**
 
 ## **1. Flow Logs Fundamentals**