Update tech_docs/cloud/aws_notes.md
@@ -1,3 +1,170 @@
A **Cloud Network SME** operates at the same level of mastery as a traditional network engineer but with a cloud-native lens. Here’s what they have **top of mind**, structured like the OSI model for clarity:

---

### **1. Addressing & Segmentation (Cloud’s "Layer 3")**
#### **Top of Mind:**
- **RFC 1918 in the Cloud**:
  - Knows `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16` but also:
    - **Special-Purpose Ranges**: `169.254.0.0/16` (link-local, home of the instance metadata service at `169.254.169.254`), `100.64.0.0/10` (carrier-grade NAT shared space, RFC 6598)
    - **Avoids Overlaps**: Never peers `10.0.0.0/16` with another `10.0.0.0/16`. VPC peering rejects overlapping CIDRs outright; Transit Gateway accepts the attachment but does not propagate the route for an identical CIDR, which is the harder failure to spot.

- **Subnetting at Scale**:
  - **/28 Minimum in AWS** (5 IPs reserved per subnet, so a /28 leaves 11 usable).
  - **AZ-Aware Design**:
    ```text
    # Example: 10.0.0.0/16 → /20 per AZ (a common layout, not an AWS requirement)
    us-east-1a: 10.0.0.0/20
    us-east-1b: 10.0.16.0/20
    ```
#### **CLI Command They Use Daily:**
```bash
aws ec2 describe-subnets --query 'Subnets[*].{AZ:AvailabilityZone,CIDR:CidrBlock,Name:Tags[?Key==`Name`].Value|[0]}' --output table
```
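The overlap and subnet-sizing rules in this section can be sanity-checked offline with Python's standard `ipaddress` module. This is a minimal sketch, not an AWS tool: the helper names (`overlaps`, `usable_ips`, `carve_per_az`) are invented for this note, and the CIDRs and AZ names are the ones from the example above.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # AWS reserves the first four addresses and the last one


def overlaps(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def usable_ips(cidr: str) -> int:
    """Addresses left for workloads after the five AWS-reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def carve_per_az(vpc_cidr: str, new_prefix: int, azs: list[str]) -> dict[str, str]:
    """Hand out consecutive /new_prefix subnets to each AZ, in order."""
    subnets = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    return {az: str(net) for az, net in zip(azs, subnets)}


if __name__ == "__main__":
    print(overlaps("10.0.0.0/16", "10.0.0.0/16"))
    print(usable_ips("10.0.0.0/28"))
    print(carve_per_az("10.0.0.0/16", 20, ["us-east-1a", "us-east-1b"]))
```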
---

### **2. Cloud "Layer 4" Mastery (Transport Layer)**
#### **Top of Mind:**
- **Stateful vs. Stateless**:
  - **Security Groups (Stateful)**: Return traffic auto-allowed.
  - **NACLs (Stateless)**: Must explicitly allow return traffic on ephemeral ports. `32768-60999` is only the Linux default; allow `1024-65535` to also cover NAT Gateways, load balancers, and Windows clients.

- **Port Knowledge**:
  - **Not Just 80/443**:
    - `179/tcp` (BGP over Direct Connect)
    - `6081/udp` (Geneve for Gateway Load Balancer)
    - `4789/udp` (VXLAN for VPC Traffic Mirroring)
    - `53/udp` and `53/tcp` (DNS, including private DNS names for PrivateLink endpoints)
#### **War Story:**
*"Why is my NAT Gateway not working?"*
→ Forgot to allow inbound return traffic on `1024-65535` in the private subnet’s NACL.

#### **CLI Command They Use Daily:**
```bash
# Check ephemeral port range on Linux instances
cat /proc/sys/net/ipv4/ip_local_port_range
```
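To make the stateless behavior of NACLs concrete, here is a deliberately simplified model: rules are checked in ascending rule-number order, the first match wins, and anything unmatched is denied. The `Rule` class and both functions are hypothetical names for this sketch, and it matches on port only (real NACL rules also match protocol and CIDR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def evaluate(rules: list[Rule], port: int) -> bool:
    """First matching rule wins; unmatched traffic is denied."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


def round_trip_ok(outbound: list[Rule], inbound: list[Rule],
                  dest_port: int, client_port: int) -> bool:
    """A stateless filter must pass the request and the reply separately."""
    return evaluate(outbound, dest_port) and evaluate(inbound, client_port)


if __name__ == "__main__":
    outbound = [Rule(100, 443, 443, True)]
    no_return_rule: list[Rule] = []
    with_return_rule = [Rule(100, 1024, 65535, True)]
    # The reply to an HTTPS request arrives on the client's ephemeral port.
    print(round_trip_ok(outbound, no_return_rule, 443, 50000))
    print(round_trip_ok(outbound, with_return_rule, 443, 50000))
```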
---

### **3. Cloud "Layer 7" (Application Layer)**
#### **Top of Mind:**
- **Load Balancer Types**:

  | Type | Use Case | Key Detail |
  |------|----------|------------|
  | ALB | HTTP/HTTPS (Layer 7) | Supports path-based routing (`/api/*`) |
  | NLB | Ultra-low latency TCP/UDP (Layer 4) | Can preserve the client source IP (no `X-Forwarded-For`) |
  | GWLB | Threat inspection | Chains traffic through firewall appliances (Palo Alto, Fortinet) over Geneve |

- **PrivateLink**:
  - Knows the endpoint service name format `com.amazonaws.vpce.{region}.vpce-svc-xxxx`.
  - **Gotcha**: Doesn’t auto-share Route 53 Private Hosted Zones. Each consuming VPC needs its own zone association before custom names resolve.
#### **CLI Command They Use Daily:**
```bash
# ServiceType is a list of objects, so filter on the nested field
aws ec2 describe-vpc-endpoint-services --query 'ServiceDetails[?ServiceType[0].ServiceType==`Interface`].ServiceName'
```
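When scripting against PrivateLink, it can help to split the `com.amazonaws.vpce.{region}.vpce-svc-xxxx` name into its parts. The sketch below is one way to do that with a regular expression; `parse_service_name` is a hypothetical helper, and the assumption that the ID suffix is lowercase hex is mine, not something stated in these notes.

```python
import re
from typing import Optional

# Assumption: the service ID after "vpce-svc-" is lowercase hexadecimal.
SERVICE_NAME = re.compile(
    r"^com\.amazonaws\.vpce\."
    r"(?P<region>[a-z0-9-]+)\."
    r"(?P<service_id>vpce-svc-[0-9a-f]+)$"
)


def parse_service_name(name: str) -> Optional[dict]:
    """Return region and service ID, or None if the name has another shape."""
    match = SERVICE_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up service ID, for illustration only.
    print(parse_service_name("com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0"))
    # AWS-managed services use a different naming pattern and are not matched.
    print(parse_service_name("com.amazonaws.us-east-1.s3"))
```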
---

### **4. Cloud-Specific Protocols**
#### **Top of Mind:**
- **Geneve (UDP 6081)**:
  - Encapsulation protocol between a Gateway Load Balancer and its appliances.
- **BGP over Direct Connect**:
  - Common router defaults (`keepalive=60s`, `hold=180s`) detect failures too slowly, so sets shorter timers (e.g. `keepalive=10s`, `hold=30s`) or enables BFD.
- **VXLAN (UDP 4789)**:
  - Encapsulation protocol for VPC Traffic Mirroring. Transit Gateway Connect attachments use GRE, not VXLAN.
#### **War Story:**
*"Why is my Direct Connect failover so slow?"*
→ BGP hold timer was left at the router default (`180s`), so a dead peer was detected far too slowly.

#### **CLI Command They Use Daily:**
```bash
aws directconnect describe-virtual-interfaces --query 'virtualInterfaces[*].[virtualInterfaceId,bgpPeers[0].bgpStatus]'
```
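The timer trade-off is easier to reason about with the negotiation rule from RFC 4271 in hand: each side proposes a hold time, the session uses the smaller value, and keepalives are conventionally sent at one third of it. The sketch below encodes just that rule; the function names are illustrative, and the peer values in the example are assumptions rather than documented Direct Connect settings.

```python
def negotiated_hold_time(local: int, peer: int) -> int:
    """RFC 4271: the session uses the smaller of the two proposed hold times.

    A hold time must be 0 (keepalives disabled) or at least 3 seconds.
    """
    for value in (local, peer):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local, peer)


def suggested_keepalive(hold: int) -> int:
    """Conventional keepalive interval: one third of the hold time."""
    return hold // 3


if __name__ == "__main__":
    # Router left at hold=180s, peer also proposing 180s (assumed).
    slow = negotiated_hold_time(180, 180)
    print(slow, suggested_keepalive(slow))
    # Router tuned down to hold=30s; the smaller value wins.
    fast = negotiated_hold_time(30, 180)
    print(fast, suggested_keepalive(fast))
```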
---

### **5. Troubleshooting Tools (Like `tcpdump` for Cloud)**
#### **Top of Mind:**
- **Flow Logs**:
  - Query with CloudWatch Logs Insights:
    ```sql
    fields @timestamp, srcAddr, dstAddr, action
    | filter action = "REJECT"
    | sort @timestamp desc
    ```
- **VPC Traffic Mirroring**:
  - Copies traffic to an analysis instance (like SPAN in traditional networks).
- **Reachability Analyzer**:
  - Pre-checks paths before making changes.
#### **CLI Command They Use Daily:**
```bash
aws ec2 create-network-insights-path --source <eni-id> --destination-port 443 --protocol tcp
```
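Flow log records delivered to S3 or pulled out of CloudWatch are plain space-separated lines, so the same `REJECT` filter can be applied offline. The sketch below assumes the default (version 2) record format with its 14 fields; the function names and the two sample records are invented for illustration.

```python
# Field order of the default (version 2) flow log format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)


def parse_record(line: str) -> dict[str, str]:
    """Split one default-format record into a field-name -> value mapping."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected(lines: list[str]) -> list[dict[str, str]]:
    """Keep only the records whose action is REJECT."""
    return [rec for rec in map(parse_record, lines) if rec["action"] == "REJECT"]


if __name__ == "__main__":
    # Illustrative records; the account and ENI IDs are made up.
    sample = [
        "2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 49152 443 6 10 840 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-0a1b2c3d 203.0.113.7 10.0.1.5 51000 3389 6 1 40 1700000000 1700000060 REJECT OK",
    ]
    for rec in rejected(sample):
        print(rec["srcaddr"], "->", rec["dstaddr"], "port", rec["dstport"])
```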
---

### **6. Cloud Network Limits (Like MTU in Trad Nets)**
#### **Top of Mind:**
- **AWS MTU**: **1500** across an internet gateway or VPN. Jumbo frames (**9001**) work inside a VPC, and Direct Connect carries them only when enabled on the virtual interface (9001 private, 8500 transit).
- **NAT Gateway Throughput**:
  - Scales up to **100 Gbps**, but a single flow is limited to about 5 Gbps.
- **Security Group Limits**:
  - Defaults: 60 inbound and 60 outbound rules per SG, 5 SGs per ENI (both adjustable).
#### **War Story:**
*"Why is my throughput capped at 5 Gbps?"*
→ Single TCP flow hitting the per-flow bandwidth limit through the NAT Gateway.

#### **CLI Command They Use Daily:**
```bash
# List current VPC quotas (rules per SG, SGs per ENI, and so on)
aws service-quotas list-service-quotas --service-code vpc --query 'Quotas[*].[QuotaName,Value]' --output table
```
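The limits in this section turn into quick back-of-the-envelope checks. The sketch below uses the figures quoted here (5 Gbps per flow, 60 rules per SG, 5 SGs per ENI) as defaults; the function names are illustrative, and `tcp_mss` assumes IPv4 with no IP or TCP options.

```python
import math

PER_FLOW_GBPS = 5   # per-flow figure quoted in this section
RULES_PER_SG = 60   # default rules per security group, per direction
SGS_PER_ENI = 5     # default security groups per network interface


def flows_needed(target_gbps: float, per_flow_gbps: float = PER_FLOW_GBPS) -> int:
    """Parallel flows required to reach a target aggregate throughput."""
    return math.ceil(target_gbps / per_flow_gbps)


def max_rules_per_eni(rules_per_sg: int = RULES_PER_SG,
                      sgs_per_eni: int = SGS_PER_ENI) -> int:
    """Upper bound on rules (per direction) evaluated for one interface."""
    return rules_per_sg * sgs_per_eni


def tcp_mss(mtu: int) -> int:
    """IPv4 TCP payload per packet: MTU minus 20-byte IP and 20-byte TCP headers."""
    return mtu - 40


if __name__ == "__main__":
    print(flows_needed(20))
    print(max_rules_per_eni())
    print(tcp_mss(1500))
```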
---

### **7. Automation Mindset (Like Config Templates)**
#### **Top of Mind:**
- **Infrastructure as Code (IaC)**:
  - Terraform snippets for zero-downtime SG updates:
    ```hcl
    resource "aws_security_group_rule" "temp_rule" {
      # Required arguments (type, protocol, ports, security_group_id, source) omitted for brevity
      lifecycle { create_before_destroy = true }
    }
    ```
- **AWS APIs**:
  - Uses `modify-network-interface-attribute` over console clicks.
#### **CLI Command They Use Daily:**
```bash
aws ec2 modify-instance-metadata-options --instance-id i-123abc --http-put-response-hop-limit 2
```
---

### **The Cloud Network SME’s Cheat Sheet**
| **Traditional** | **Cloud Equivalent** |
|-----------------------|------------------------------------|
| Subnetting | VPC CIDR design + AZ distribution |
| BGP | Direct Connect BGP timers |
| SPAN port | VPC Traffic Mirroring |
| Firewall rules | Security Groups + NACLs |
| tcpdump | Flow Logs + Athena SQL |

**Final Tip:** A true cloud SME doesn’t just *know* these; they automate them. For example:
```bash
# Auto-remediate overly permissive SGs
aws ec2 revoke-security-group-egress --group-id sg-123 --ip-permissions 'IpProtocol=-1,FromPort=-1,ToPort=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
```
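Before revoking anything, an automation step usually has to find the groups that carry an allow-all egress rule. The sketch below does that over the JSON shape returned by `aws ec2 describe-security-groups --output json`; `open_egress_groups` is a hypothetical helper, the sample data is made up, and a real remediation job would add its own review and exception handling.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_egress_groups(describe_output: dict) -> list[str]:
    """Return IDs of groups with an all-protocol egress rule open to the world."""
    flagged = []
    for group in describe_output.get("SecurityGroups", []):
        for perm in group.get("IpPermissionsEgress", []):
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if perm.get("IpProtocol") == "-1" and cidrs & OPEN_CIDRS:
                flagged.append(group["GroupId"])
                break  # one finding per group is enough
    return flagged


if __name__ == "__main__":
    # Trimmed, made-up sample in the describe-security-groups response shape.
    sample = {
        "SecurityGroups": [
            {"GroupId": "sg-123",
             "IpPermissionsEgress": [
                 {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
            {"GroupId": "sg-456",
             "IpPermissionsEgress": [
                 {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                  "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
        ]
    }
    print(open_egress_groups(sample))
```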
---

# **Deep Dive: Mastering AWS Flow Logs for Advanced Troubleshooting**

## **1. Flow Logs Fundamentals**