A **Cloud Network SME** operates at the same level of mastery as a traditional network engineer but with a cloud-native lens. Here’s what they have **top of mind**, structured like the OSI model for clarity:

---

### **1. Addressing & Segmentation (Cloud’s "Layer 3")**

#### **Top of Mind:**

- **RFC 1918 in the Cloud**:
  - Knows `10.0.0.0/8`, `172.16.0.0/12`, and `192.168.0.0/16`, but also the special ranges:
    - **Link-local** `169.254.0.0/16`: AWS uses it for the instance metadata service (`169.254.169.254`) and the Amazon DNS server (`169.254.169.253`).
    - **Shared address space** `100.64.0.0/10` (RFC 6598, carrier-grade NAT): not reserved by AWS; it can be added as a secondary VPC CIDR (a common choice for EKS pod IPs).
  - **Avoids Overlaps**: VPC peering rejects overlapping CIDRs outright (two `10.0.0.0/16` VPCs can't peer), and overlaps behind a Transit Gateway or VPN make routing ambiguous, so ranges are planned up front.

- **Subnetting at Scale**:
  - **/28 Minimum in AWS**: IPv4 subnets range from /16 to /28, and AWS reserves 5 IPs in every subnet (the first four and the last), so a /28 leaves 11 usable addresses.
  - **AZ-Aware Design**:

    ```text
    # Example: 10.0.0.0/16 → one /20 per AZ (a common pattern)
    us-east-1a: 10.0.0.0/20
    us-east-1b: 10.0.16.0/20
    ```
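
The per-AZ split above is fixed-size arithmetic on the address treated as a 32-bit integer. A minimal bash sketch, assuming an IPv4 VPC CIDR whose base address is aligned to its prefix and AWS's 5 reserved IPs per subnet; `ip_to_int`, `int_to_ip`, and `carve_subnets` are hypothetical helper names, not AWS tooling:

```bash
#!/usr/bin/env bash
# Convert dotted-quad IPv4 to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a 32-bit integer back to dotted-quad IPv4.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# carve_subnets <vpc-cidr> <new-prefix> <az>...
# Assigns consecutive equal-size blocks to each AZ; usable = size minus 5 AWS-reserved IPs.
carve_subnets() {
  local base=${1%/*} new_prefix=$2
  shift 2
  local start size i=0 az
  start=$(ip_to_int "$base")
  size=$(( 1 << (32 - new_prefix) ))
  for az in "$@"; do
    echo "$az: $(int_to_ip $(( start + i * size )))/$new_prefix usable=$(( size - 5 ))"
    i=$(( i + 1 ))
  done
}

carve_subnets 10.0.0.0/16 20 us-east-1a us-east-1b us-east-1c
# us-east-1a: 10.0.0.0/20 usable=4091
# us-east-1b: 10.0.16.0/20 usable=4091
# us-east-1c: 10.0.32.0/20 usable=4091
```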

#### **CLI Command They Use Daily:**

```bash
aws ec2 describe-subnets --query 'Subnets[*].{AZ:AvailabilityZone,CIDR:CidrBlock,Name:Tags[?Key==`Name`].Value|[0]}' --output table
```

---

### **2. Cloud "Layer 4" Mastery (Transport Layer)**

#### **Top of Mind:**

- **Stateful vs. Stateless**:
  - **Security Groups (Stateful)**: Return traffic is allowed automatically.
  - **NACLs (Stateless)**: Return traffic must be allowed explicitly, including the client's ephemeral ports. Linux defaults to `32768-60999`, Windows to `49152-65535`, and NAT gateways use `1024-65535`, so the usual NACL rule opens `1024-65535`.

- **Port Knowledge**:
  - **Not Just 80/443**:
    - `179` (TCP, BGP over Direct Connect)
    - `4789` (UDP, VXLAN for VPC Traffic Mirroring)
    - `6081` (UDP, Geneve for Gateway Load Balancer)
    - `53` (DNS for PrivateLink endpoints)

#### **War Story:**

*"Why is my NAT Gateway not working?"*

→ The NACLs blocked return traffic on ephemeral ports (`1024-65535`): the private subnet's NACL must allow it inbound, and the NAT gateway's public subnet must allow it inbound from the internet and outbound toward the private subnet.

#### **CLI Command They Use Daily:**

```bash
# Check the ephemeral port range on Linux instances
cat /proc/sys/net/ipv4/ip_local_port_range
```
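
Because NACLs are stateless, what matters is whether each client's ephemeral range sits inside the port range the NACL lets back in. A small bash sketch of that check, assuming the Linux `/proc` format shown above (two whitespace-separated numbers); `range_covered` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# range_covered <host-low> <host-high> <nacl-low> <nacl-high>
# Prints whether a host's ephemeral range fits inside a NACL's allowed port range.
range_covered() {
  local host_low=$1 host_high=$2 nacl_low=$3 nacl_high=$4
  if (( host_low >= nacl_low && host_high <= nacl_high )); then
    echo "covered"
  else
    echo "NOT covered: return traffic to ports ${host_low}-${host_high} may be dropped"
  fi
}

# On Linux, compare this host's range with a NACL rule allowing 1024-65535.
if [[ -r /proc/sys/net/ipv4/ip_local_port_range ]]; then
  read -r low high < /proc/sys/net/ipv4/ip_local_port_range
  range_covered "$low" "$high" 1024 65535
fi

# Windows clients (49152-65535) against a NACL that only opens the Linux default range:
range_covered 49152 65535 32768 60999
# NOT covered: return traffic to ports 49152-65535 may be dropped
```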

---

### **3. Cloud "Layer 7" (Application Layer)**

#### **Top of Mind:**

- **Load Balancer Types**:

  | Type | Use Case | Key Detail |
  |------|----------|------------|
  | ALB | HTTP/HTTPS | Supports path-based routing (`/api/*`); passes the client IP in `X-Forwarded-For` |
  | NLB | TCP/UDP/TLS, ultra-low latency | Can preserve the client source IP (no `X-Forwarded-For`, since it works at Layer 4) |
  | GWLB | Threat inspection | Chains third-party firewalls (Palo Alto, Fortinet) using Geneve (UDP 6081) |

- **PrivateLink**:
  - Knows the endpoint-service name format `com.amazonaws.vpce.{region}.vpce-svc-xxxx` (the interface endpoint's own DNS names end in `{region}.vpce.amazonaws.com`).
  - **Gotcha**: The private DNS names an interface endpoint enables resolve only inside the endpoint's own VPC; peered or Transit Gateway-attached VPCs need their own Route 53 private hosted zone association or Resolver rules.

#### **CLI Command They Use Daily:**

```bash
# ServiceType is a list of objects, so compare the first element's ServiceType field
aws ec2 describe-vpc-endpoint-services --query 'ServiceDetails[?ServiceType[0].ServiceType==`Interface`].ServiceName'
```

---

### **4. Cloud-Specific Protocols**

#### **Top of Mind:**

- **Geneve (UDP 6081)**:
  - Encapsulation protocol between a Gateway Load Balancer and its inspection appliances.
- **BGP over Direct Connect**:
  - Router-default timers (e.g., Cisco IOS: 60s keepalive, 180s hold) detect a dead link slowly, and AWS waits out a 90-second hold time by default. Enables BFD for fast failure detection instead of relying on BGP timers alone.
- **VXLAN (UDP 4789)**:
  - Encapsulation protocol for VPC Traffic Mirroring sessions. Transit Gateway doesn't expose a VXLAN overlay to customers; its Connect attachments use GRE tunnels with BGP instead.

#### **War Story:**

*"Why did failover to my backup Direct Connect take minutes?"*

→ The BGP hold timer was left at the router default (`180s`) and BFD wasn't configured.
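
The gap between BGP timers and BFD is plain multiplication (interval × multiplier). A bash sketch comparing them, assuming the usual hold = 3 × keepalive convention and a BFD setting of 300 ms × 3; check the actual values on your router and in the Direct Connect documentation. `detection_ms` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# detection_ms <interval-ms> <multiplier>
# Time to declare a peer dead, modeled as interval x multiplier
# (BFD's definition; for BGP this equals the hold time when hold = 3 x keepalive).
detection_ms() {
  echo $(( $1 * $2 ))
}

echo "BGP, Cisco IOS default (60 s x 3): $(detection_ms 60000 3) ms"
echo "BGP, AWS side default  (30 s x 3): $(detection_ms 30000 3) ms"
echo "BFD (300 ms x 3):                  $(detection_ms 300 3) ms"
# BGP, Cisco IOS default (60 s x 3): 180000 ms
# BGP, AWS side default  (30 s x 3): 90000 ms
# BFD (300 ms x 3):                  900 ms
```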

#### **CLI Command They Use Daily:**

```bash
aws directconnect describe-virtual-interfaces --query 'virtualInterfaces[*].[virtualInterfaceId,bgpPeers[0].bgpStatus]'
```

---

### **5. Troubleshooting Tools (Like `tcpdump` for Cloud)**

#### **Top of Mind:**

- **Flow Logs**:
  - Record connection metadata (addresses, ports, bytes, ACCEPT/REJECT), not packet payloads.
  - Query with CloudWatch Logs Insights:

    ```sql
    fields @timestamp, srcAddr, dstAddr, action
    | filter action = "REJECT"
    | sort @timestamp desc
    ```
- **VPC Traffic Mirroring**:
  - Copies packets (VXLAN-encapsulated) to an analysis target such as an instance ENI or an NLB, like a SPAN port in traditional networks.
- **Reachability Analyzer**:
  - Analyzes the configured path between a source and destination (route tables, SGs, NACLs) without sending packets, so paths can be checked before and after changes.
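
Flow logs read outside Logs Insights (for example, files delivered to S3) are plain text, so a quick REJECT summary doesn't need a query engine. A bash/awk sketch, assuming the default version 2 record format (14 space-separated fields, with `action` as field 13); `top_rejects` and the file name `flowlogs.txt` are hypothetical:

```bash
#!/usr/bin/env bash
# Default-format fields: version account-id interface-id srcaddr dstaddr srcport
# dstport protocol packets bytes start end action log-status
# top_rejects [file...]: count REJECTed flows per "src -> dst:dstport", busiest first.
# Reads stdin when no file is given; a header line is skipped because its field 13 is "action".
top_rejects() {
  awk '$13 == "REJECT" { count[$4 " -> " $5 ":" $7]++ }
       END { for (key in count) print count[key], key }' "$@" | sort -rn
}

# Usage: top_rejects flowlogs.txt | head -n 10
# Gzipped S3 objects: zcat flowlogs.txt.gz | top_rejects
```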

#### **CLI Command They Use Daily:**

```bash
aws ec2 create-network-insights-path --source <eni-id> --destination <eni-id> --destination-port 443 --protocol tcp
aws ec2 start-network-insights-analysis --network-insights-path-id <nip-id>  # creating the path alone runs nothing
```

---
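
### **6. Cloud Network Limits (Like MTU in Trad Nets)**

#### **Top of Mind:**

- **AWS MTU**: Not always 1500. Most current instance types use a **9001**-byte (jumbo) MTU inside a VPC, but traffic through an internet gateway, across inter-region peering, or over Site-to-Site VPN is limited to 1500 (less after VPN overhead). Transit Gateway supports 8500; Direct Connect private VIFs support up to 9001 and transit VIFs up to 8500.
- **NAT Gateway Throughput**:
  - Starts at **5 Gbps** and scales automatically to **100 Gbps**; separately, EC2 caps a single flow (5-tuple) at 5 Gbps when the endpoints aren't in the same cluster placement group.
- **Security Group Limits** (default quotas):
  - 60 inbound and 60 outbound rules per SG, and 5 SGs per ENI (adjustable to 16).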
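
MTU mismatches show up as large packets silently disappearing. A quick check is a ping with Don't Fragment set and a payload of MTU − 28 bytes (20-byte IPv4 header + 8-byte ICMP header). A bash sketch that prints the command, assuming Linux iputils `ping` (`-M do` prohibits fragmentation); `ping_df_cmd` is a hypothetical helper and `10.0.1.20` is a placeholder target:

```bash
#!/usr/bin/env bash
# ping_df_cmd <mtu> <host>: print an iputils ping command whose IPv4 packets
# are exactly <mtu> bytes long (payload + 8-byte ICMP header + 20-byte IP header).
ping_df_cmd() {
  local mtu=$1 host=$2
  echo "ping -M do -c 3 -s $(( mtu - 28 )) $host"
}

ping_df_cmd 9001 10.0.1.20   # ping -M do -c 3 -s 8973 10.0.1.20
ping_df_cmd 1500 10.0.1.20   # ping -M do -c 3 -s 1472 10.0.1.20
```

If the 9001-byte probe fails while the 1500-byte one succeeds, a hop on the path (internet gateway, VPN, or inter-region peering) is limiting the MTU.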

#### **War Story:**

*"Why is my throughput capped at 5 Gbps?"*

→ A single TCP flow hitting the 5 Gbps per-flow limit; spreading the transfer across parallel flows raises total throughput.

#### **CLI Command They Use Daily:**

```bash
aws ec2 describe-account-attributes --query 'AccountAttributes[?AttributeName==`vpc-max-security-groups-per-interface`].AttributeValues[0].AttributeValue'
aws service-quotas list-service-quotas --service-code vpc --query 'Quotas[*].[QuotaName,Value]' --output table
```
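
The SG quotas interact: AWS caps rules per SG × SGs per ENI at 1,000, so raising one quota can require lowering the other. A bash sketch of that arithmetic; `sg_quota_ok` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# sg_quota_ok <rules-per-sg> <sgs-per-eni>
# Checks the combined limit: rules per SG x SGs per ENI must not exceed 1,000.
sg_quota_ok() {
  local product=$(( $1 * $2 ))
  if (( product <= 1000 )); then
    echo "ok ($product)"
  else
    echo "exceeds 1000 ($product)"
  fi
}

sg_quota_ok 60 5     # ok (300), the defaults
sg_quota_ok 60 16    # ok (960)
sg_quota_ok 100 16   # exceeds 1000 (1600)
```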

---

### **7. Automation Mindset (Like Config Templates)**

#### **Top of Mind:**

- **Infrastructure as Code (IaC)**:
  - Terraform pattern for replacing an SG without downtime (the new SG is created first so dependents can switch to it before the old one is destroyed):

    ```hcl
    resource "aws_security_group" "app" {
      name_prefix = "app-"     # unique names let old and new SGs coexist
      vpc_id      = var.vpc_id # assumed input variable
      lifecycle { create_before_destroy = true }
    }
    ```
- **AWS APIs**:
  - Scripts changes with `aws ec2 modify-network-interface-attribute` (e.g., `--groups` to swap an ENI's SGs) instead of console clicks.

#### **CLI Command They Use Daily:**

```bash
# Hop limit 2 lets IMDSv2 responses reach containers one network hop away
aws ec2 modify-instance-metadata-options --instance-id i-123abc --http-put-response-hop-limit 2
```

---

### **The Cloud Network SME’s Cheat Sheet**

| **Traditional** | **Cloud Equivalent**              |
|-----------------|-----------------------------------|
| Subnetting      | VPC CIDR design + AZ distribution |
| BGP             | Direct Connect BGP timers         |
| SPAN port       | VPC Traffic Mirroring             |
| Firewall rules  | Security Groups + NACLs           |
| tcpdump         | Flow Logs + Athena SQL            |

**Final Tip:** A true cloud SME doesn’t just *know* these; they automate them. For example:

```bash
# Auto-remediate: remove an SG's allow-all egress rule (all protocols to 0.0.0.0/0)
aws ec2 revoke-security-group-egress --group-id sg-123 --ip-permissions 'IpProtocol=-1,FromPort=-1,ToPort=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
```

---

# **Deep Dive: Mastering AWS Flow Logs for Advanced Troubleshooting**

## **1. Flow Logs Fundamentals**