From 5a719ed2b6bf436ded140c4434b7ea1503f1d25f Mon Sep 17 00:00:00 2001
From: medusa
Date: Sun, 20 Jul 2025 21:03:54 -0500
Subject: [PATCH] Update tech_docs/cloud/aws_notes.md

---
 tech_docs/cloud/aws_notes.md | 242 ++++++++++++++++++++++++++++++++++-
 1 file changed, 241 insertions(+), 1 deletion(-)

diff --git a/tech_docs/cloud/aws_notes.md b/tech_docs/cloud/aws_notes.md
index ffbcf11..3d4aa31 100644
--- a/tech_docs/cloud/aws_notes.md
+++ b/tech_docs/cloud/aws_notes.md
@@ -1,4 +1,244 @@
-Here's a polished, cohesive version of your notes with improved flow, filled-in gaps, and tighter organization while preserving all critical details:
+# **Deep Dive: Mastering AWS Flow Logs for Advanced Troubleshooting**
+
+## **1. Flow Logs Fundamentals**
+### **What Flow Logs Capture**
+Flow Logs record **IP traffic metadata** (not payload data) for:
+- **VPCs**
+- **Subnets**
+- **Elastic Network Interfaces (ENIs)**
+
+**Key Fields:**
+| Field | Description | Example |
+|-------|-------------|---------|
+| `version` | Flow log version | `2` |
+| `account-id` | AWS account ID | `123456789012` |
+| `interface-id` | ENI ID | `eni-12345abc` |
+| `srcaddr` | Source IP | `10.0.1.5` |
+| `dstaddr` | Destination IP | `8.8.8.8` |
+| `srcport` | Source port | `32768` |
+| `dstport` | Destination port | `443` |
+| `protocol` | IP protocol number | `6` (TCP) |
+| `packets` | Packets in flow | `5` |
+| `bytes` | Bytes transferred | `1024` |
+| `start` | Flow start (Unix epoch) | `1625097600` |
+| `end` | Flow end (Unix epoch) | `1625097605` |
+| `action` | `ACCEPT` or `REJECT` | `REJECT` |
+| `log-status` | Logging status | `OK` |
+
+### **When to Use Flow Logs**
+✅ **Troubleshooting connectivity issues**
+✅ **Security incident investigations**
+✅ **Network performance analysis**
+✅ **Compliance auditing**
+
+---
+
+## **2. Enabling & Configuring Flow Logs**
+### **GUI Method (Quick Setup)**
+1. **VPC Dashboard** → Select VPC → **Actions** → **Create Flow Log**
+2. Configure:
+   - **Filter**: `ALL` (recommended), `ACCEPT`, or `REJECT`
+   - **Destination**:
+     - **CloudWatch Logs** (near-real-time analysis)
+     - **S3** (long-term storage)
+   - **Log Format**: Default or custom (e.g., add `${tcp-flags}`)
+
+### **CLI Method (Automation-Friendly)**
+```bash
+# Send to CloudWatch Logs (needs an IAM role allowed to publish to the log group; the role ARN below is a placeholder)
+aws ec2 create-flow-logs \
+  --resource-type VPC \
+  --resource-ids vpc-123abc \
+  --traffic-type ALL \
+  --log-destination-type cloud-watch-logs \
+  --log-group-name "VPCFlowLogs" \
+  --deliver-logs-permission-arn "arn:aws:iam::123456789012:role/FlowLogsRole" \
+  --log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}'
+
+# Send to S3 (for compliance); REJECT logs only blocked traffic
+aws ec2 create-flow-logs \
+  --resource-type Subnet \
+  --resource-ids subnet-456def \
+  --traffic-type REJECT \
+  --log-destination-type s3 \
+  --log-destination "arn:aws:s3:::my-flow-logs-bucket"
+```
+
+### **Advanced Custom Fields**
+Add these to `--log-format` for deeper insights:
+- `${pkt-srcaddr}` / `${pkt-dstaddr}` (packet-level source and destination IPs, which can differ from `srcaddr`/`dstaddr` when traffic passes through an intermediate interface such as a NAT gateway)
+- `${tcp-flags}` (bitmask of TCP flags, e.g., SYN, SYN-ACK, FIN, RST)
+- `${type}` (IPv4/IPv6)
+
+---
+
+## **3. Analyzing Flow Logs**
+### **CloudWatch Logs Insights (GUI)**
+**Best for:** Ad-hoc troubleshooting
+**Key Queries** (field names such as `srcAddr` are the ones Logs Insights discovers for default-format flow logs):
+
+#### **1. Top Talkers (Bandwidth Analysis)**
+```sql
+fields @timestamp, srcAddr, dstAddr, bytes
+| stats sum(bytes) as totalBytes by srcAddr, dstAddr
+| sort totalBytes desc
+| limit 20
+```
+
+#### **2. Blocked Traffic Investigation**
+```sql
+fields @timestamp, srcAddr, dstAddr, dstPort, action
+| filter action = "REJECT"
+| sort @timestamp desc
+| limit 50
+```
+
+#### **3. NAT Gateway Health Check**
+```sql
+fields @timestamp, srcAddr, dstAddr, action
+| filter srcAddr like "10.0.1." and dstAddr like "8.8.8."
+| stats count(*) as attempts by bin(5m)
+| sort @timestamp desc
+```
+
+#### **4. Suspicious Port Scanning**
+```sql
+fields @timestamp, srcAddr, dstPort
+| filter dstPort >= 3000 and dstPort <= 4000
+| stats count(*) as hits by srcAddr, dstPort
+| sort hits desc
+```
+
+### **Athena (S3-Based SQL Analysis)**
+**Best for:** Large-scale historical analysis
+**Setup:**
+1. Create the Athena table (each `dt` partition must then be loaded, e.g., with `ALTER TABLE vpc_flow_logs ADD PARTITION`, before queries return rows):
+```sql
+CREATE EXTERNAL TABLE vpc_flow_logs (
+  version int,
+  account_id string,
+  interface_id string,
+  srcaddr string,
+  dstaddr string,
+  srcport int,
+  dstport int,
+  protocol int,
+  packets bigint,
+  bytes bigint,
+  `start` bigint,
+  `end` bigint,
+  action string,
+  log_status string
+)
+PARTITIONED BY (dt string)
+ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
+LOCATION 's3://my-flow-logs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/'
+TBLPROPERTIES ("skip.header.line.count"="1");
+```
+
+**Query Example:**
+```sql
+-- Find all blocked SSH attempts
+SELECT srcaddr, COUNT(*) as block_count
+FROM vpc_flow_logs
+WHERE dstport = 22 AND action = 'REJECT'
+GROUP BY srcaddr
+ORDER BY block_count DESC
+```
+
+---
+
+## **4. Real-World Troubleshooting Scenarios**
+### **Case 1: "Why Can’t My Instance Reach the Internet?"**
+**Steps:**
+1. **Check Flow Logs for Rejects:**
+   ```sql
+   fields @timestamp, srcAddr, dstAddr, dstPort, action
+   | filter srcAddr = "10.0.1.5" and dstAddr like "8.8.8."
+   | sort @timestamp desc
+   ```
+2. **If `REJECT`:**
+   - Check **NACLs** and **Security Groups**
+3. **If No Logs:**
+   - Verify **route tables** (`0.0.0.0/0 → nat-xxx`)
+
+### **Case 2: "Who’s Accessing My Database?"**
+```sql
+fields @timestamp, srcAddr, dstAddr, dstPort
+| filter dstAddr = "10.0.2.10" and dstPort = 3306
+| stats count(*) as connections by srcAddr
+| sort connections desc
+```
+
+### **Case 3: "Is My Application Generating Excessive Traffic?"**
+```sql
+fields @timestamp, srcAddr, dstAddr, bytes
+| filter dstAddr like "10.0.3."
+| stats sum(bytes) as totalBytes by bin(1h)
+| sort totalBytes desc
+```
+
+---
+
+## **5. Pro Tips for Production**
+### **1. Optimize Costs**
+- Use **S3 + Athena** for long-term storage (typically cheaper than CloudWatch Logs)
+- Log only `REJECT` traffic for security use cases
+
+### **2. Automate Alerts**
+```bash
+# Alarm for DDoS-like traffic; assumes a metric filter on the flow log group publishes a custom RejectedPackets metric to the (placeholder) VPCFlowLogs namespace
+aws cloudwatch put-metric-alarm \
+  --alarm-name "High-Reject-Rate" \
+  --metric-name "RejectedPackets" \
+  --namespace "VPCFlowLogs" \
+  --statistic "Sum" \
+  --period 300 \
+  --threshold 1000 \
+  --comparison-operator "GreaterThanThreshold" \
+  --evaluation-periods 1
+```
+
+### **3. Centralized Logging**
+Aggregate logs from multiple accounts:
+```bash
+aws logs put-subscription-filter \
+  --log-group-name "VPCFlowLogs" \
+  --filter-name "CrossAccountStream" \
+  --filter-pattern "" \
+  --destination-arn "arn:aws:logs:us-east-1:123456789012:destination:CentralAccount"
+```
+
+### **4. Security Hardening**
+```sql
+# Detect port scanning
+fields @timestamp, srcAddr, dstPort
+| filter dstPort >= 0 and dstPort <= 1024
+| stats count_distinct(dstPort) as portsScanned by srcAddr
+| filter portsScanned > 5
+| sort portsScanned desc
+```
+
+---
+
+## **6. Limitations & Workarounds**
+| Limitation | Workaround |
+|------------|------------|
+| No payload data | Use **Traffic Mirroring** + `tcpdump` |
+| Not real time (~15 min delay with the default 10-minute aggregation interval) | Set `--max-aggregation-interval 60` when creating the flow log; use **CloudWatch Metrics** for near-real-time signals |
+| No MAC addresses | Correlate with `aws ec2 describe-network-interfaces` |
+
+---
+
+## **Final Checklist**
+1. [ ] Enable flow logs on all critical VPCs
+2. [ ] Set up CloudWatch dashboards for key queries
+3. [ ] Configure S3 archiving for compliance
+4. [ ] Automate security alerts (e.g., port scans)
+5. [ ] Document common troubleshooting queries
+
+**Flow logs are your network’s black box recorder: enable them before you need them!**
+
 ---
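
When flow log files have been downloaded from S3 or exported from CloudWatch, the port-scan heuristic from the Security Hardening query can be approximated offline. The block below is a minimal Python sketch, assuming default-format (version 2) records with the 14 fields from the Key Fields table. The names `FlowRecord`, `parse_record`, and `find_port_scanners`, and the default thresholds, are hypothetical choices for illustration and not part of any AWS SDK.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set


class FlowRecord(NamedTuple):
    """Subset of the default-format fields needed for the heuristic."""
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    action: str


def parse_record(line: str) -> Optional[FlowRecord]:
    """Parse one space-separated default-format record.

    Returns None for header rows and for NODATA/SKIPDATA records,
    whose address and port fields are "-" rather than real values.
    """
    parts = line.split()
    if len(parts) != 14 or parts[13] != "OK":
        return None
    return FlowRecord(
        srcaddr=parts[3],
        dstaddr=parts[4],
        srcport=int(parts[5]),
        dstport=int(parts[6]),
        action=parts[12],
    )


def find_port_scanners(
    lines: Iterable[str], max_port: int = 1024, min_ports: int = 5
) -> Dict[str, int]:
    """Count distinct destination ports (0..max_port) per source address.

    Mirrors the Logs Insights query: only sources that touched more than
    `min_ports` distinct ports are returned.
    """
    ports_by_src: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        record = parse_record(line)
        if record is not None and 0 <= record.dstport <= max_port:
            ports_by_src[record.srcaddr].add(record.dstport)
    return {
        src: len(ports)
        for src, ports in ports_by_src.items()
        if len(ports) > min_ports
    }
```

A caller would pass any iterable of log lines, for example an open file handle, and get back a mapping from source IP to the number of distinct low ports it contacted. Like the query, this counts accepted and rejected flows alike; filtering on `record.action == "REJECT"` would narrow it to blocked probes.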