Update tech_docs/cloud/aws_notes.md
@@ -1,4 +1,244 @@
# **Deep Dive: Mastering AWS Flow Logs for Advanced Troubleshooting**

## **1. Flow Logs Fundamentals**

### **What Flow Logs Capture**

Flow Logs record **IP traffic metadata** (not payload data) for:

- **VPCs**
- **Subnets**
- **Elastic Network Interfaces (ENIs)**

**Key Fields:**

| Field | Description | Example |
|-------|-------------|---------|
| `version` | Flow log version | `2` |
| `account-id` | AWS account ID | `123456789012` |
| `interface-id` | ENI ID | `eni-12345abc` |
| `srcaddr` | Source IP | `10.0.1.5` |
| `dstaddr` | Destination IP | `8.8.8.8` |
| `srcport` | Source port | `32768` |
| `dstport` | Destination port | `443` |
| `protocol` | IP protocol number | `6` (TCP) |
| `packets` | Packets in flow | `5` |
| `bytes` | Bytes transferred | `1024` |
| `start` | Flow start (Unix epoch seconds) | `1625097600` |
| `end` | Flow end (Unix epoch seconds) | `1625097605` |
| `action` | `ACCEPT` or `REJECT` | `REJECT` |
| `log-status` | Logging status | `OK` |
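
To make the field layout concrete, here is a minimal Python sketch that splits one default-format record into named fields. The helper names (`parse_flow_record`, `flow_window`) and the sample line are illustrative assumptions built from the example values in the table, not part of any AWS SDK.

```python
from datetime import datetime, timezone

# Field order of the default flow log format, matching the table above.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Fields that hold integers whenever data is present.
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> dict:
    """Parse one space-separated default-format record into a dict.

    Fields logged as "-" (as in NODATA or SKIPDATA records) become None.
    """
    parts = line.split()
    if len(parts) != len(DEFAULT_FIELDS):
        raise ValueError(f"expected {len(DEFAULT_FIELDS)} fields, got {len(parts)}")
    record = {}
    for name, raw in zip(DEFAULT_FIELDS, parts):
        if raw == "-":
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


def flow_window(record: dict) -> tuple:
    """Return the (start, end) timestamps of the record as UTC datetimes."""
    start = datetime.fromtimestamp(record["start"], tz=timezone.utc)
    end = datetime.fromtimestamp(record["end"], tz=timezone.utc)
    return start, end


# Hypothetical sample line assembled from the example column of the table.
sample = ("2 123456789012 eni-12345abc 10.0.1.5 8.8.8.8 32768 443 6 5 1024 "
          "1625097600 1625097605 REJECT OK")
rec = parse_flow_record(sample)
print(rec["srcaddr"], "->", rec["dstaddr"], rec["action"])
print(flow_window(rec)[0].isoformat())
```

A custom `--log-format` changes the field order, so `DEFAULT_FIELDS` would need to be adjusted to match it.
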

### **When to Use Flow Logs**

- ✅ **Troubleshooting connectivity issues**
- ✅ **Security incident investigations**
- ✅ **Network performance analysis**
- ✅ **Compliance auditing**

---

## **2. Enabling & Configuring Flow Logs**

### **GUI Method (Quick Setup)**

1. **VPC Dashboard** → Select VPC → **Actions** → **Create Flow Log**
2. Configure:
   - **Filter**: `ALL` (recommended), `ACCEPT`, or `REJECT`
   - **Destination**:
     - **CloudWatch Logs** (real-time analysis)
     - **S3** (long-term storage)
   - **Log Format**: Default or custom (e.g., add `${tcp-flags}`)

### **CLI Method (Automation-Friendly)**

```bash
# Send to CloudWatch Logs
# Publishing to CloudWatch Logs requires an IAM role; the role ARN below is a placeholder.
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-123abc \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name "VPCFlowLogs" \
  --deliver-logs-permission-arn "arn:aws:iam::123456789012:role/FlowLogsRole" \
  --log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}'

# Send to S3 (for compliance)
# --traffic-type REJECT logs only blocked traffic.
# (A comment cannot follow a trailing backslash, so it lives up here.)
aws ec2 create-flow-logs \
  --resource-type Subnet \
  --resource-ids subnet-456def \
  --traffic-type REJECT \
  --log-destination-type s3 \
  --log-destination "arn:aws:s3:::my-flow-logs-bucket"
```
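
### **Advanced Custom Fields**

Add these to `--log-format` for deeper insights:

- `${pkt-srcaddr}` / `${pkt-dstaddr}` (original packet-level source/destination IPs; these differ from `srcaddr`/`dstaddr` when traffic passes through an intermediate interface such as a NAT gateway)
- `${tcp-flags}` (bitmask of the TCP flags seen during the aggregation interval: FIN = 1, SYN = 2, RST = 4, SYN-ACK = 18)
- `${type}` (traffic type, e.g. IPv4 or IPv6)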
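
Because `${tcp-flags}` is logged as a number, a small decoder helps when reading records by hand. The sketch below is an illustration: `decode_tcp_flags` is a hypothetical helper, and the bit values are the standard TCP header flag bits. Flow logs may not record every flag in this table.

```python
# Standard TCP header flag bits. Flow logs report the OR of the flags
# observed during the aggregation interval.
TCP_FLAG_BITS = {"FIN": 1, "SYN": 2, "RST": 4, "PSH": 8, "ACK": 16, "URG": 32}


def decode_tcp_flags(value: int) -> list:
    """Return the names of the flags set in a tcp-flags bitmask."""
    return [name for name, bit in TCP_FLAG_BITS.items() if value & bit]


print(decode_tcp_flags(2))   # ['SYN']
print(decode_tcp_flags(18))  # ['SYN', 'ACK']
print(decode_tcp_flags(4))   # ['RST']
```

A value of `2` with no matching `18` in the reverse direction usually means a connection attempt that was never answered.
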

---

## **3. Analyzing Flow Logs**

### **CloudWatch Logs Insights (GUI)**

**Best for:** Ad-hoc troubleshooting

**Key Queries:** (these use the field names Logs Insights discovers automatically for default-format flow logs, such as `srcAddr`, `dstAddr`, and `dstPort`; custom formats need a `parse` step first)

#### **1. Top Talkers (Bandwidth Analysis)**

```sql
fields @timestamp, srcAddr, dstAddr, bytes
| stats sum(bytes) as totalBytes by srcAddr, dstAddr
| sort totalBytes desc
| limit 20
```
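
For records already downloaded (for example from the S3 destination), the same top-talkers aggregation can be approximated offline. This is a sketch under assumptions: `top_talkers` is a hypothetical helper, the records use the default 14-field format, and the sample lines are invented for illustration.

```python
from collections import Counter


def top_talkers(lines, limit=20):
    """Sum bytes per (srcaddr, dstaddr) pair, largest first."""
    totals = Counter()
    for line in lines:
        parts = line.split()
        # Default format, 0-based: srcaddr=3, dstaddr=4, bytes=9.
        if len(parts) != 14 or parts[9] == "-":
            continue  # skip malformed, NODATA, and SKIPDATA records
        totals[(parts[3], parts[4])] += int(parts[9])
    return totals.most_common(limit)


# Hypothetical sample records.
sample_lines = [
    "2 123456789012 eni-12345abc 10.0.1.5 8.8.8.8 32768 443 6 5 1024 1625097600 1625097605 ACCEPT OK",
    "2 123456789012 eni-12345abc 10.0.1.5 8.8.8.8 32769 443 6 3 512 1625097600 1625097605 ACCEPT OK",
    "2 123456789012 eni-12345abc 10.0.1.9 10.0.2.10 40000 3306 6 2 200 1625097600 1625097605 ACCEPT OK",
    "2 123456789012 eni-12345abc - - - - - - - 1625097600 1625097605 - NODATA",
]

for (src, dst), total in top_talkers(sample_lines):
    print(f"{src} -> {dst}: {total} bytes")
```
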

#### **2. Blocked Traffic Investigation**

```sql
fields @timestamp, srcAddr, dstAddr, dstPort, action
| filter action = "REJECT"
| sort @timestamp desc
| limit 50
```

#### **3. NAT Gateway Health Check**

```sql
fields @timestamp, srcAddr, dstAddr, action
| filter srcAddr like "10.0.1." and dstAddr like "8.8.8."
| stats count(*) as attempts by bin(5m) as timeWindow
| sort timeWindow desc
```

#### **4. Suspicious Port Scanning**

```sql
fields @timestamp, srcAddr, dstPort
| filter dstPort >= 3000 and dstPort <= 4000
| stats count(*) as hits by srcAddr, dstPort
| sort hits desc
```

### **Athena (S3-Based SQL Analysis)**

**Best for:** Large-scale historical analysis

**Setup:**

1. Create Athena table (`start` and `end` are backtick-quoted because they are reserved words, and the header line that each S3 log file starts with is skipped):
2. Register each day's S3 prefix as a partition (`ALTER TABLE ... ADD PARTITION`) before querying; a partitioned table returns no rows until its partitions are loaded.

```sql
CREATE EXTERNAL TABLE vpc_flow_logs (
  version int,
  account_id string,
  interface_id string,
  srcaddr string,
  dstaddr string,
  srcport int,
  dstport int,
  protocol int,
  packets bigint,
  bytes bigint,
  `start` bigint,
  `end` bigint,
  action string,
  log_status string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://my-flow-logs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/'
TBLPROPERTIES ("skip.header.line.count"="1");
```
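
Since the table is partitioned by `dt`, each day's S3 prefix has to be registered before queries return rows. The sketch below generates the corresponding `ALTER TABLE` statement. It assumes the default (non-Hive) flow log S3 layout, where objects land under `.../<region>/YYYY/MM/DD/`; `add_partition_statement` is a hypothetical helper, and the bucket and account ID are the placeholders used in the table definition.

```python
from datetime import date


def add_partition_statement(table: str, base_location: str, day: date) -> str:
    """Build an Athena ALTER TABLE statement for one day's flow logs.

    Assumes the default S3 layout: <base_location>/YYYY/MM/DD/
    """
    location = f"{base_location.rstrip('/')}/{day:%Y/%m/%d}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') LOCATION '{location}'"
    )


# Placeholder bucket and account ID, matching the CREATE TABLE example.
base = "s3://my-flow-logs-bucket/AWSLogs/123456789012/vpcflowlogs/us-east-1/"
print(add_partition_statement("vpc_flow_logs", base, date(2021, 7, 1)))
```

Adding `AND dt = '2021-07-01'` to a query's `WHERE` clause then limits the scan to that day's objects.
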

**Query Example:**

```sql
-- Find all blocked SSH attempts
SELECT srcaddr, COUNT(*) as block_count
FROM vpc_flow_logs
WHERE dstport = 22 AND action = 'REJECT'
GROUP BY srcaddr
ORDER BY block_count DESC
```

---

## **4. Real-World Troubleshooting Scenarios**

### **Case 1: "Why Can’t My Instance Reach the Internet?"**

**Steps:**

1. **Check Flow Logs for Rejects:**

   ```sql
   fields @timestamp, srcAddr, dstAddr, dstPort, action
   | filter srcAddr = "10.0.1.5" and dstAddr like "8.8.8."
   | sort @timestamp desc
   ```

2. **If `REJECT`:**
   - Check **NACLs** and **Security Groups**
3. **If No Logs:**
   - Verify **route tables** (`0.0.0.0/0 → nat-xxx`)

### **Case 2: "Who’s Accessing My Database?"**

```sql
fields @timestamp, srcAddr, dstAddr, dstPort
| filter dstAddr = "10.0.2.10" and dstPort = 3306
| stats count(*) as connections by srcAddr
| sort connections desc
```

### **Case 3: "Is My Application Generating Excessive Traffic?"**

```sql
fields @timestamp, srcAddr, dstAddr, bytes
| filter dstAddr like "10.0.3."
| stats sum(bytes) as totalBytes by bin(1h)
| sort totalBytes desc
```

---

## **5. Pro Tips for Production**

### **1. Optimize Costs**

- Use **S3 + Athena** for long-term storage (cheaper than CloudWatch)
- Filter `REJECT`-only logs for security use cases
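
### **2. Automate Alerts**

Flow logs do not publish a `RejectedPackets` metric on their own, and custom metrics cannot live in an `AWS/` namespace. Create the metric with a metric filter first, then alarm on it:

```bash
# 1. Metric filter: sum the packets of every REJECT record into a custom metric
aws logs put-metric-filter \
  --log-group-name "VPCFlowLogs" \
  --filter-name "RejectedPackets" \
  --filter-pattern '[version, account, eni, source, destination, srcport, destport, protocol, packets, bytes, windowstart, windowend, action="REJECT", flowlogstatus]' \
  --metric-transformations 'metricName=RejectedPackets,metricNamespace=VPCFlowLogs,metricValue=$packets'

# 2. CloudWatch Alarm for DDoS-like traffic
aws cloudwatch put-metric-alarm \
  --alarm-name "High-Reject-Rate" \
  --metric-name "RejectedPackets" \
  --namespace "VPCFlowLogs" \
  --statistic "Sum" \
  --period 300 \
  --threshold 1000 \
  --comparison-operator "GreaterThanThreshold" \
  --evaluation-periods 1
```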

### **3. Centralized Logging**

Aggregate logs from multiple accounts by streaming each log group to a CloudWatch Logs destination in a central account. The destination must already exist there (`aws logs put-destination`) and its access policy must allow the sending account:

```bash
aws logs put-subscription-filter \
  --log-group-name "VPCFlowLogs" \
  --filter-name "CrossAccountStream" \
  --filter-pattern "" \
  --destination-arn "arn:aws:logs:us-east-1:123456789012:destination:CentralAccount"
```

### **4. Security Hardening**

```sql
-- Detect port scanning
fields @timestamp, srcAddr, dstPort
| filter dstPort >= 0 and dstPort <= 1024
| stats count_distinct(dstPort) as portsScanned by srcAddr
| filter portsScanned > 5
| sort portsScanned desc
```
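
The same heuristic (a source that touches more than five distinct well-known ports) can be applied to exported records. The following is a rough sketch, not a production detector: `find_port_scanners` is a hypothetical helper, it assumes default-format records, and the sample traffic is invented.

```python
from collections import defaultdict


def find_port_scanners(lines, max_port=1024, threshold=5):
    """Return (srcaddr, distinct_port_count) pairs above the threshold."""
    ports_by_source = defaultdict(set)
    for line in lines:
        parts = line.split()
        # Default format, 0-based: srcaddr=3, dstport=6.
        if len(parts) != 14 or parts[6] == "-":
            continue  # skip malformed, NODATA, and SKIPDATA records
        dstport = int(parts[6])
        if 0 <= dstport <= max_port:
            ports_by_source[parts[3]].add(dstport)
    scanners = {src: len(ports) for src, ports in ports_by_source.items()
                if len(ports) > threshold}
    return sorted(scanners.items(), key=lambda item: item[1], reverse=True)


# Hypothetical traffic: one source probing ports 20-27, one normal HTTPS client.
template = "2 123456789012 eni-12345abc {src} 10.0.2.10 50000 {port} 6 1 40 1625097600 1625097605 REJECT OK"
sample_lines = [template.format(src="203.0.113.7", port=p) for p in range(20, 28)]
sample_lines += [template.format(src="10.0.1.5", port=443)] * 10

print(find_port_scanners(sample_lines))
```

Repeated connections to a single port (the second source above) do not count, since only distinct destination ports are tallied.
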

---

## **6. Limitations & Workarounds**

| Limitation | Workaround |
|------------|------------|
| No payload data | Use **Traffic Mirroring** + `tcpdump` |
| ~15 min delay (records are aggregated, then delivered) | Shorten the window with `--max-aggregation-interval 60`; use **CloudWatch Metrics** for near-real-time |
| No MAC addresses | Correlate `interface-id` with `aws ec2 describe-network-interfaces` |

---

## **Final Checklist**

1. [ ] Enable flow logs on all critical VPCs
2. [ ] Set up CloudWatch dashboards for key queries
3. [ ] Configure S3 archiving for compliance
4. [ ] Automate security alerts (e.g., port scans)
5. [ ] Document common troubleshooting queries

**Flow logs are your network’s black box recorder: enable them before you need them!**