From 9d367f4f46c9c8ba3000e1b8d8e33e3b68ec5855 Mon Sep 17 00:00:00 2001
From: medusa
Date: Sun, 20 Jul 2025 21:29:18 -0500
Subject: [PATCH] Update tech_docs/cloud/aws_notes.md

---
 tech_docs/cloud/aws_notes.md | 168 +++++++++++++++++++++++++++++++++++
 1 file changed, 168 insertions(+)

diff --git a/tech_docs/cloud/aws_notes.md b/tech_docs/cloud/aws_notes.md
index 8a9aa34..b93db26 100644
--- a/tech_docs/cloud/aws_notes.md
+++ b/tech_docs/cloud/aws_notes.md
@@ -1,3 +1,171 @@
+Here’s a **mini-lab** to practice the killer skills from our discussion, using only AWS Free Tier resources where possible. You’ll diagnose a real-world scenario, optimize costs, and enforce tagging, just like a cloud network SME would.
+
+---
+
+### **Lab: "The Case of the Phantom Bill"**
+**Scenario**: Your company’s AWS bill spiked by \$2,000 last month. The CFO is furious. You’ve been tasked with finding and fixing the issue.
+
+#### **Lab Objectives**
+1. **Find** the cost culprit using AWS tools
+2. **Fix** the issue with zero downtime
+3. **Prevent** recurrence via automation
+
+---
+
+### **Step 1: Set Up the Crime Scene**
+**Deploy the problem environment (5 minutes)**:
+```bash
+# Allocate an Elastic IP and pick a subnet for a rogue NAT Gateway (billable items)
+ALLOC_ID=$(aws ec2 allocate-address --domain vpc --query 'AllocationId' --output text)
+SUBNET_ID=$(aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --output text)
+AWS_REGION=$(aws configure get region)
+
+# Deploy a NAT Gateway with no Owner/CostCenter tags (the "phantom bill" culprit)
+aws ec2 create-nat-gateway \
+  --subnet-id "$SUBNET_ID" --allocation-id "$ALLOC_ID" \
+  --region "$AWS_REGION" \
+  --tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=UNUSED_NAT}]'
+
+# Simulate a poorly tagged dev instance (common brownfield mess; the AMI ID is a placeholder, substitute one valid in your region)
+aws ec2 run-instances \
+  --image-id ami-0abcdef1234567890 \
+  --instance-type t2.micro \
+  --subnet-id "$SUBNET_ID" \
+  --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=dev}]'
+```
+
+---
+
+### **Step 2: Investigate Like a SME**
+#### **Skill 1: Cost Forensics with AWS CLI**
+```bash
+# Find the top 5 cost drivers for the month (replace dates; End is exclusive)
+aws ce get-cost-and-usage \
+  --time-period Start=2024-01-01,End=2024-02-01 \
+  --granularity MONTHLY \
+  --metrics "UnblendedCost" \
+  --group-by Type=DIMENSION,Key=SERVICE \
+  --query 'sort_by(ResultsByTime[0].Groups, &to_number(Metrics.UnblendedCost.Amount))[-5:].{Service: Keys[0], Cost: Metrics.UnblendedCost.Amount}' \
+  --output table
+```
+**Expected Finding**: NAT Gateway charges are abnormally high. Cost Explorer typically reports them under `EC2 - Other` (the exact label can vary), so check there if no VPC line item stands out.
+
+#### **Skill 2: Packet-Level Verification**
+Check whether the NAT Gateway is actually used:
+```bash
+# Get the NAT Gateway's public IP
+NAT_IP=$(aws ec2 describe-nat-gateways --query 'NatGateways[0].NatGatewayAddresses[0].PublicIp' --output text)
+
+# Start traffic capture (run on an EC2 instance in the private subnet)
+sudo tcpdump -i eth0 host "$NAT_IP" -nn -c 10 -w nat_traffic.pcap
+```
+**Analysis**: No packets suggests the NAT is unused, but traffic routed through a NAT Gateway is addressed to its final destination rather than to the NAT's IP, so confirm with the gateway's CloudWatch `BytesOutToDestination` metric before deleting anything.
+
+---
+
+### **Step 3: Fix & Automate**
+#### **Skill 3: Zero-Downtime Remediation**
+```bash
+# Step 1: Tag the NAT for deletion instead of deleting it outright (uses GNU date syntax)
+aws ec2 create-tags \
+  --resources "$(aws ec2 describe-nat-gateways --query 'NatGateways[0].NatGatewayId' --output text)" \
+  --tags Key=ExpirationDate,Value="$(date -d "+7 days" +%Y-%m-%d)"
+
+# Step 2: Write the Lambda auto-cleanup (matches only gateways expiring today, so it must run daily)
+cat > lambda_function.py <<'EOF'
+import boto3, datetime
+def lambda_handler(event, context):
+    ec2 = boto3.client('ec2')
+    expired = ec2.describe_nat_gateways(Filters=[{
+        'Name': 'tag:ExpirationDate',
+        'Values': [datetime.datetime.now().strftime('%Y-%m-%d')]
+    }])
+    for nat in expired['NatGateways']:
+        ec2.delete_nat_gateway(NatGatewayId=nat['NatGatewayId'])
+EOF
+
+# Package and deploy the Lambda (Python 3.9); assumes the role lambda-execution-role already exists
+zip -q function.zip lambda_function.py && aws lambda create-function \
+  --function-name CleanupNATs \
+  --runtime python3.9 \
+  --handler lambda_function.lambda_handler \
+  --role "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/lambda-execution-role" \
+  --zip-file fileb://function.zip
+```
+
+#### **Skill 4: Tag Enforcement**
+```bash
+# Create an SCP to block untagged resources (fires only when both tags are missing; attach it with `aws organizations attach-policy`)
+aws organizations create-policy \
+  --name "RequireTags" --type SERVICE_CONTROL_POLICY \
+  --description "No tags, no resources" \
+  --content '{
+    "Version": "2012-10-17",
+    "Statement": [{
+      "Effect": "Deny",
+      "Action": ["ec2:RunInstances", "ec2:CreateNatGateway"],
+      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:natgateway/*"],
+      "Condition": {
+        "Null": {
+          "aws:RequestTag/Owner": "true",
+          "aws:RequestTag/CostCenter": "true"
+        }
+      }
+    }]
+  }'
+```
+
+---
+
+### **Step 4: Prove Your Value**
+**Generate a Cost Savings Report**:
+```bash
+# Calculate savings (NAT Gateway: $0.045/hr * 24 * 30 = $32.40/month; ($32.40 + $100) * 12 = $1,588.80/year)
+echo "## Monthly Savings Report" > report.md
+echo "- **Deleted Unused NAT Gateway**: \$32.40/month" >> report.md
+echo "- **Prevented Future Waste**: \$100+/month (estimated)" >> report.md
+echo "**Total Annualized Savings**: \$1,588.80" >> report.md
+
+# Share with leadership (the sender address must be verified in SES)
+aws ses send-email \
+  --from "you@company.com" \
+  --to "boss@company.com" \
+  --subject "Cost Optimization Results" \
+  --text file://report.md
+```
+
+---
+
+### **Lab Extensions (Bonus Points)**
+1. **Find Cross-AZ Traffic** (the usage type is typically `DataTransfer-Regional-Bytes`, region-prefixed outside `us-east-1`):
+   ```bash
+   aws ce get-cost-and-usage --time-period Start=2024-01-01,End=2024-02-01 --granularity MONTHLY --metrics UnblendedCost \
+     --filter '{"Dimensions": {"Key": "USAGE_TYPE", "Values": ["DataTransfer-Regional-Bytes"]}}'
+   ```
+2. **Set Up Budget Alarms** (adjust the `Service` filter to the service name shown in your Cost Explorer output):
+   ```bash
+   aws budgets create-budget --account-id "$(aws sts get-caller-identity --query Account --output text)" \
+     --budget '{"BudgetName": "NAT-Gateway-Alert", "BudgetLimit": {"Amount": "50", "Unit": "USD"}, "TimeUnit": "MONTHLY", "BudgetType": "COST", "CostFilters": {"Service": ["AmazonVPC"]}}'
+   ```
+
+---
+
+### **Why This Lab Matters**
+- **Real AWS Resources**: Uses actual billable services. The NAT Gateway is not covered by the Free Tier, so the cost stays small only if you clean up promptly.
+- **SME Skills Practiced**:
+  - Cost analysis via CLI
+  - Packet-level verification
+  - Zero-downtime fixes
+  - Tag governance
+- **Career Impact**: These exact skills have gotten engineers promoted.
+
+**Time to Complete**: ~30 minutes.
+**Cost**: < \$0.50 (delete the NAT Gateway and release its Elastic IP immediately after the lab).
+
+Want a **more advanced version** with Direct Connect or hybrid cloud scenarios? Let me know!
+
+---
+
 Here’s the **killer skill set** that combines "boring" fundamentals with cloud-native expertise to make you the **unquestioned SME**—the one who fixes what others can’t, optimizes what others overlook, and becomes indispensable:
 
 ---
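The cleanup Lambda in Step 3 of the lab selects only gateways whose `ExpirationDate` tag equals today's date, so a single skipped run would leave an expired gateway in place. As a rough sketch of an alternative, the selection could compare dates in Python instead. The function name `expired_nat_gateway_ids` and the sample gateway IDs below are hypothetical; the dictionaries only mimic the shape of the `NatGateways` list that boto3's `describe_nat_gateways` returns, and nothing here calls AWS.

```python
from datetime import date, datetime


def expired_nat_gateway_ids(nat_gateways, today):
    """Return IDs of NAT gateways whose ExpirationDate tag is on or before `today`."""
    expired = []
    for nat in nat_gateways:
        # Skip gateways that are already deleted or being deleted.
        if nat.get("State") in ("deleting", "deleted"):
            continue
        tags = {t["Key"]: t["Value"] for t in nat.get("Tags", [])}
        raw = tags.get("ExpirationDate")
        if raw is None:
            continue  # gateways without the tag are never selected
        try:
            expires = datetime.strptime(raw, "%Y-%m-%d").date()
        except ValueError:
            continue  # ignore malformed dates rather than guess
        if expires <= today:
            expired.append(nat["NatGatewayId"])
    return expired


# Hypothetical sample data shaped like the "NatGateways" response list.
sample = [
    {"NatGatewayId": "nat-aaa", "State": "available",
     "Tags": [{"Key": "ExpirationDate", "Value": "2025-07-20"}]},
    {"NatGatewayId": "nat-bbb", "State": "available",
     "Tags": [{"Key": "ExpirationDate", "Value": "2025-07-27"}]},
    {"NatGatewayId": "nat-ccc", "State": "available",
     "Tags": [{"Key": "ExpirationDate", "Value": "2025-08-01"}]},
    {"NatGatewayId": "nat-ddd", "State": "available", "Tags": []},
    {"NatGatewayId": "nat-eee", "State": "deleted",
     "Tags": [{"Key": "ExpirationDate", "Value": "2025-07-01"}]},
    {"NatGatewayId": "nat-fff", "State": "available",
     "Tags": [{"Key": "ExpirationDate", "Value": "next week"}]},
]

print(expired_nat_gateway_ids(sample, date(2025, 7, 27)))
# prints ['nat-aaa', 'nat-bbb']
```

Inside a real handler, the returned IDs would be passed one at a time to `delete_nat_gateway`. Keeping the date comparison in a pure function makes it testable without AWS credentials.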