Update tech_docs/cloud/aws_notes.md
@@ -1,3 +1,171 @@
Here’s a **mini-lab** to practice the killer skills from our discussion, using only AWS Free Tier resources where possible. You’ll diagnose a real-world scenario, optimize costs, and enforce tagging, just like a cloud network SME would.

---

### **Lab: "The Case of the Phantom Bill"**
**Scenario**: Your company’s AWS bill spiked by \$2,000 last month. The CFO is furious. You’ve been tasked with finding and fixing the issue.

#### **Lab Objectives**
1. **Find** the cost culprit using AWS tools
2. **Fix** the issue with zero downtime
3. **Prevent** recurrence via automation

---

### **Step 1: Set Up the Crime Scene**
**Deploy the problem environment (5 minutes)**:
```bash
# Look up a VPC, one of its subnets, and the configured region
VPC_ID=$(aws ec2 describe-vpcs --query 'Vpcs[0].VpcId' --output text)
SUBNET_ID=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$VPC_ID" \
  --query 'Subnets[0].SubnetId' --output text)
AWS_REGION=$(aws configure get region)

# A public NAT Gateway needs an Elastic IP allocation
ALLOC_ID=$(aws ec2 allocate-address --domain vpc --region "$AWS_REGION" \
  --query 'AllocationId' --output text)

# Deploy the rogue NAT Gateway (the "phantom bill" culprit): no Owner or CostCenter tag
aws ec2 create-nat-gateway \
  --subnet-id "$SUBNET_ID" \
  --allocation-id "$ALLOC_ID" \
  --region "$AWS_REGION" \
  --tag-specifications 'ResourceType=natgateway,Tags=[{Key=Name,Value=UNUSED_NAT}]'

# Simulate under-tagged dev resources (common brownfield mess)
# Replace the placeholder AMI ID with one that exists in your region
aws ec2 run-instances \
  --image-id ami-0abcdef1234567890 \
  --instance-type t2.micro \
  --subnet-id "$SUBNET_ID" \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Environment,Value=dev}]'
```

---

### **Step 2: Investigate Like a SME**
#### **Skill 1: Cost Forensics with AWS CLI**
```bash
# Find the top 5 cost drivers for one month (replace dates; End is exclusive)
aws ce get-cost-and-usage \
  --time-period Start=2024-01-01,End=2024-02-01 \
  --granularity MONTHLY \
  --metrics "UnblendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'ResultsByTime[0].Groups | sort_by(@, &to_number(Metrics.UnblendedCost.Amount))[-5:].{Service: Keys[0], Cost: Metrics.UnblendedCost.Amount}' \
  --output table
```
**Expected Finding**: Networking costs are abnormally high. When grouped by `SERVICE`, Cost Explorer usually reports NAT Gateway hours and data processing under `EC2 - Other` rather than `AmazonVPC`, so regroup by `USAGE_TYPE` and look for `NatGateway` entries to confirm.

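
The same ranking can be done offline on a saved response, which helps when the JMESPath query gets hard to read. This is a minimal sketch, assuming the JSON shape that `get-cost-and-usage` returns when grouped by `SERVICE`; the `top_services` helper and the sample figures are hypothetical, not real billing data.

```python
from decimal import Decimal


def top_services(response, n=5):
    """Return the n most expensive (service, cost) pairs, highest first."""
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Cost Explorer returns amounts as strings, so convert before comparing
            amount = Decimal(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, Decimal("0")) + amount
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical sample shaped like a get-cost-and-usage response
sample = {
    "ResultsByTime": [{
        "Groups": [
            {"Keys": ["Amazon Simple Storage Service"],
             "Metrics": {"UnblendedCost": {"Amount": "12.50", "Unit": "USD"}}},
            {"Keys": ["EC2 - Other"],
             "Metrics": {"UnblendedCost": {"Amount": "1840.12", "Unit": "USD"}}},
            {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
             "Metrics": {"UnblendedCost": {"Amount": "310.00", "Unit": "USD"}}},
        ]
    }]
}

for service, cost in top_services(sample, n=2):
    print(f"{service}: ${cost}")
```

Summing per service across periods means the helper also works if you widen the time range to several months.
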
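#### **Skill 2: Packet-Level Verification**
Check whether the NAT Gateway is actually used:
```bash
# Get the NAT Gateway's public IP
NAT_IP=$(aws ec2 describe-nat-gateways \
  --filter "Name=tag:Name,Values=UNUSED_NAT" \
  --query 'NatGateways[0].NatGatewayAddresses[0].PublicIp' --output text)

# Capture for up to 60 seconds (run on an EC2 instance in a private subnet;
# the interface may be ens5 rather than eth0 on newer instance types)
sudo timeout 60 tcpdump -i eth0 host "$NAT_IP" -nn -c 10 -w nat_traffic.pcap
```
**Analysis**: No packets? The NAT is probably unused. Instances routed through a NAT Gateway address their packets to the remote destination, not to the gateway's public IP, so treat an empty capture as a hint and confirm it with the route tables and the gateway's CloudWatch `BytesOutToDestination` metric.
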
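
A capture on one instance only covers that instance. A gateway-wide signal is the `BytesOutToDestination` metric in the `AWS/NATGateway` CloudWatch namespace. The sketch below assumes `datapoints` is the `Datapoints` list from a `get_metric_statistics` call made with `Statistics=['Sum']`; the `nat_is_idle` helper and its threshold are hypothetical.

```python
def nat_is_idle(datapoints, threshold_bytes=0):
    """Return True when the summed outbound bytes do not exceed the threshold.

    An empty list (no datapoints reported for the period) counts as idle.
    """
    total = sum(point.get("Sum", 0) for point in datapoints)
    return total <= threshold_bytes


# Hypothetical datapoints for a quiet and a busy gateway
quiet = [{"Sum": 0.0, "Unit": "Bytes"}, {"Sum": 0.0, "Unit": "Bytes"}]
busy = [{"Sum": 0.0, "Unit": "Bytes"}, {"Sum": 52428800.0, "Unit": "Bytes"}]

print("quiet gateway idle:", nat_is_idle(quiet))
print("busy gateway idle:", nat_is_idle(busy))
```

Checking at least a week of datapoints avoids flagging a gateway that only carries traffic during a weekly batch job.
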
---

### **Step 3: Fix & Automate**
#### **Skill 3: Zero-Downtime Remediation**
```bash
# Step 1: Tag the NAT for deletion in 7 days (avoid killing active resources)
NAT_ID=$(aws ec2 describe-nat-gateways \
  --filter "Name=tag:Name,Values=UNUSED_NAT" "Name=state,Values=available" \
  --query 'NatGateways[0].NatGatewayId' --output text)
aws ec2 create-tags \
  --resources "$NAT_ID" \
  --tags Key=ExpirationDate,Value=$(date -d "+7 days" +%Y-%m-%d)  # GNU date syntax

# Step 2: Deploy Lambda auto-cleanup (prevents future issues)
cat > lambda_function.py <<'EOF'
import datetime

import boto3


def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    today = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%d')
    # describe_nat_gateways takes "Filter" (singular), unlike most EC2 describe calls
    tagged = ec2.describe_nat_gateways(Filter=[
        {'Name': 'tag-key', 'Values': ['ExpirationDate']},
        {'Name': 'state', 'Values': ['available']},
    ])
    for nat in tagged['NatGateways']:
        tags = {t['Key']: t['Value'] for t in nat.get('Tags', [])}
        # ISO dates compare correctly as strings; delete on or after the expiry date
        if tags['ExpirationDate'] <= today:
            ec2.delete_nat_gateway(NatGatewayId=nat['NatGatewayId'])
EOF

# Package and deploy the Lambda. The execution role must already exist and allow
# ec2:DescribeNatGateways and ec2:DeleteNatGateway; schedule the function daily.
zip function.zip lambda_function.py
aws lambda create-function \
  --function-name CleanupNATs \
  --runtime python3.12 \
  --handler lambda_function.lambda_handler \
  --role arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/lambda-execution-role \
  --zip-file fileb://function.zip
```

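
Matching the tag against today's date only works if the cleanup runs every single day; a more forgiving rule is to delete anything whose `ExpirationDate` is today or earlier. The selection logic can be tested without touching AWS. This is a minimal sketch, assuming gateway dicts shaped like the `NatGateways` entries from `describe_nat_gateways`; the `expired_nat_ids` helper and the sample IDs are hypothetical.

```python
from datetime import date


def expired_nat_ids(nat_gateways, today):
    """Return IDs of gateways whose ExpirationDate tag is on or before today."""
    expired = []
    for nat in nat_gateways:
        tags = {t["Key"]: t["Value"] for t in nat.get("Tags", [])}
        raw = tags.get("ExpirationDate")
        if raw is None:
            continue  # untagged gateways are never deleted
        try:
            expires = date.fromisoformat(raw)
        except ValueError:
            continue  # skip malformed dates instead of guessing
        if expires <= today:
            expired.append(nat["NatGatewayId"])
    return expired


# Hypothetical gateways covering the interesting cases
gateways = [
    {"NatGatewayId": "nat-0aaa", "Tags": [{"Key": "ExpirationDate", "Value": "2024-01-25"}]},
    {"NatGatewayId": "nat-0bbb", "Tags": [{"Key": "ExpirationDate", "Value": "2024-02-08"}]},
    {"NatGatewayId": "nat-0ccc"},
    {"NatGatewayId": "nat-0ddd", "Tags": [{"Key": "ExpirationDate", "Value": "soon"}]},
    {"NatGatewayId": "nat-0eee", "Tags": [{"Key": "ExpirationDate", "Value": "2024-02-01"}]},
]

print(expired_nat_ids(gateways, today=date(2024, 2, 1)))
```

Passing `today` in as an argument keeps the function deterministic, so the same check can run in a unit test and inside the Lambda handler.
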
#### **Skill 4: Tag Enforcement**
```bash
# Create an SCP that blocks instances and NAT Gateways missing either required tag.
# Keys inside one "Null" block are ANDed, so each tag gets its own statement.
# Attach the policy afterwards with `aws organizations attach-policy`.
aws organizations create-policy \
  --name "RequireTags" \
  --description "No tags, no resources" \
  --type SERVICE_CONTROL_POLICY \
  --content '{
    "Version": "2012-10-17",
    "Statement": [{
      "Sid": "RequireOwnerTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateNatGateway"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:natgateway/*"],
      "Condition": {"Null": {"aws:RequestTag/Owner": "true"}}
    }, {
      "Sid": "RequireCostCenterTag",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateNatGateway"],
      "Resource": ["arn:aws:ec2:*:*:instance/*", "arn:aws:ec2:*:*:natgateway/*"],
      "Condition": {"Null": {"aws:RequestTag/CostCenter": "true"}}
    }]
  }'
```

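
Within a single condition operator, multiple keys are combined with AND, so how the `Null` checks are grouped decides whether a request missing only one tag is denied. The sketch below compares both groupings in plain Python; it is an illustration of the logic, not an IAM policy evaluator, and both function names are hypothetical.

```python
REQUIRED_TAGS = ("Owner", "CostCenter")


def denied_when_all_missing(request_tags, required=REQUIRED_TAGS):
    """One Deny statement with every key in a single Null block (keys are ANDed)."""
    return all(key not in request_tags for key in required)


def denied_when_any_missing(request_tags, required=REQUIRED_TAGS):
    """One Deny statement per key (any matching statement denies the request)."""
    return any(key not in request_tags for key in required)


# Hypothetical tag sets sent with a create request
requests = {
    "no tags": {},
    "owner only": {"Owner": "alice"},
    "fully tagged": {"Owner": "alice", "CostCenter": "1234"},
}

for label, tags in requests.items():
    print(f"{label}: single block denies={denied_when_all_missing(tags)}, "
          f"per-key statements deny={denied_when_any_missing(tags)}")
```

The "owner only" case is the one to watch: a single `Null` block lets it through, while per-key statements block it.
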
---

### **Step 4: Prove Your Value**
**Generate a Cost Savings Report**:
```bash
# Calculate savings (NAT Gateway: $0.045/hr * 24 * 30 = $32.40/month)
# Hourly rate shown is for us-east-1; data processing is billed separately per GB
echo "## Monthly Savings Report" > report.md
echo "- **Deleted Unused NAT Gateway**: \$32.40/month" >> report.md
echo "- **Prevented Future Waste**: \$100+/month (estimated)" >> report.md
echo "**Total Annualized Savings**: \$1,588.80" >> report.md

# Share with leadership (the sender address must be verified in SES)
aws ses send-email \
  --from "you@company.com" \
  --to "boss@company.com" \
  --subject "Cost Optimization Results" \
  --text file://report.md
```

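
The report figures follow from simple arithmetic: the hourly rate times 24 hours times 30 days, then the monthly items summed and multiplied by 12. Here is a small sketch that reproduces them, assuming the \$0.045 hourly rate and the \$100 estimate quoted in the report; the helper names are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

NAT_HOURLY_RATE = Decimal("0.045")        # rate quoted in the report
PREVENTED_WASTE_MONTHLY = Decimal("100")  # the report's own estimate


def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Cost of a resource that runs around the clock for one 30-day month."""
    return hourly_rate * hours_per_day * days


def annualized(*monthly_items):
    """Sum the monthly line items and scale to twelve months."""
    return sum(monthly_items, Decimal("0")) * 12


nat_monthly = monthly_cost(NAT_HOURLY_RATE)
total = annualized(nat_monthly, PREVENTED_WASTE_MONTHLY)

print(f"NAT Gateway: ${nat_monthly:.2f}/month")    # NAT Gateway: $32.40/month
print(f"Annualized savings: ${total:.2f}")         # Annualized savings: $1588.80
```
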
---

### **Lab Extensions (Bonus Points)**
1. **Find Cross-AZ Traffic** (billed as `DataTransfer-Regional-Bytes`, with a region prefix such as `USE2-` outside us-east-1):
   ```bash
   aws ce get-cost-and-usage \
     --time-period Start=2024-01-01,End=2024-02-01 --granularity MONTHLY --metrics "UnblendedCost" \
     --filter '{"Dimensions": {"Key": "USAGE_TYPE", "Values": ["DataTransfer-Regional-Bytes"]}}'
   ```
2. **Set Up Budget Alarms** (check that the `Service` filter value matches the service name exactly as Cost Explorer displays it):
   ```bash
   aws budgets create-budget \
     --account-id "$(aws sts get-caller-identity --query Account --output text)" \
     --budget '{"BudgetName": "NAT-Gateway-Alert", "BudgetLimit": {"Amount": "50", "Unit": "USD"}, "TimeUnit": "MONTHLY", "BudgetType": "COST", "CostFilters": {"Service": ["AmazonVPC"]}}'
   ```

---

### **Why This Lab Matters**
- **Real AWS Resources**: Uses actual billable services while keeping spend minimal (the NAT Gateway itself is not covered by the Free Tier).
- **SME Skills Practiced**:
  - Cost analysis via CLI
  - Packet-level verification
  - Zero-downtime fixes
  - Tag governance
- **Career Impact**: These exact skills have gotten engineers promoted.

**Time to Complete**: ~30 minutes.
**Cost**: < \$0.50 (delete the NAT Gateway, release its Elastic IP, and terminate the instance immediately after the lab).

Want a **more advanced version** with Direct Connect or hybrid cloud scenarios? Let me know!

---

Here’s the **killer skill set** that combines "boring" fundamentals with cloud-native expertise to make you the **unquestioned SME**, the one who fixes what others can’t, optimizes what others overlook, and becomes indispensable:

---