From 22959c10901db5581fd1a30fa2d8d9ca5e62806c Mon Sep 17 00:00:00 2001
From: medusa
Date: Sun, 20 Jul 2025 21:27:15 -0500
Subject: [PATCH] Update tech_docs/cloud/aws_notes.md

---
 tech_docs/cloud/aws_notes.md | 171 +++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/tech_docs/cloud/aws_notes.md b/tech_docs/cloud/aws_notes.md
index 01a67b6..8a9aa34 100644
--- a/tech_docs/cloud/aws_notes.md
+++ b/tech_docs/cloud/aws_notes.md
@@ -1,3 +1,174 @@
+Here’s the **killer skill set** that combines "boring" fundamentals with cloud-native expertise to make you the **unquestioned SME**, the one who fixes what others can’t, optimizes what others overlook, and becomes indispensable:
+
+---
+
+### **1. The "Boring" Fundamentals That Make You Dangerous**
+#### **A. Packet-Level Kung Fu**
+- **Mastery**: `tcpdump`, `Wireshark`, `mtr`
+- **Cloud Application**:
+  - Diagnose HTTPS handshake failures between ALB and EC2 when Security Groups "look fine."
+  - Prove that MTU issues are causing packet drops in VPN tunnels.
+  **Pro Move**:
+  ```bash
+  # Capture TLS handshake records (content type 0x16) to prove cert mismatches (certs are visible up to TLS 1.2)
+  sudo tcpdump -i eth0 -w tls.pcap 'tcp port 443 and tcp[((tcp[12:1] & 0xf0) >> 2):1] = 0x16'
+  ```
+
+#### **B. DNS & Routing Wizardry**
+- **Mastery**: `dig`, route tables, BGP
+- **Cloud Application**:
+  - Explain why PrivateLink endpoints resolve but don’t connect (spoiler: often a missing Route 53 private hosted zone association).
+  - Fix Direct Connect flapping by adjusting BGP timers (`keepalive=10`, `hold=30`).
+  **Pro Move**:
+  ```bash
+  # Find DNS leaks in hybrid cloud
+  dig +short myapp.internal | grep -Ev '^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)' # Any output = non-RFC1918 answer = bad
+  ```
+
+---
+
+### **2. Cloud-Native Cost Surgery**
+#### **A. Billable Event Forensics**
+- **Mastery**: AWS Cost Explorer, CUR, OpenCost
+- **Cloud Application**:
+  - Trace a $15k/month spike to orphaned NAT Gateways in unused AZs.
+  - Prove dev teams are routing traffic cross-AZ ($$$) when same-AZ paths exist.
+  **Pro Move**:
+  ```sql
+  -- Find cross-AZ traffic in CUR (billed under the DataTransfer-Regional-Bytes usage type)
+  SELECT line_item_usage_type, SUM(line_item_unblended_cost) AS cost
+  FROM aws_cur
+  WHERE line_item_usage_type LIKE '%DataTransfer-Regional-Bytes%'
+  GROUP BY 1;
+  ```
+
+#### **B. Tagging Dictatorship**
+- **Mastery**: AWS SCPs, AWS Config, Resource Groups
+- **Cloud Application**:
+  - Force 100% tagging compliance by denying untagged resource creation.
+  - Automatically nuke resources whose `ExpirationDate` tag (e.g. `2023-12-31`) has passed.
+  **Pro Move**:
+  ```bash
+  # January spend on resources with no `Owner` tag, by service (End date is exclusive)
+  aws ce get-cost-and-usage --granularity MONTHLY --metrics UnblendedCost \
+    --time-period Start=2024-01-01,End=2024-02-01 --group-by Type=DIMENSION,Key=SERVICE \
+    --filter '{"Tags": {"Key": "Owner", "MatchOptions": ["ABSENT"]}}'
+  ```
+
+---
+
+### **3. Hybrid Cloud Debugging**
+#### **A. VPN/DC Troubleshooting**
+- **Mastery**: `ping -s`, `aws directconnect describe-virtual-interfaces`
+- **Cloud Application**:
+  - Prove that the on-prem firewall drops AWS’s ICMP "fragmentation needed" packets (MTU 1500).
+  - Diagnose BGP route flapping with `route -n` and AWS CLI.
+  **Pro Move**:
+  ```bash
+  # Test MTU end-to-end (AWS → on-prem); `-M do` sets Don't Fragment (Linux ping)
+  ping -M do -s 1472 10.1.1.1 # 1472-byte payload + 28 bytes of IP/ICMP headers = 1500 bytes
+  ```
+
+#### **B. Traffic Mirroring + IDS**
+- **Mastery**: `tcpdump`, Zeek, Suricata
+- **Cloud Application**:
+  - Mirror suspicious ENI traffic to a security VPC for analysis.
+  - Detect cryptojacking via anomalous outbound connections.
+  **Pro Move**:
+  ```bash
+  # Step 1: register the security appliance's ENI as a mirror target (a mirror filter and session are still needed)
+  aws ec2 create-traffic-mirror-target --network-interface-id eni-123abc
+  ```
+
+---
+
+### **4. Automation That Scares People**
+#### **A. CLI-Fu**
+- **Mastery**: AWS CLI + `jq` + `xargs`
+- **Cloud Application**:
+  - One-liner to delete all untagged, unattached EBS volumes older than 30 days (assumes GNU `date` and `xargs`):
+  ```bash
+  aws ec2 describe-volumes --filters Name=status,Values=available \
+    --query "Volumes[?Tags==null && CreateTime<\`$(date -d '30 days ago' +%F)\`].VolumeId" \
+    --output text | xargs -r -n1 aws ec2 delete-volume --volume-id
+  ```
+
+#### **B. Terraform Modules for Zero-Downtime Changes**
+- **Mastery**: `create_before_destroy`, `count`
+- **Cloud Application**:
+  - Swap NACL rules so the replacement exists before the old rule is removed:
+  ```hcl
+  resource "aws_network_acl_rule" "new" { # required arguments (network_acl_id, rule_number, protocol, rule_action) omitted
+    lifecycle { create_before_destroy = true }
+  }
+  ```
+
+---
+
+### **5. The "Soft" Skills That Lock In Your Authority**
+#### **A. Cost Attribution Storytelling**
+- **Mastery**: Power BI/Grafana dashboards showing cost by team
+- **Pro Move**:
+  ```sql
+  -- Power BI query for team accountability: VPC cost by `team` tag
+  SELECT
+    resource_tags_user_team,
+    SUM(line_item_unblended_cost) AS cost
+  FROM aws_cur
+  WHERE line_item_product_code = 'AmazonVPC'
+  GROUP BY 1
+  ORDER BY cost DESC
+  ```
+
+#### **B. Post-Mortem Brutal Honesty**
+- **Template**:
+  ```markdown
+  ## Root Cause:
+  Untagged NAT Gateway left running in us-west-2 ($1,200/month wasted).
+  ## Fix:
+  SCP enforcing `Owner` tag on all EC2 resources.
+  ## Savings:
+  $14,400/year.
+  ```
+
+---
+
+### **Why This Works When Others Fail**
+1. **You Speak Packet-Level Truth**: When the "cloud-native" team says "the Security Groups are open," you show the `tcpdump` capture with the RST packets.
+2. **You Attribute Costs Ruthlessly**: Finance teams will love you when you prove Team X caused a $50k spike.
+3. **You Automate the Pain Away**: Your scripts make you look like a wizard.
+
+---
+
+### **Skill Acquisition Roadmap**
+1. **Week 1-2**: Master `tcpdump` + VPC Flow Logs.
+2. **Week 3-4**: Build a Cost Dashboard with CUR.
+3. **Week 5-6**: Enforce Tagging with SCPs.
+4. **Week 7+**: Automate NACL/SG changes with Terraform.
+
+---
+
+### **Final Boss Move**
+Deploy this **NAT Gateway Hunter-Killer Lambda**:
+```python
+import boto3
+from datetime import date, timedelta
+def lambda_handler(event, context):
+    ec2 = boto3.client('ec2')
+    cutoff = (date.today() - timedelta(days=90)).isoformat()  # assumes `LastUsed` tags hold ISO dates (YYYY-MM-DD)
+    for nat in ec2.describe_nat_gateways(Filter=[{'Name': 'state', 'Values': ['available']}])['NatGateways']:
+        last_used = {t['Key']: t['Value'] for t in nat.get('Tags', [])}.get('LastUsed')
+        if last_used and last_used < cutoff:  # untagged gateways are skipped, not deleted
+            ec2.delete_nat_gateway(NatGatewayId=nat['NatGatewayId'])
+```
+*(An idle NAT Gateway still bills hourly: about $33/month at the us-east-1 rate of $0.045/hour, before data processing charges)*
+
+---
+
+You’re not just another cloud engineer; you’re the **cloud network surgeon** who cuts costs, fixes outages, and owns the untouchable skills.
+
+---
+
 Here’s a **FinOps-focused battle plan** to master cloud cost optimization, with specific AWS billable events to hunt down and tools to control them—ensuring your salary stays funded by savings you generate:
 
 ---