diff --git a/tech_docs/linux/file_parsing.md b/tech_docs/linux/file_parsing.md
new file mode 100644
index 0000000..aec6a70
--- /dev/null
+++ b/tech_docs/linux/file_parsing.md
@@ -0,0 +1,77 @@
+Here's a **tiered ranking** of the most useful `grep`, `awk`, `sed`, and `rsync` commands, ordered by how often they come up and how much work each one does for its complexity in real-world tasks:
+
+---
+
+### **S-Tier (Essential Daily Use)**
+| Tool | Top 3 Must-Know Commands |
+|------|--------------------------|
+| **grep** | `grep -i "pattern" file`<br>`grep -r "text" /dir/`<br>`grep -v "exclude" file` |
+| **awk** | `awk '{print $1}'`<br>`awk -F: '/pattern/{print $3}'`<br>`awk 'NR>1 && $3>100'` |
+| **sed** | `sed 's/old/new/g'`<br>`sed '/pattern/d'`<br>`sed -i.bak '...' file` |
+| **rsync** | `rsync -avz src/ dest/`<br>`rsync -avzP --delete src/ remote:dest/`<br>`rsync -av --exclude='*.tmp' src/ dest/` |
+
+---
+
+### **A-Tier (Power User Essentials)**
+| Tool | Next-Level Commands |
+|------|---------------------|
+| **grep** | `grep -A3 -B2 "context"`<br>`grep -E "(foo\|bar)"`<br>`grep -oP 'regex(?=lookahead)'` |
+| **awk** | `awk 'BEGIN{FS=OFS=":"} {...}'`<br>`awk '{a[$1]++} END{for(k in a) print k,a[k]}'`<br>`awk '!seen[$0]++'` (dedup) |
+| **sed** | `sed -n '10,20p'` (range)<br>`sed '/start/,/end/d'`<br>`sed -e '1i\header' -e '$a\footer'` |
+| **rsync** | `rsync -av --link-dest=/prev/ src/ dest/` (hardlink backups)<br>`rsync -av --bwlimit=5000` (throttle to about 5000 KiB/s)<br>`rsync -av --files-from=list.txt / remote:/` |
+
+---
+
+### **B-Tier (Specialized but Powerful)**
+| Tool | Niche Superpowers |
+|-------|-------------------|
+| **grep** | `grep --color=always \| less -R`<br>`grep -f patterns.txt`<br>`grep -lZ \| xargs -0` (null-safe file lists) |
+| **awk** | `awk -v var=value '...'`<br>`awk '{system("cmd " $1)}'`<br>`awk '@include "lib.awk"'` (GNU awk) |
+| **sed** | `sed ':a;N;$!ba;s/\n/,/g'` (slurp+replace)<br>`sed '1~2d'` (delete every other line, starting with line 1; GNU sed)<br>`sed -E 's/(groups)/\U\1/'` (uppercase a capture group; GNU sed) |
+| **rsync** | `rsync --daemon` (server mode)<br>`rsync -av --xattrs --acls` (metadata)<br>`rsync -av --compare-dest=/compare/ src/ dest/` |
+
+---
+
+### **C-Tier (Rare but Life-Saving)**
+| Tool | Expert Tricks |
+|------|--------------|
+| **grep** | `grep -P '(?<=lookbehind)'`<br>`grep --binary-files=text`<br>`grep -m 100` (stop after 100 matching lines per file) |
+| **awk** | `awk 'BEGIN{RS=""; FS="\n"} {...}'` (paragraph mode)<br>`awk -M 'BEGIN{print 2^100}'` (big math; GNU awk built with MPFR)<br>`awk -i inplace` (GNU awk 4.1+) |
+| **sed** | `sed 's/.*/\L&/'` (lowercase; GNU sed)<br>`sed '/pattern/{x;p;x;}'` (hold space)<br>`sed -f script.sed` |
+| **rsync** | `rsync --partial-dir=.rsync-partial`<br>`rsync --sockopts='SO_RCVBUF=65536'`<br>`rsync --remote-option=--temp-dir=/tmp` |
+
+---
+
+### **Tool-Specific Tier Explanations**
+
+**grep**
+- *S-Tier*: Case-insensitive, recursive, and inverted matches cover most everyday searches
+- *A-Tier*: Context and advanced regex
+- *B/C-Tier*: Rare flags for pattern files, null-safe output, and binary/encoding edge cases
+
+**awk**
+- *S-Tier*: Field processing covers most ETL tasks
+- *A-Tier*: Associative arrays for counting and deduplication in log analysis
+- *B/C-Tier*: Rarely needed math/IO extensions, mostly GNU awk only
+
+**sed**
+- *S-Tier*: Basic substitutions dominate usage
+- *A-Tier*: Range operations for config files
+- *B/C-Tier*: Hold space and branching are niche
+
+**rsync**
+- *S-Tier*: Basic sync covers most backups
+- *A-Tier*: Hardlink snapshots, bandwidth limits, and file lists for large datasets
+- *B/C-Tier*: Daemon mode, metadata preservation, and transport tuning for larger deployments
+
+---
+
+### **When to Reach for Each Tool**
+1. **Quick text search?** → `grep`
+2. **Columnar data?** → `awk`
+3. **Find/replace?** → `sed`
+4. **File sync?** → `rsync`
+
+**Pro Tip:** Combine them in a pipeline:
+```bash
+grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt
\ No newline at end of file