Add tech_docs/linux/file_parsing.md

This commit is contained in:
2025-07-01 12:56:59 +00:00
parent 318a69faea
commit 78079c9818

@@ -0,0 +1,77 @@
Here's a **tiered ranking** of the most useful `grep`, `awk`, `sed`, and `rsync` commands and syntax, ordered by how often they come up and how much each one accomplishes for its size in real-world tasks:

---
### **S-Tier (Essential Daily Use)**
| Tool | Top 3 Must-Know Commands |
|------|--------------------------|
| **grep** | `grep -i "pattern" file`<br>`grep -r "text" /dir/`<br>`grep -v "exclude" file` |
| **awk** | `awk '{print $1}'`<br>`awk -F: '/pattern/{print $3}'`<br>`awk 'NR>1 && $3>100'` |
| **sed** | `sed 's/old/new/g'`<br>`sed '/pattern/d'`<br>`sed -i.bak '...' file` |
| **rsync** | `rsync -avz src/ dest/`<br>`rsync -avzP --delete src/ remote:dest/`<br>`rsync -av --exclude='*.tmp' src/ dest/` |
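
As a rough illustration of the S-tier `grep`, `awk`, and `sed` entries above, the sketch below runs them against a small made-up file. The file name `users.txt`, its colon-separated layout, and the variable names are assumptions chosen for the demo; they do not come from the list itself.

```bash
# Hypothetical sample data: name:role:logins, colon-separated, with a header row.
workdir=$(mktemp -d)
cat > "$workdir/users.txt" <<'EOF'
name:role:logins
alice:Admin:250
bob:user:40
carol:admin:120
EOF

# grep -i: case-insensitive match, so both "Admin" and "admin" are found.
admins=$(grep -i "admin" "$workdir/users.txt")

# grep -v: invert the match to drop the header line.
no_header=$(grep -v "^name:" "$workdir/users.txt")

# awk: skip the header (NR>1) and keep rows whose third field exceeds 100.
heavy=$(awk -F: 'NR>1 && $3>100 {print $1}' "$workdir/users.txt")

# sed: replace every "admin" with "root" (case-sensitive, so "Admin" is untouched).
renamed=$(sed 's/admin/root/g' "$workdir/users.txt")

printf 'heavy users:\n%s\n' "$heavy"
rm -rf "$workdir"
```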
---
### **A-Tier (Power User Essentials)**
| Tool | Next-Level Commands |
|------|---------------------|
| **grep** | `grep -A3 -B2 "context"`<br>`grep -E "(foo\|bar)"`<br>`grep -oP 'regex(?=lookahead)'` |
| **awk** | `awk 'BEGIN{FS=OFS=":"} {...}'`<br>`awk '{a[$1]++} END{for(k in a) print k,a[k]}'`<br>`awk '!seen[$0]++'` (dedup) |
| **sed** | `sed -n '10,20p'` (range)<br>`sed '/start/,/end/d'`<br>`sed -e '1i\header' -e '$a\footer'` |
| **rsync** | `rsync -av --link-dest=/prev/ src/ dest/` (hardlink backups)<br>`rsync -av --bwlimit=5000` (throttle; the value is in KiB/s when no suffix is given)<br>`rsync -av --files-from=list.txt / remote:/` |
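
To make the A-tier `awk`, `sed`, and `grep` idioms more concrete, here is a small sketch on an invented access log. The file `access.log` and its contents are assumptions. Because `for (k in a)` visits keys in an unspecified order, the counts are piped through `sort` to get stable output.

```bash
workdir=$(mktemp -d)
# Hypothetical log: client IP in field 1, request path in field 2.
cat > "$workdir/access.log" <<'EOF'
10.0.0.1 /index
10.0.0.2 /login
10.0.0.1 /index
10.0.0.3 /about
10.0.0.1 /login
EOF

# Count occurrences of field 1 with an associative array, then sort the result.
counts=$(awk '{a[$1]++} END{for(k in a) print k,a[k]}' "$workdir/access.log" | sort)

# Keep only the first occurrence of each whole line (order-preserving dedup).
unique=$(awk '!seen[$0]++' "$workdir/access.log")

# Print only lines 2 through 3; -n suppresses sed's default output.
middle=$(sed -n '2,3p' "$workdir/access.log")

# Show one line of context before (-B1) and after (-A1) each match.
context=$(grep -B1 -A1 "about" "$workdir/access.log")

printf 'requests per client:\n%s\n' "$counts"
rm -rf "$workdir"
```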
---
### **B-Tier (Specialized but Powerful)**
| Tool | Niche Superpowers |
|-------|-------------------|
| **grep** | `grep --color=always \| less -R`<br>`grep -f patterns.txt`<br>`grep -lZ "pattern" \| xargs -0` (null-safe file names; `-Z` only affects file-name output, so pair it with `-l`) |
| **awk** | `awk -v var=value '...'`<br>`awk '{system("cmd " $1)}'`<br>`awk '@include "lib.awk"'` |
| **sed** | `sed ':a;N;$!ba;s/\n/,/g'` (slurp+replace)<br>`sed '1~2d'` (GNU: delete every 2nd line starting at line 1, i.e. the odd-numbered lines)<br>`sed -E 's/(groups)/\U\1/'` (GNU: case conv) |
| **rsync** | `rsync --daemon` (server mode)<br>`rsync -av --xattrs --acls` (metadata)<br>`rsync -av --compare-dest=/compare/ src/ dest/` |
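
The null-safe pipeline, `awk -v`, and the `sed` slurp loop from the B-tier table can be sketched as follows. The file names are hypothetical, and the sketch assumes GNU `grep`, `xargs`, and `sed`; it also assumes `-l` is combined with `-Z`, since `-Z` only changes how file names are terminated.

```bash
workdir=$(mktemp -d)
# Hypothetical files; one name contains a space to show why NUL separators matter.
printf 'TODO: fix\n' > "$workdir/my notes.txt"
printf 'done\n'      > "$workdir/clean.txt"
printf 'a\nb\nc\n'   > "$workdir/letters.txt"

# -l lists matching file names, -Z ends each name with a NUL byte,
# and xargs -0 splits on NUL, so "my notes.txt" stays a single argument.
matches=$(grep -rlZ "TODO" "$workdir" | xargs -0 -n1 basename)

# -v passes a shell value into awk as an awk variable.
threshold=2
late=$(awk -v min="$threshold" 'NR >= min' "$workdir/letters.txt")

# GNU sed loop: keep appending the next line (N) until the last one,
# then replace every embedded newline with a comma.
joined=$(sed ':a;N;$!ba;s/\n/,/g' "$workdir/letters.txt")

printf 'matched: %s\njoined: %s\n' "$matches" "$joined"
rm -rf "$workdir"
```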
---
### **C-Tier (Rare but Life-Saving)**
| Tool | Expert Tricks |
|------|--------------|
| **grep** | `grep -P '(?<=lookbehind)'`<br>`grep --binary-files=text`<br>`grep -m 100` (stop after N matches) |
| **awk** | `awk 'BEGIN{RS=""; FS="\n"} {...}'` (paragraph mode)<br>`awk -M 'BEGIN{print 2^100}'` (big math, gawk built with MPFR)<br>`awk -i inplace` (GNU awk 4.1+) |
| **sed** | `sed 's/.*/\L&/'` (lowercase)<br>`sed '/pattern/{x;p;x;}'` (hold space)<br>`sed -f script.sed` |
| **rsync** | `rsync --partial-dir=.rsync-partial`<br>`rsync --sockopts='SO_RCVBUF=65536'`<br>`rsync --remote-option=--temp-dir=/tmp` |
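
Paragraph mode is easier to follow with data in hand. The sketch below uses an invented file of blank-line-separated records (`hosts.txt` and its fields are assumptions) and also shows `grep -m` stopping at the first matching line.

```bash
workdir=$(mktemp -d)
# Hypothetical blank-line-separated records, one "paragraph" per host.
cat > "$workdir/hosts.txt" <<'EOF'
host: web1
ip: 10.0.0.5

host: db1
ip: 10.0.0.9
EOF

# RS="" makes each blank-line-separated block one record;
# FS="\n" makes each line within the block one field.
first_lines=$(awk 'BEGIN{RS=""; FS="\n"} {print $1}' "$workdir/hosts.txt")

# NR in the END block is the number of records (paragraphs) read.
records=$(awk 'BEGIN{RS=""} END{print NR}' "$workdir/hosts.txt")

# -m 1 stops reading after the first matching line.
first_ip=$(grep -m 1 "^ip:" "$workdir/hosts.txt")

printf '%s records; first match: %s\n' "$records" "$first_ip"
rm -rf "$workdir"
```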
---
### **Tool-Specific Tier Explanations**
**grep**
- *S-Tier*: Case-insensitive, recursive, and inverted searches cover roughly 90% of search needs
- *A-Tier*: Context lines and advanced (extended and Perl-style) regex
- *B/C-Tier*: Pattern files, null-safe pipelines, lookbehind, and binary-file handling for edge cases

**awk**
- *S-Tier*: Field processing covers most ETL tasks
- *A-Tier*: Arrays and stats for log analysis
- *B/C-Tier*: Shell interaction plus rarely needed gawk extensions (`@include`, `-M`, `-i inplace`)

**sed**
- *S-Tier*: Basic substitutions dominate usage
- *A-Tier*: Range operations for config files
- *B/C-Tier*: Hold space and branching are niche

**rsync**
- *S-Tier*: Basic sync covers most backups
- *A-Tier*: Hardlink snapshots, bandwidth limits, and file lists for large datasets
- *B/C-Tier*: Daemon mode, metadata preservation, and transfer tuning for specialized setups
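
The `--link-dest` hardlink-backup pattern from the A-tier table is the least obvious of the rsync entries, so here is a minimal local sketch of it. The directory names `src`, `snap1`, and `snap2` are hypothetical, and the sketch assumes `rsync` is installed and that the temporary directory sits on a filesystem that supports hardlinks.

```bash
workdir=$(mktemp -d)
mkdir -p "$workdir/src"
printf 'unchanged\n' > "$workdir/src/a.txt"
printf 'v1\n'        > "$workdir/src/b.txt"

# First snapshot: a plain archive copy.
rsync -a "$workdir/src/" "$workdir/snap1/"

# Change one file (different size, so rsync's quick check sees it),
# then take a second snapshot that hardlinks unchanged files to snap1.
printf 'v2 with more text\n' > "$workdir/src/b.txt"
rsync -a --link-dest="$workdir/snap1/" "$workdir/src/" "$workdir/snap2/"

# -ef is true when both paths refer to the same inode, i.e. they are hardlinks.
if [ "$workdir/snap1/a.txt" -ef "$workdir/snap2/a.txt" ]; then
  shared=yes
else
  shared=no
fi

# Each snapshot keeps its own version of the file that changed.
old=$(cat "$workdir/snap1/b.txt")
new=$(cat "$workdir/snap2/b.txt")

printf 'a.txt shared between snapshots: %s\n' "$shared"
rm -rf "$workdir"
```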
---
### **When to Reach for Each Tool**
1. **Quick text search?** `grep`
2. **Columnar data?** `awk`
3. **Find/replace?** `sed`
4. **File sync?** `rsync`

**Pro Tip:** Combine them:
```bash
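# -r recurses into logs/; -h drops the file-name prefix so awk's $3 is the log line's third field
grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt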