Here's a **tiered ranking** of the most useful commands/syntax for each tool, based on frequency of use and power-to-weight ratio for real-world tasks:

---

### **S-Tier (Essential Daily Use)**

| Tool | Top 3 Must-Know Commands |
|------|--------------------------|
| **grep** | `grep -i "pattern" file`<br>`grep -r "text" /dir/`<br>`grep -v "exclude" file` |
| **awk** | `awk '{print $1}'`<br>`awk -F: '/pattern/{print $3}'`<br>`awk 'NR>1 && $3>100'` |
| **sed** | `sed 's/old/new/g'`<br>`sed '/pattern/d'`<br>`sed -i.bak '...' file` |
| **rsync** | `rsync -avz src/ dest/`<br>`rsync -avzP --delete src/ remote:dest/`<br>`rsync -av --exclude='*.tmp' src/ dest/` |
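
To see the S-tier text commands side by side, here is a small sketch that runs them against a throwaway file. It assumes GNU grep/sed and any POSIX awk; the file name `sales.txt` and its contents are made up for illustration, and rsync is left out because it needs real source and destination trees.

```bash
#!/usr/bin/env bash
# Sketch: S-tier grep/awk/sed on a hypothetical whitespace-separated file.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample data: a header row, then region / item / qty
cat > "$workdir/sales.txt" <<'EOF'
region item qty
north Widget 150
south widget 80
east Gadget 120
EOF

# grep -i: case-insensitive, so both "Widget" and "widget" match
widget_lines=$(grep -i "widget" "$workdir/sales.txt" | wc -l)

# grep -v: invert the match to drop the header line
data_rows=$(grep -v "^region" "$workdir/sales.txt" | wc -l)

# awk 'NR>1 && $3>100': skip line 1, keep rows whose 3rd field exceeds 100
big_items=$(awk 'NR>1 && $3>100 {print $2}' "$workdir/sales.txt" | paste -sd' ' -)

# sed -i.bak: edit in place, keeping the untouched original as sales.txt.bak
sed -i.bak 's/Gadget/Gizmo/g' "$workdir/sales.txt"

echo "widget lines: $widget_lines"   # 2
echo "data rows:    $data_rows"      # 3
echo "qty > 100:    $big_items"      # Widget Gadget
```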

---

### **A-Tier (Power User Essentials)**

| Tool | Next-Level Commands |
|------|---------------------|
| **grep** | `grep -A3 -B2 "context"`<br>`grep -E "(foo\|bar)"`<br>`grep -oP 'regex(?=lookahead)'` |
| **awk** | `awk 'BEGIN{FS=OFS=":"} {...}'`<br>`awk '{a[$1]++} END{for(k in a) print k,a[k]}'`<br>`awk '!seen[$0]++'` (dedup) |
| **sed** | `sed -n '10,20p'` (range)<br>`sed '/start/,/end/d'`<br>`sed -e '1i\header' -e '$a\footer'` |
| **rsync** | `rsync -av --link-dest=/prev/ src/ dest/` (hardlink backups)<br>`rsync -av --bwlimit=5000` (throttle, KiB/s)<br>`rsync -av --files-from=list.txt / remote:/` |
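
The awk array idioms and the sed/grep range options are easiest to grasp on a tiny log. The sketch below assumes GNU grep and sed plus any POSIX awk; the log contents are invented. Note that `for (k in a)` visits keys in an unspecified order, so the counts are piped through `sort`.

```bash
#!/usr/bin/env bash
# Sketch: A-tier awk arrays, sed ranges and grep context on a hypothetical log.
set -euo pipefail

log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' "INFO boot" "WARN disk" "INFO boot" "ERROR net" "INFO ready" > "$log"

# a[$1]++ tallies the first field; for-in order is unspecified, hence sort
counts=$(awk '{a[$1]++} END{for (k in a) print k, a[k]}' "$log" | sort)

# !seen[$0]++ is true only the first time a line appears: order-preserving dedup
unique=$(awk '!seen[$0]++' "$log")

# -n suppresses auto-print; 2,3p prints only lines 2 through 3
middle=$(sed -n '2,3p' "$log")

# -B1/-A1 add one line of context before and after each match
context=$(grep -B1 -A1 "ERROR" "$log")

# -E enables alternation without backslashes; -c counts matching lines
alerts=$(grep -cE "(WARN|ERROR)" "$log")

printf '%s\n' "$counts"
```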

---

### **B-Tier (Specialized but Powerful)**

| Tool | Niche Superpowers |
|-------|-------------------|
| **grep** | `grep --color=always \| less -R`<br>`grep -f patterns.txt`<br>`grep -lZ \| xargs -0` (null-safe file names) |
| **awk** | `awk -v var=value '...'`<br>`awk '{system("cmd " $1)}'`<br>`awk '@include "lib.awk"'` (gawk) |
| **sed** | `sed ':a;N;$!ba;s/\n/,/g'` (slurp + join lines)<br>`sed '1~2d'` (delete odd-numbered lines, GNU)<br>`sed -E 's/(groups)/\U\1/'` (uppercase, GNU) |
| **rsync** | `rsync --daemon` (server mode)<br>`rsync -av --xattrs --acls` (metadata)<br>`rsync -av --compare-dest=/compare/ src/ dest/` |
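
A few of the B-tier one-liners behave differently from what their flags suggest at first glance, so here is a sketch that pins them down. Note that `-Z` only helps `xargs -0` when paired with `-l`, because it terminates file names rather than matched lines. The sketch assumes GNU grep, GNU sed (for `1~2`, `\U` and the `:a;N;$!ba` loop) and coreutils `seq`/`paste`; the file names, including the deliberate space in `my notes.txt`, are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: B-tier grep/awk/sed idioms (GNU grep and GNU sed assumed).
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'todo: fix parser\n' > "$dir/my notes.txt"   # hypothetical name with a space
printf 'all done\n' > "$dir/other.txt"

# -l lists matching files, -Z ends each name with NUL instead of newline,
# and xargs -0 splits on NUL, so "my notes.txt" stays a single argument
matched=$(grep -rlZ "todo" "$dir" | xargs -0 -n1 basename)

# -v passes a shell value into awk as a variable
over=$(printf '5\n50\n500\n' | awk -v min=10 '$1 > min' | paste -sd' ' -)

# :a;N;$!ba loops until every line is in the pattern space, then joins them
joined=$(printf 'a\nb\nc\n' | sed ':a;N;$!ba;s/\n/,/g')

# first~step: 1~2 matches lines 1,3,5,..., so deleting them keeps even lines
evens=$(seq 6 | sed '1~2d' | paste -sd' ' -)

# \U upper-cases the rest of the replacement (GNU extension)
shout=$(echo 'say hello' | sed -E 's/(hello)/\U\1/')

printf '%s\n' "$matched" "$over" "$joined" "$evens" "$shout"
```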

---

### **C-Tier (Rare but Life-Saving)**

| Tool | Expert Tricks |
|------|--------------|
| **grep** | `grep -P '(?<=lookbehind)'`<br>`grep --binary-files=text`<br>`grep -m 100` (stop after N matches) |
| **awk** | `awk 'BEGIN{RS=""; FS="\n"} {...}'` (paragraph mode)<br>`awk -M 'BEGIN{print 2^100}'` (big math, gawk)<br>`awk -i inplace` (GNU awk 4.1+) |
| **sed** | `sed 's/.*/\L&/'` (lowercase, GNU)<br>`sed '/pattern/{x;p;x;}'` (blank line above each match, via hold space)<br>`sed -f script.sed` |
| **rsync** | `rsync --partial-dir=.rsync-partial`<br>`rsync --sockopts='SO_RCVBUF=65536'`<br>`rsync --remote-option='-T /tmp'` |
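
Paragraph mode and the hold-space trick are the least obvious entries above, so this sketch shows what they, plus `\L` and `grep -m`, actually print. It assumes GNU sed and grep and any POSIX awk; the input text is invented, and the gawk-only `-M` and `-i inplace` flags are deliberately left out so it runs on other awks too.

```bash
#!/usr/bin/env bash
# Sketch: C-tier awk/sed/grep tricks on made-up input (GNU sed/grep assumed).
set -euo pipefail

# RS="" splits records on blank lines; FS="\n" makes each line a field
records=$(printf 'alice\nadmin\n\nbob\nuser\n' |
  awk 'BEGIN{RS=""; FS="\n"} {print $1 "=" $2}')

# x swaps pattern and hold space; the hold space starts empty, so x;p prints
# a blank line and the second x restores the matched line before auto-print
spaced=$(printf 'a\nX\nb\n' | sed '/X/{x;p;x;}')

# \L lower-cases the replacement; & is the whole matched text
lower=$(echo 'Hello WORLD' | sed 's/.*/\L&/')

# -m 3 stops reading after the third matching line
first3=$(seq 100 | grep -m 3 '5' | paste -sd' ' -)

printf '%s\n' "$records" "$spaced" "$lower" "$first3"
```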

---

### **Tool-Specific Tier Explanations**

**grep**

- *S-Tier*: Covers the bulk of everyday searches
- *A-Tier*: Context lines and extended/Perl-compatible regex
- *B/C-Tier*: Pattern files, null-safe pipelines, lookbehind, and binary-file edge cases

**awk**

- *S-Tier*: Field extraction and filtering cover most ETL tasks
- *A-Tier*: Associative arrays for counting and deduplication in log analysis
- *B/C-Tier*: Shell-outs, includes, paragraph mode, and gawk-only extensions (bignum math, in-place editing)

**sed**

- *S-Tier*: Basic substitutions dominate usage
- *A-Tier*: Range operations for config files
- *B/C-Tier*: Hold space and branching are niche

**rsync**

- *S-Tier*: Basic sync covers most backups
- *A-Tier*: Hardlinked snapshots, bandwidth throttling, and file lists for large datasets
- *B/C-Tier*: Daemon mode, extended metadata, and transfer tuning

---

### **When to Reach for Each Tool**

1. **Quick text search?** → `grep`
2. **Columnar data?** → `awk`
3. **Find/replace?** → `sed`
4. **File sync?** → `rsync`
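
The three text tools overlap more than the list suggests: a plain line filter can be written in any of them, and the choice mostly comes down to what happens next. A sketch on an invented log, assuming GNU sed and any POSIX awk:

```bash
#!/usr/bin/env bash
# Sketch: the same filter in grep, awk and sed, then each tool's sweet spot.
set -euo pipefail

log=$(mktemp)
trap 'rm -f "$log"' EXIT
printf '%s\n' "10:01 error disk" "10:02 info boot" "10:03 error net" > "$log"

# Identical output three ways: print the lines containing "error"
by_grep=$(grep 'error' "$log")
by_awk=$(awk '/error/' "$log")
by_sed=$(sed -n '/error/p' "$log")

# Columns: awk splits fields for free
components=$(awk '/error/ {print $3}' "$log" | paste -sd' ' -)

# Find/replace: sed's s command rewrites the first match on every line
renamed=$(sed 's/error/ERR/' "$log")

printf '%s\n' "$components" "$renamed"
```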

**Pro Tip:** Combine them:

```bash
# -r recurses into logs/, -h drops the "file:" prefix so $3 stays aligned
grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt
```
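
Here is that pipeline run end to end on a throwaway `logs/` directory. The log format (date, level, component, message) is an assumption chosen so that field 3 is the component; adjust `$3` to match your own logs. `grep -rh` is used so grep recurses into the directory and omits the `file:` prefix that would otherwise shift the fields.

```bash
#!/usr/bin/env bash
# Sketch: the Pro Tip pipeline on hypothetical logs (date level component message)
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/logs"
printf '%s\n' "2024-01-01 error disk full" "2024-01-01 info boot ok" \
  "2024-01-02 error net down" > "$tmp/logs/app.log"
printf '%s\n' "2024-01-02 error disk slow" > "$tmp/logs/db.log"

cd "$tmp"
# grep: matching lines | awk: keep field 3 | sort + uniq -c: tally each value
# sed: strip the padding uniq -c puts before each count
grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt

cat counts.txt   # "2 disk" then "1 net"
```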