
Here's a tiered ranking of the most useful commands/syntax for each tool, based on frequency of use and power-to-weight ratio for real-world tasks:


S-Tier (Essential Daily Use)

| Tool | Top 3 Must-Know Commands |
|------|--------------------------|
| grep | `grep -i "pattern" file` |
| | `grep -r "text" /dir/` |
| | `grep -v "exclude" file` |
| awk | `awk '{print $1}'` |
| | `awk -F: '/pattern/{print $3}'` |
| | `awk 'NR>1 && $3>100'` |
| sed | `sed 's/old/new/g'` |
| | `sed '/pattern/d'` |
| | `sed -i.bak '...' file` |
| rsync | `rsync -avz src/ dest/` |
| | `rsync -avzP --delete src/ remote:dest/` |
| | `rsync -av --exclude='*.tmp' src/ dest/` |
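
The S-tier grep, awk, and sed commands can be tried end to end on a throwaway file. This is a minimal sketch, assuming GNU grep and GNU sed as found on most Linux systems; the colon-separated `users.txt` data is made up for illustration, and `-F:` is added to the `NR>1` filter so that `$3` is the score column.

```sh
#!/bin/sh
# Sketch: S-tier grep/awk/sed on a throwaway colon-separated file.
# Assumes GNU grep and GNU sed; the sample data below is made up.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# name:role:score (one header line + three records)
cat > users.txt <<'EOF'
name:role:score
alice:Admin:150
bob:user:90
carol:admin:120
EOF

# grep -i: case-insensitive, so "Admin" and "admin" both match (-c counts lines)
admins=$(grep -ic "admin" users.txt)

# grep -v: invert the match, here dropping the header line
records=$(grep -vc "^name:" users.txt)

# awk -F: with a pattern: print the 3rd field of matching lines
alice_score=$(awk -F: '/^alice/{print $3}' users.txt)

# NR>1 skips the header; $3>100 keeps rows whose score exceeds 100
high=$(awk -F: 'NR>1 && $3>100 {print $1}' users.txt | paste -sd, -)

# sed -i.bak: edit in place, keeping the original as users.txt.bak
sed -i.bak 's/user/member/g' users.txt
renamed=$(grep -c "member" users.txt)

echo "admins=$admins records=$records alice=$alice_score high=$high renamed=$renamed"
```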

A-Tier (Power User Essentials)

| Tool | Next-Level Commands |
|------|---------------------|
| grep | `grep -A3 -B2 "context"` |
| | `grep -E "(foo\|bar)" file` (alternation) |
| awk | `awk 'BEGIN{FS=OFS=":"} {...}'` |
| | `awk '{a[$1]++} END{for(k in a) print k,a[k]}'` |
| | `awk '!seen[$0]++'` (dedup) |
| sed | `sed -n '10,20p'` (range) |
| | `sed '/start/,/end/d'` |
| | `sed -e '1i\header' -e '$a\footer'` |
| rsync | `rsync -av --link-dest=/prev/ src/ dest/` (hardlink backups) |
| | `rsync -av --bwlimit=5000 src/ dest/` (throttle, in KiB/s) |
| | `rsync -av --files-from=list.txt / remote:/` |
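
A small sketch of the A-tier context, array, dedup, and range commands, run against a made-up access log and a made-up config-style file (both hypothetical). It assumes GNU grep and GNU sed; the one-line `1i\header` / `$a\footer` form is a GNU sed extension. Because awk's `for (k in a)` visits keys in an unspecified order, the count output is piped through `sort`.

```sh
#!/bin/sh
# Sketch: A-tier grep/awk/sed commands on small, made-up inputs.
# Assumes GNU grep and GNU sed (one-line "1i\header" is a GNU extension).
set -eu

log=$(mktemp); cfg=$(mktemp)
trap 'rm -f "$log" "$cfg"' EXIT

printf '%s\n' 'GET /home 200' 'GET /about 404' 'POST /login 200' \
  'GET /home 200' 'POST /login 500' > "$log"
printf '%s\n' keep1 start drop end keep2 > "$cfg"

# Context: one line before (-B1) and one after (-A1) the 404
context=$(grep -B1 -A1 " 404" "$log" | paste -sd'|' -)

# Arrays: count requests per method; for-in order is unspecified, so sort
counts=$(awk '{a[$1]++} END{for (k in a) print k, a[k]}' "$log" | LC_ALL=C sort | paste -sd, -)

# Dedup: print a line only the first time it is seen (order preserved)
unique=$(awk '!seen[$0]++' "$log" | paste -sd'|' -)

# Range by line number: print only lines 2-3
middle=$(sed -n '2,3p' "$log" | paste -sd'|' -)

# Range by pattern: delete from "start" through "end", inclusive
trimmed=$(sed '/start/,/end/d' "$cfg" | paste -sd, -)

# Insert a first line and append a last line
wrapped=$(sed -e '1i\header' -e '$a\footer' "$cfg" | paste -sd, -)

printf '%s\n' "$context" "$counts" "$unique" "$middle" "$trimmed" "$wrapped"
```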

B-Tier (Specialized but Powerful)

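| Tool | Niche Superpowers |
|------|-------------------|
| grep | `grep --color=always \| less -R` |
| | `grep -f patterns.txt` |
| | `grep -rlZ "pattern" dir/ \| xargs -0 cmd` (null-safe filenames) |
| awk | `awk -v var=value '...'` |
| | `awk '{system("cmd " $1)}'` |
| | `awk '@include "lib.awk"'` (GNU awk) |
| sed | `sed ':a;N;$!ba;s/\n/,/g'` (slurp+replace) |
| | `sed '1~2d'` (deletes odd lines, keeping every 2nd; GNU) |
| | `sed -E 's/(groups)/\U\1/'` (case conversion; GNU) |
| rsync | `rsync --daemon` (server mode) |
| | `rsync -av --xattrs --acls` (metadata) |
| | `rsync -av --compare-dest=/compare/ src/ dest/` |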
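
The B-tier commands are easiest to appreciate on awkward input, such as filenames containing spaces. A sketch, assuming GNU grep (`-Z`), GNU sed (`first~step` addresses and `\U`), and an `xargs` that supports `-0`; the file names, the `patterns.lst` list, and the data are hypothetical.

```sh
#!/bin/sh
# Sketch: B-tier grep/awk/sed commands on throwaway data.
# Assumes GNU grep (-Z), GNU sed (1~2, \U) and an xargs with -0.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Filenames with spaces would be split apart by a plain "grep -l | xargs"
printf 'TODO: fix this\n' > 'has todo.txt'
printf 'all done\n'       > 'clean file.txt'
printf '%s\n' TODO FIXME  > patterns.lst

# -f reads patterns from a file; -l lists matching files; -Z ends each name
# with NUL, so xargs -0 passes "has todo.txt" to rm as one argument
grep -lZ -f patterns.lst -- *.txt | xargs -0 rm --
remaining=$(printf '%s\n' *.txt)

# -v passes a shell value into awk as a variable
over=$(printf '%s\n' 'a 50' 'b 150' 'c 300' | awk -v limit=100 '$2 > limit {print $1}' | paste -sd, -)

# Slurp every line into the pattern space, then replace the newlines
joined=$(printf '%s\n' a b c | sed ':a;N;$!ba;s/\n/,/g')

# 1~2 (first~step) matches lines 1,3,5,...; deleting them keeps even lines
evens=$(seq 6 | sed '1~2d' | paste -sd, -)

# \U upper-cases the rest of the replacement (here, the captured group)
loud=$(echo 'make it loud' | sed -E 's/(loud)/\U\1/')

printf '%s\n' "$remaining" "$over" "$joined" "$evens" "$loud"
```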

C-Tier (Rare but Life-Saving)

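| Tool | Expert Tricks |
|------|---------------|
| grep | `grep -P '(?<=lookbehind)'` |
| | `grep --binary-files=text` (same as `-a`) |
| | `grep -m 100` (stop after N matches) |
| awk | `awk 'BEGIN{RS=""; FS="\n"} {...}'` (paragraph mode) |
| | `awk -M 'BEGIN{print 2^100}'` (arbitrary-precision math; GNU awk) |
| | `awk -i inplace` (GNU awk 4.1+) |
| sed | `sed 's/.*/\L&/'` (lowercase; GNU) |
| | `sed '/pattern/{x;p;x;}'` (hold space: blank line before each match) |
| | `sed -f script.sed` |
| rsync | `rsync --partial-dir=.rsync-partial` |
| | `rsync --sockopts='SO_RCVBUF=65536'` |
| | `rsync --remote-option=--temp-dir=/tmp` |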
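
A sketch of the C-tier tricks that run on any POSIX awk plus GNU grep and GNU sed. The `-M`, `-i inplace`, and `@include` forms are left out because they need GNU awk, and some distributions (Debian and Ubuntu, for example) install mawk as the default `awk`. All inputs are made up.

```sh
#!/bin/sh
# Sketch: C-tier commands using a POSIX awk plus GNU grep/sed.
set -eu

# Paragraph mode: RS="" turns blank-line-separated blocks into records,
# and FS="\n" makes each line of a block its own field
firsts=$(printf 'alice\nadmin\n\nbob\nuser\n' | awk 'BEGIN{RS=""; FS="\n"} {print $1 "=" $2}' | paste -sd, -)

# -m 2: stop reading input after the second matching line
capped=$(printf 'err 1\nok\nerr 2\nerr 3\n' | grep -m 2 err | paste -sd, -)

# \L lower-cases the replacement; & is the whole match
quiet=$(echo 'MIXED Case' | sed 's/.*/\L&/')

# Hold-space trick: swap in the (empty) hold space, print it, swap back,
# which emits a blank line before each matching line
spaced=$(printf 'a\nHIT\nb\n' | sed '/HIT/{x;p;x;}' | paste -sd'|' -)

# sed -f: read the sed program from a file instead of the command line
script=$(mktemp)
trap 'rm -f "$script"' EXIT
printf '%s\n' 's/cat/dog/' '/^#/d' > "$script"
edited=$(printf '# comment\ncat\n' | sed -f "$script")

printf '%s\n' "$firsts" "$capped" "$quiet" "$spaced" "$edited"
```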

Tool-Specific Tier Explanations

grep

  • S-Tier: Covers most everyday searches
  • A-Tier: Context and advanced regex
  • B/C-Tier: Pattern files, null-safe output, PCRE, and binary-file edge cases

awk

  • S-Tier: Field processing covers most ETL tasks
  • A-Tier: Arrays and stats for log analysis
  • B/C-Tier: Rarely needed math/IO extensions
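
For the "arrays and stats" use case, a common awk pattern keeps one array per statistic, keyed by the grouping field, and reports in `END`. A minimal sketch, assuming a hypothetical log format of `<endpoint> <response-time-ms>` per line:

```sh
#!/bin/sh
# Sketch: per-key count and mean with awk arrays.
# The input format (endpoint, response time in ms) is a made-up example.
set -eu

stats=$(printf '%s\n' '/home 120' '/login 300' '/home 80' '/login 500' '/about 50' |
  awk '{ n[$1]++; sum[$1] += $2 }
       END { for (k in n) printf "%s n=%d avg=%.1f\n", k, n[k], sum[k] / n[k] }' |
  LC_ALL=C sort)

echo "$stats"
```

Sorting at the end is needed because `for (k in n)` does not guarantee any key order.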

sed

  • S-Tier: Basic substitutions dominate usage
  • A-Tier: Range operations for config files
  • B/C-Tier: Hold space and branching are niche
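
Range addresses are what make sed useful on sectioned config files: a substitution can be limited to the lines between two patterns. A sketch on a made-up INI-style file (the `[web]`/`[db]` sections and keys are hypothetical), assuming GNU sed for `-i`:

```sh
#!/bin/sh
# Sketch: limit a substitution to one INI-style section with a range.
# File contents and section names are hypothetical; assumes GNU sed -i.
set -eu

conf=$(mktemp)
trap 'rm -f "$conf"' EXIT
printf '%s\n' '[web]' 'port=8080' '[db]' 'port=5432' 'host=localhost' > "$conf"

# From the [db] header through the next "[" header (or end of file),
# rewrite only the port= line
sed -i '/^\[db\]/,/^\[/ s/^port=.*/port=5433/' "$conf"

cat "$conf"
```

The `port=8080` line under `[web]` is untouched because the range has not started yet; if `[db]` is the last section, the range simply runs to the end of the file.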

rsync

  • S-Tier: Basic sync covers most backups
  • A-Tier: Hardlink snapshots, bandwidth limits, and file lists for recurring or large backups
  • B/C-Tier: Daemon mode, metadata preservation, and resumable or tuned transfers

When to Reach for Each Tool

  1. Quick text search? → grep
  2. Columnar data? → awk
  3. Find/replace? → sed
  4. File sync? → rsync

Pro Tip: Combine them:

grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt
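
The pipeline can be checked on a couple of made-up log files. The sketch assumes a hypothetical line format of `<time> <level> <component> <message>`, so `$3` is the component. It uses `grep -rh`: `-r` is needed because `logs/` is a directory, and `-h` stops grep from prefixing each match with its filename, which would otherwise shift the fields awk sees.

```sh
#!/bin/sh
# Sketch: the pro-tip pipeline on two made-up log files.
# Assumed line format: "<time> <level> <component> <message...>",
# so awk's $3 is the component name.
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
mkdir logs

printf '%s\n' '10:00 error db timeout' '10:01 info web ok' '10:02 error web 502' > logs/a.log
printf '%s\n' '11:00 error db refused' '11:05 error db timeout' > logs/b.log

# Count error lines per component; sed strips uniq's left padding
grep -rh "error" logs/ | awk '{print $3}' | sort | uniq -c | sed 's/^ *//' > counts.txt

cat counts.txt
```

Adding `sort -rn` after `uniq -c` would rank components by error count instead of alphabetically.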