Add tech_docs/linux/awk.md
This commit is contained in:
246
tech_docs/linux/awk.md
Normal file
246
tech_docs/linux/awk.md
Normal file
@@ -0,0 +1,246 @@
|
||||
# **The Ultimate AWK Guide: From Basics to Advanced Data Wrangling**
|
||||
|
||||
## **Table of Contents**
|
||||
1. [AWK Fundamentals](#1-awk-fundamentals)
|
||||
2. [Patterns & Actions](#2-patterns--actions)
|
||||
3. [Built-in Variables](#3-built-in-variables)
|
||||
4. [Arrays & Data Structures](#4-arrays--data-structures)
|
||||
5. [Control Flow](#5-control-flow)
|
||||
6. [Functions & Math](#6-functions--math)
|
||||
7. [Advanced Text Processing](#7-advanced-text-processing)
|
||||
8. [Real-World Recipes](#8-real-world-recipes)
|
||||
9. [Performance & Limitations](#9-performance--limitations)
|
||||
|
||||
---
|
||||
|
||||
## **1. AWK Fundamentals**
|
||||
|
||||
### **Core Syntax**
|
||||
```bash
|
||||
awk [OPTIONS] 'PATTERN {ACTION}' file.txt
|
||||
```
|
||||
|
||||
### **Basic Structure**
|
||||
```awk
|
||||
BEGIN { /* pre-processing */ }
|
||||
PATTERN { /* line processing */ }
|
||||
END { /* post-processing */ }
|
||||
```
|
||||
|
||||
### **Common Flags**
|
||||
```bash
|
||||
awk -F: '{print $1}' /etc/passwd # Set field separator
|
||||
awk -v var=value '...' # Pass variables
|
||||
awk -f script.awk file.txt # Use script file
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **2. Patterns & Actions**
|
||||
|
||||
### **Pattern Types**
|
||||
```awk
|
||||
/error/ {print} # Regex match
|
||||
$3 > 100 {print $1} # Field comparison
|
||||
NR == 1 {print} # Line number
|
||||
BEGINFILE {print "Processing:", FILENAME} # Per-file
|
||||
```
|
||||
|
||||
### **Special Patterns**
|
||||
```awk
|
||||
BEGIN {FS=":"; OFS="\t"} # Set input/output separators
|
||||
END {print "Total lines:", NR} # Final processing
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **3. Built-in Variables**
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `NR` | Current record number |
|
||||
| `NF` | Number of fields |
|
||||
| `FS` | Field separator (default: whitespace) |
|
||||
| `OFS` | Output field separator |
|
||||
| `FILENAME` | Current file name |
|
||||
| `FNR` | Record number per file |
|
||||
|
||||
### **Example**
|
||||
```awk
|
||||
awk '{print NR, NF, $0}' file.txt # Show line stats
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **4. Arrays & Data Structures**
|
||||
|
||||
### **Associative Arrays**
|
||||
```awk
|
||||
{count[$1]++} # Count occurrences
|
||||
END {for (key in count) print key, count[key]}
|
||||
```
|
||||
|
||||
### **Multi-Dimensional Arrays**
|
||||
```awk
|
||||
{array[$1,$2] = $3} # Fake multi-dim
|
||||
```
|
||||
|
||||
### **Array Functions**
|
||||
```awk
|
||||
split(string, array, separator)
|
||||
asort(array) # Sort by value
|
||||
asorti(array) # Sort by index
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **5. Control Flow**
|
||||
|
||||
### **Conditionals**
|
||||
```awk
|
||||
{if ($3 > 100) print "High:", $1
|
||||
else print "Low:", $1}
|
||||
```
|
||||
|
||||
### **Loops**
|
||||
```awk
|
||||
for (i=1; i<=NF; i++) {print $i} # Fields
|
||||
for (key in array) {print key} # Array keys
|
||||
```
|
||||
|
||||
### **Switch/Case**
|
||||
```awk
|
||||
switch($1) {
|
||||
case "foo": print "Found foo"; break
|
||||
case /^bar/: print "Starts with bar"; break
|
||||
default: print "Other"
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **6. Functions & Math**
|
||||
|
||||
### **Built-in Functions**
|
||||
```awk
|
||||
length($0) # String length
|
||||
sub(/old/, "new", $1) # In-field substitution
|
||||
system("date") # Run shell command
|
||||
```
|
||||
|
||||
### **Math Operations**
|
||||
```awk
|
||||
{sum += $3; sumsq += ($3)^2}
|
||||
END {print "Mean:", sum/NR, "Std Dev:", sqrt(sumsq/NR - (sum/NR)^2)}
|
||||
```
|
||||
|
||||
### **User Functions**
|
||||
```awk
|
||||
function double(x) {return x*2}
|
||||
{d = double($1); print d}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **7. Advanced Text Processing**
|
||||
|
||||
### **Field Manipulation**
|
||||
```awk
|
||||
{$1 = toupper($1); $NF = $NF "%"} # Modify fields
|
||||
```
|
||||
|
||||
### **Multi-Line Records**
|
||||
```bash
|
||||
awk -v RS="" '{print $1}' # Paragraph mode
|
||||
```
|
||||
|
||||
### **CSV Processing**
|
||||
```bash
|
||||
awk -FPAT='([^,]+)|("[^"]+")' '{print $2}' data.csv
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **8. Real-World Recipes**
|
||||
|
||||
### **Log Analysis**
|
||||
```awk
|
||||
# Top 10 frequent IPs in access.log
|
||||
awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' access.log | sort -nr | head
|
||||
```
|
||||
|
||||
### **Data Transformation**
|
||||
```awk
|
||||
# Convert TSV to CSV
|
||||
BEGIN {FS="\t"; OFS=","} {$1=$1; print}
|
||||
```
|
||||
|
||||
### **Column Statistics**
|
||||
```awk
|
||||
# Compute column averages
|
||||
NR>1 {for(i=1; i<=NF; i++) sum[i]+=$i}
|
||||
END {for(i=1; i<=NF; i++) print "Col", i, "avg:", sum[i]/(NR-1)}
|
||||
```
|
||||
|
||||
### **JSON Generation**
|
||||
```awk
|
||||
BEGIN {print "["; FS=","}
|
||||
{printf " {\"name\":\"%s\",\"value\":%s}%s\n", $1, $2, (NR==FNR?",":"")}
|
||||
END {print "]"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **9. Performance & Limitations**
|
||||
|
||||
### **Optimization Tips**
|
||||
```bash
|
||||
LC_ALL=C awk ... # 2-3x speedup for ASCII
|
||||
mawk (faster alternative to gawk) # For large datasets
|
||||
```
|
||||
|
||||
### **When Not to Use AWK**
|
||||
- Binary data processing
|
||||
- Complex nested data structures (use `jq` for JSON)
|
||||
- Multi-gigabyte files (consider `split + parallel`)
|
||||
|
||||
### **AWK vs Alternatives**
|
||||
| Task | Best Tool |
|
||||
|------|-----------|
|
||||
| Columnar data | AWK |
|
||||
| JSON/XML | `jq`/`xq` |
|
||||
| Complex stats | R/Python |
|
||||
| Multi-file joins | SQLite |
|
||||
|
||||
---
|
||||
|
||||
## **Pro Techniques**
|
||||
|
||||
### **Self-Contained Scripts**
|
||||
```bash
|
||||
#!/usr/bin/awk -f
|
||||
BEGIN {print "Starting processing"}
|
||||
/pattern/ {count++}
|
||||
END {print "Found", count, "matches"}
|
||||
```
|
||||
|
||||
### **Two-File Processing**
|
||||
```awk
|
||||
# Join two files on first field
|
||||
NR==FNR {a[$1]=$2; next}
|
||||
$1 in a {print $0, a[$1]}
|
||||
```
|
||||
|
||||
### **Bit Manipulation**
|
||||
```awk
|
||||
function is_set(x,bit) {return and(x, lshift(1, bit-1))}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## **Further Learning**
|
||||
- **Books**: "Effective AWK Programming" (GNU AWK manual)
|
||||
- **Cheat Sheets**: [awkcheatsheet.com](https://awkcheatsheet.com)
|
||||
- **Practice**: [exercism.org/tracks/awk](https://exercism.org/tracks/awk)
|
||||
|
||||
**Need an AWK solution?** Describe your data format and desired transformation!
|
||||
Reference in New Issue
Block a user