Update tech_docs/linux/jq.md
@@ -1,3 +1,227 @@
# Understanding JSON and How `jq` Works Under the Hood

## What is JSON?

JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and easy for machines to parse and generate. It's built on two universal data structures:

1. **Collections of name/value pairs** (called objects, dictionaries, or hashes in various languages)
2. **Ordered lists of values** (called arrays or lists)

### JSON Syntax Basics:
```json
{
  "string": "value",
  "number": 42,
  "boolean": true,
  "null": null,
  "array": [1, 2, 3],
  "object": {
    "nested": "property"
  }
}
```

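To see how `jq` classifies each of these value kinds, the sketch below feeds a one-line copy of the sample document to `jq` and prints the type of every top-level value. It assumes `jq` 1.5 or later is on your `PATH`; the variable names `doc` and `types` are illustrative only.

```bash
# Compact copy of the sample document (illustrative variable name).
doc='{"string":"value","number":42,"boolean":true,"null":null,"array":[1,2,3],"object":{"nested":"property"}}'

# to_entries turns the object into {key, value} pairs;
# `type` names the JSON type of each value.
types=$(printf '%s\n' "$doc" | jq -r 'to_entries[] | "\(.key) -> \(.value | type)"')
printf '%s\n' "$types"
```

Each printed line pairs a key with one of the six JSON types: `string`, `number`, `boolean`, `null`, `array`, or `object`.
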
## How `jq` Processes JSON

### 1. Lexical Analysis (Tokenization)
When you run `jq`, it first breaks down the JSON input into tokens:
- Punctuation: `{ } [ ] , :`
- Strings (in quotes)
- Numbers
- Keywords: `true`, `false`, `null`

### 2. Parsing
The tokens are then parsed into an in-memory tree of JSON values (conceptually similar to an Abstract Syntax Tree) representing the JSON structure. This tree maintains:
- Object hierarchies
- Array orders
- Value types

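`jq` does not expose its tokenizer or parser directly, but their effect is observable: input that is not valid JSON is rejected before any filter runs. The following is a minimal sketch under that assumption; `bad`, `good`, `bad_status`, and `good_out` are illustrative names.

```bash
# A trailing comma is not valid JSON, so parsing should fail.
bad='{"a": 1,}'
good='{"a": 1}'

# jq exits with a non-zero status when the input cannot be parsed.
if printf '%s\n' "$bad" | jq '.' >/dev/null 2>&1; then
  bad_status="accepted"
else
  bad_status="rejected"
fi

# Well-formed input parses and is re-emitted (compactly, because of -c).
good_out=$(printf '%s\n' "$good" | jq -c '.')

echo "malformed input: $bad_status"
echo "well-formed input: $good_out"
```
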
### 3. Processing Pipeline
`jq` works with a filter pipeline concept where:
- Input JSON is parsed into a stream of JSON values
- Each filter in your `jq` expression processes this stream
- The output of one filter becomes the input to the next

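As a small illustration of the pipeline idea, the sketch below sends two separate JSON values through a two-stage filter. The input data and the variable name `out` are made up for the example.

```bash
# Two top-level JSON values on stdin form the input stream.
# `.n` extracts a number from each value; the pipe feeds it to `. * 10`.
out=$(printf '{"n": 1}\n{"n": 2}\n' | jq '.n | . * 10')
printf '%s\n' "$out"
# Output:
# 10
# 20
```

The filter runs once per input value, so two inputs produce two outputs.
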
### 4. Key Components Under the Hood:
- **Iterator Model**: `jq` handles a stream of top-level JSON values one at a time, and filters such as `.[]` emit one element at a time. Each individual value is still parsed fully into memory unless you opt into `--stream`.
- **Lazy Evaluation**: Filters behave like generators, so outputs are only computed when something downstream asks for them
- **Pattern Matching**: Path expressions and destructuring (for example `. as [$a, $b]`) are matched against JSON structures
- **C Implementation**: Being written in C helps keep it fast at processing large JSON files

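One way to observe the on-demand behaviour is to ask for a few items from a generator that could produce far more. This is a sketch rather than a benchmark; `first_three` is an illustrative name.

```bash
# range(1; 1000000000) could yield close to a billion numbers,
# but limit(3; ...) stops pulling from it after three outputs.
first_three=$(jq -nc '[limit(3; range(1; 1000000000))]')
echo "$first_three"
# Output: [1,2,3]
```

The `-n` flag tells `jq` not to read any input, and `-c` prints the result on one line.
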
## How `jq` Filters Work

When you write a filter like `.users[].name`:

1. `.` - Takes the entire input
2. `.users` - Selects the "users" property
3. `[]` - Iterates over the array elements
4. `.name` - Extracts the "name" property from each

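The four steps above can be tried directly. The sketch below runs the compact filter and an equivalent spelled-out pipeline against a small made-up document; `doc`, `short`, and `long` are illustrative names.

```bash
# Hypothetical sample input with a "users" array of objects.
doc='{"users": [{"name": "Alice"}, {"name": "Bob"}]}'

# Compact form.
short=$(printf '%s\n' "$doc" | jq '.users[].name')

# The same steps written as an explicit pipeline.
long=$(printf '%s\n' "$doc" | jq '.users | .[] | .name')

printf '%s\n' "$short"
# Output:
# "Alice"
# "Bob"
```

Both forms produce the same two strings, which shows that `.users[].name` is shorthand for piping each step into the next.
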
### Memory Management
`jq` is designed to:
- Handle large JSON documents efficiently
- Process a stream of separate JSON values one at a time without holding the whole stream in memory (a single very large document is still loaded whole unless you use `--stream`)
- Use tail-call optimization for recursive operations

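For a single document that is too large to load comfortably, `jq` offers the `--stream` flag, which emits path/leaf events instead of one fully parsed value. The sketch below uses a tiny made-up document just to show the shape of the events and that they can be reassembled; `doc`, `events`, and `rebuilt` are illustrative names.

```bash
doc='{"a": [1, 2], "b": "x"}'

# --stream turns the document into a sequence of [path, leaf] events
# (plus closing events that carry only a path).
events=$(printf '%s\n' "$doc" | jq -c --stream '.')
printf '%s\n' "$events"

# fromstream reassembles the events into the original value,
# which shows that no information is lost in the streamed form.
rebuilt=$(printf '%s\n' "$doc" | jq -cn --stream 'fromstream(inputs)')
echo "$rebuilt"
```

In real use you would filter or truncate the events before rebuilding, so that only the parts you need are ever held in memory.
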
## Advanced Internal Concepts

### 1. The jq Virtual Machine
`jq` actually compiles your filters to bytecode that runs on a custom virtual machine. This:
- Enables complex transformations
- Limits what a filter can do, which offers some isolation: the core language has no way to write files or run other programs
- Allows optimization of common operations

### 2. Path Expressions
When you use path expressions like `.a.b.c`, `jq`:
- Navigates the JSON tree structure
- Handles missing properties gracefully: looking up a missing key yields `null` rather than an error
- Maintains context for relative paths

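A short sketch of both behaviours, using a made-up document in which the final key is absent; `doc`, `missing`, and `steps` are illustrative names.

```bash
doc='{"a": {"b": {}}}'

# Key "c" does not exist, so the lookup yields null instead of failing.
missing=$(printf '%s\n' "$doc" | jq '.a.b.c')
echo "missing key: $missing"
# Output: missing key: null

# path(...) reports the path a filter follows, as an array of keys.
steps=$(printf '%s\n' "$doc" | jq -c 'path(.a.b.c)')
echo "path: $steps"
# Output: path: ["a","b","c"]
```
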
### 3. Function Composition
Many `jq` operations are built from primitive functions that:
- Can be composed together
- Follow consistent patterns
- Maintain immutability (original JSON is never modified)

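User-defined functions compose the same way as built-ins. In the sketch below, `double` and `inc` are hypothetical helper functions defined inline for the example, and `result` is an illustrative variable name.

```bash
# Define two small functions, then compose them with a pipe inside map.
result=$(echo '[1, 2, 3]' | jq -c 'def double: . * 2; def inc: . + 1; map(double | inc)')
echo "$result"
# Output: [3,5,7]
```

Each element is doubled and then incremented, and the input array itself is left as it was; `map` builds a new array.
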
## Example: What Happens When You Run `jq '.users[].name'`

1. Input JSON is tokenized and parsed into memory as a tree structure
2. The root (`.`) is identified
3. The `users` property is located in the object
4. The array iterator (`[]`) starts processing each element
5. For each array element, the `name` property is extracted
6. Each result is written out as it is produced, forming a stream of values

---

# **Understanding JSON & jq: The Core Fundamentals**

Let’s break this down into **clear, foundational concepts** so you truly *get* how JSON works and how `jq` processes it.

---

## **1. JSON Basics: The Building Blocks**
JSON (**J**ava**S**cript **O**bject **N**otation) is a structured way to represent data. Think of it like a nested combination of **dictionaries** (key-value pairs) and **lists** (ordered sequences).

### **Key JSON Structures**
| Structure | Example | Description |
|------------|---------|-------------|
| **Object** (Dictionary) | `{"name": "Alice", "age": 30}` | Unordered `key:value` pairs (like a Python `dict` or JS object) |
| **Array** (List) | `[1, 2, 3, "hello"]` | Ordered list of values (like a Python `list` or JS array) |
| **String** | `"hello"` | Text in double quotes |
| **Number** | `42`, `3.14` | Integers or decimals |
| **Boolean** | `true`, `false` | Logical true/false |
| **Null** | `null` | Represents "no value" |

### **Example JSON Document**
```json
{
  "name": "Alice",
  "age": 30,
  "is_student": false,
  "courses": ["Math", "Science"],
  "address": {
    "street": "123 Main St",
    "city": "Boston"
  }
}
```
- **Top-level object** (`{ ... }`) containing keys like `"name"`, `"age"`, etc.
- **Nested structures**: `"address"` is an object inside the main object.
- **Arrays**: `"courses"` holds a list of strings.

---

## **2. How `jq` Processes JSON**
`jq` is a **filter** that takes JSON input, applies transformations, and produces JSON output.

### **Core jq Concepts**
1. **`.` (Dot Operator)** → The identity filter: represents **the entire input**.
   - `jq '.' file.json` → Pretty-prints the JSON.
   - `jq '.name'` → Extracts the `"name"` field.

2. **`[]` (Array/Iterator Operator)** → Unwraps arrays or objects.
   - `jq '.courses[]'` → Gets each course: `"Math"`, `"Science"`.
   - `jq '.address | .[]'` → Gets all values inside `address`: `"123 Main St"`, `"Boston"`.

3. **`|` (Pipe Operator)** → Chains operations (like Unix pipes).
   - `jq '.address | .city'` → Gets `"Boston"`.

4. **`select()`** → Filters data conditionally.
   - `jq '.users[] | select(.age > 30)'` → Only users over 30 (given an input with a `users` array).

5. **`map()`** → Applies a filter to each element of an array.
   - `jq '.numbers | map(. * 2)'` → Doubles each number (given an input with a `numbers` array).

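The sketch below tries several of these operators against a one-line copy of the example document from section 1. Because that document has no `users` or `numbers` keys, `select` and `map` are applied to fields it does contain; `doc`, `city`, `courses`, and `adult` are illustrative names.

```bash
# Compact copy of the example document from section 1.
doc='{"name":"Alice","age":30,"is_student":false,"courses":["Math","Science"],"address":{"street":"123 Main St","city":"Boston"}}'

# Pipe: narrow to the address object, then pick one field.
city=$(printf '%s\n' "$doc" | jq '.address | .city')

# map: apply a filter to every element of the courses array.
courses=$(printf '%s\n' "$doc" | jq -c '.courses | map(ascii_upcase)')

# select: keep the input only when the condition holds, then read a field.
adult=$(printf '%s\n' "$doc" | jq 'select(.age > 18) | .name')

echo "$city"     # "Boston"
echo "$courses"  # ["MATH","SCIENCE"]
echo "$adult"    # "Alice"
```
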
---

## **3. How `jq` Works Under the Hood**
### **Step-by-Step Processing**
1. **Input JSON is parsed** → Converted into an internal tree structure.
2. **Your `jq` filter is compiled** → Turned into bytecode for efficiency.
3. **The filter runs on the JSON tree** → Extracting/modifying data.
4. **Results are output** → As JSON (or raw text with `-r`).

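The difference in the final output step is easy to see with a string result. This is a small sketch with made-up input; `as_json` and `as_text` are illustrative names.

```bash
doc='{"name": "Alice"}'

# Default output is JSON, so a string keeps its quotes.
as_json=$(printf '%s\n' "$doc" | jq '.name')

# -r (raw output) prints string results without JSON quoting.
as_text=$(printf '%s\n' "$doc" | jq -r '.name')

echo "JSON: $as_json"   # JSON: "Alice"
echo "raw:  $as_text"   # raw:  Alice
```

Raw output is usually what you want when passing a value to another shell command.
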
### **Key Takeaways**
✅ `jq` **does not modify the original JSON**; it produces new output.
✅ It evaluates filters **lazily**, producing results one at a time (each input document is still read fully into memory unless you use `--stream`).
✅ Uses **functional programming** concepts (like `map`, `select`).

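The first takeaway can be checked with a temporary file. In this sketch the file contents are made up, and `tmp`, `before`, `updated`, and `after` are illustrative names.

```bash
# Write a small document to a temporary file.
tmp=$(mktemp)
printf '{"name": "Alice", "age": 30}\n' > "$tmp"
before=$(cat "$tmp")

# The assignment builds a new value on stdout; the file is not rewritten.
updated=$(jq -c '.age = 31' "$tmp")
after=$(cat "$tmp")

echo "jq output:  $updated"   # jq output:  {"name":"Alice","age":31}
echo "file still: $after"     # file still: {"name": "Alice", "age": 30}
rm -f "$tmp"
```

To actually change the file you would redirect the output to a new file and then move it into place.
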
---

## **4. Practical Examples to Solidify Understanding**

### **Example 1: Extracting a Simple Value**
```bash
echo '{"name": "Alice", "age": 30}' | jq '.name'
# Output: "Alice"
```

### **Example 2: Iterating Over an Array**
```bash
echo '{"users": ["Alice", "Bob", "Charlie"]}' | jq '.users[]'
# Output:
# "Alice"
# "Bob"
# "Charlie"
```

### **Example 3: Filtering Data**
```bash
echo '{"people": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]}' | jq -c '.people[] | select(.age > 30)'
# Output (-c prints compact, single-line JSON): {"name":"Bob","age":35}
```

### **Example 4: Modifying Structure**
```bash
echo '{"name": "Alice", "age": 30}' | jq -c '{username: .name, years_old: .age}'
# Output (-c prints compact, single-line JSON): {"username":"Alice","years_old":30}
```

---

## **5. Summary: The Absolute Basics**
1. **JSON** = Structured data format (objects `{}`, arrays `[]`, values).
2. **`jq`** = A tool to **query & transform JSON** from the command line.
3. **Key Operators**:
   - `.` → Current input
   - `[]` → Iterate over arrays/objects
   - `|` → Chain operations
   - `select()` → Filter data
4. **`jq` is non-destructive** → Always produces new output.

---
### **Next Steps**
- Try running these examples in your terminal.
- Experiment with real JSON files (e.g., API responses, configs).
- Gradually move to more complex queries (`map`, `reduce`, custom functions).

---

# Working with `jq` on Debian

## Introduction to `jq`
