Update tech_docs/linux/jq.md

This commit is contained in:
2025-06-25 00:57:57 +00:00
parent 13bd115958
commit f7096fa432

View File

@@ -1,3 +1,227 @@
# Understanding JSON and How `jq` Works Under the Hood
## What is JSON?
JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and easy for machines to parse and generate. It's built on two universal data structures:
1. **Collections of name/value pairs** (called objects, dictionaries, or hashes in various languages)
2. **Ordered lists of values** (called arrays or lists)
### JSON Syntax Basics:
```json
{
"string": "value",
"number": 42,
"boolean": true,
"null": null,
"array": [1, 2, 3],
"object": {
"nested": "property"
}
}
```
## How `jq` Processes JSON
### 1. Lexical Analysis (Tokenization)
When you run `jq`, it first breaks down the JSON input into tokens:
- Punctuation: `{ } [ ] , :`
- Strings (in quotes)
- Numbers
- Keywords: `true`, `false`, `null`
### 2. Parsing
The tokens are then parsed into an Abstract Syntax Tree (AST) representing the JSON structure. This tree maintains:
- Object hierarchies
- Array orders
- Value types
### 3. Processing Pipeline
`jq` works with a filter pipeline concept where:
- Input JSON is parsed into a stream of JSON values
- Each filter in your `jq` expression processes this stream
- The output of one filter becomes the input to the next
### 4. Key Components Under the Hood:
- **Iterator Model**: `jq` processes JSON in a streaming fashion, handling one element at a time
- **Lazy Evaluation**: Expressions are only computed when needed
- **Pattern Matching**: The engine efficiently matches patterns against JSON structures
- **C Implementation**: Being written in C makes it fast at processing large JSON files
## How `jq` Filters Work
When you write a filter like `.users[].name`:
1. `.` - Takes the entire input
2. `.users` - Selects the "users" property
3. `[]` - Iterates over the array elements
4. `.name` - Extracts the "name" property from each
### Memory Management
`jq` is designed to:
- Handle large JSON documents efficiently
- Process streams of JSON objects without loading everything into memory
- Use tail-call optimization for recursive operations
## Advanced Internal Concepts
### 1. The jq Virtual Machine
`jq` actually compiles your filters to bytecode that runs on a custom virtual machine. This:
- Enables complex transformations
- Provides sandboxing for security
- Allows optimization of common operations
### 2. Path Expressions
When you use path expressions like `.a.b.c`, `jq`:
- Navigates the JSON tree structure
- Handles missing properties gracefully
- Maintains context for relative paths
### 3. Function Composition
Many `jq` operations are built from primitive functions that:
- Can be composed together
- Follow consistent patterns
- Maintain immutability (original JSON is never modified)
## Example: What Happens When You Run `jq '.users[].name'`
1. Input JSON is tokenized and parsed into memory as a tree structure
2. The root (`.`) is identified
3. The `users` property is located in the object
4. The array iterator (`[]`) starts processing each element
5. For each array element, the `name` property is extracted
6. Results are collected and output as a stream of values
Would you like me to dive deeper into any specific aspect of JSON or `jq`'s internals?
---
# **Understanding JSON & jq: The Core Fundamentals**
Lets break this down into **clear, foundational concepts** so you truly *get* how JSON works and how `jq` processes it.
---
## **1. JSON Basics: The Building Blocks**
JSON (**J**ava**S**cript **O**bject **N**otation) is a structured way to represent data. Think of it like a nested combination of **dictionaries** (key-value pairs) and **lists** (ordered sequences).
### **Key JSON Structures**
| Structure | Example | Description |
|------------|---------|-------------|
| **Object** (Dictionary) | `{"name": "Alice", "age": 30}` | Unordered `key:value` pairs (like a Python `dict` or JS object) |
| **Array** (List) | `[1, 2, 3, "hello"]` | Ordered list of values (like a Python `list` or JS array) |
| **String** | `"hello"` | Text in double quotes |
| **Number** | `42`, `3.14` | Integers or decimals |
| **Boolean** | `true`, `false` | Logical true/false |
| **Null** | `null` | Represents "no value" |
### **Example JSON Document**
```json
{
"name": "Alice",
"age": 30,
"is_student": false,
"courses": ["Math", "Science"],
"address": {
"street": "123 Main St",
"city": "Boston"
}
}
```
- **Top-level object** (`{ ... }`) containing keys like `"name"`, `"age"`, etc.
- **Nested structures**: `"address"` is an object inside the main object.
- **Arrays**: `"courses"` holds a list of strings.
---
## **2. How `jq` Processes JSON**
`jq` is a **filter** that takes JSON input, applies transformations, and produces JSON output.
### **Core jq Concepts**
1. **`.` (Dot Operator)** → Represents **the entire input**.
- `jq '.' file.json` → Pretty-prints the JSON.
- `jq '.name'` → Extracts the `"name"` field.
2. **`[]` (Array/Iterator Operator)** → Unwraps arrays or objects.
- `jq '.courses[]'` → Gets each course: `"Math"`, `"Science"`.
- `jq '.address | .[]'` → Gets all values inside `address`: `"123 Main St"`, `"Boston"`.
3. **`|` (Pipe Operator)** → Chains operations (like Unix pipes).
- `jq '.address | .city'` → Gets `"Boston"`.
4. **`select()`** → Filters data conditionally.
- `jq '.users[] | select(.age > 30)'` → Only users over 30.
5. **`map()`** → Applies a function to each element.
- `jq '.numbers | map(. * 2)'` → Doubles each number.
---
## **3. How `jq` Works Under the Hood**
### **Step-by-Step Processing**
1. **Input JSON is parsed** → Converted into an internal tree structure.
2. **Your `jq` filter is compiled** → Turned into bytecode for efficiency.
3. **The filter runs on the JSON tree** → Extracting/modifying data.
4. **Results are output** → As JSON (or raw text with `-r`).
### **Key Takeaways**
`jq` **does not modify the original JSON**—it produces new output.
✅ It processes data **lazily** (efficient for large files).
✅ Uses **functional programming** concepts (like `map`, `select`).
---
## **4. Practical Examples to Solidify Understanding**
### **Example 1: Extracting a Simple Value**
```bash
echo '{"name": "Alice", "age": 30}' | jq '.name'
# Output: "Alice"
```
### **Example 2: Iterating Over an Array**
```bash
echo '{"users": ["Alice", "Bob", "Charlie"]}' | jq '.users[]'
# Output:
# "Alice"
# "Bob"
# "Charlie"
```
### **Example 3: Filtering Data**
```bash
echo '{"people": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]}' | jq '.people[] | select(.age > 30)'
# Output: {"name": "Bob", "age": 35}
```
### **Example 4: Modifying Structure**
```bash
echo '{"name": "Alice", "age": 30}' | jq '{username: .name, years_old: .age}'
# Output: {"username": "Alice", "years_old": 30}
```
---
## **5. Summary: The Absolute Basics**
1. **JSON** = Structured data format (objects `{}`, arrays `[]`, values).
2. **`jq`** = A tool to **query & transform JSON** from the command line.
3. **Key Operators**:
- `.` → Current input
- `[]` → Iterate over arrays/objects
- `|` → Chain operations
- `select()` → Filter data
4. **`jq` is non-destructive** → Always produces new output.
---
### **Next Steps**
- Try running these examples in your terminal.
- Experiment with real JSON files (e.g., API responses, configs).
- Gradually move to more complex queries (`map`, `reduce`, custom functions).
Would you like me to clarify anything or provide more examples? 🚀
---
# Working with `jq` on Debian # Working with `jq` on Debian
## Introduction to `jq` ## Introduction to `jq`