diff --git a/tech_docs/linux/jq.md b/tech_docs/linux/jq.md
index b6de529..779f4d5 100644
--- a/tech_docs/linux/jq.md
+++ b/tech_docs/linux/jq.md
@@ -1,3 +1,227 @@
+# Understanding JSON and How `jq` Works Under the Hood
+
+## What is JSON?
+
+JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and easy for machines to parse and generate. It's built on two universal data structures:
+
+1. **Collections of name/value pairs** (called objects, dictionaries, or hashes in various languages)
+2. **Ordered lists of values** (called arrays or lists)
+
+### JSON Syntax Basics:
+```json
+{
+  "string": "value",
+  "number": 42,
+  "boolean": true,
+  "null": null,
+  "array": [1, 2, 3],
+  "object": {
+    "nested": "property"
+  }
+}
+```
+
+## How `jq` Processes JSON
+
+### 1. Lexical Analysis (Tokenization)
+When you run `jq`, it first breaks down the JSON input into tokens:
+- Punctuation: `{ } [ ] , :`
+- Strings (in quotes)
+- Numbers
+- Keywords: `true`, `false`, `null`
+
+### 2. Parsing
+The tokens are then parsed into an in-memory tree of values representing the JSON structure. This tree preserves:
+- Object hierarchies
+- Array order
+- Value types
+
+### 3. Processing Pipeline
+`jq` is built around a pipeline of filters:
+- Input is parsed into a stream of JSON values
+- Each filter in your `jq` expression processes this stream
+- The output of one filter becomes the input to the next
+
+### 4. Key Components Under the Hood:
+- **Iterator Model**: A filter can emit zero, one, or many outputs per input, and `jq` produces them one at a time
+- **Lazy Evaluation**: Outputs are generated on demand, so constructs such as `first(...)` and `limit(...)` can stop early
+- **Pattern Matching**: Destructuring (`. as [$a, $b]`) and regex functions such as `test` and `match` match patterns against JSON values
+- **C Implementation**: `jq` is written in portable C with no runtime dependencies, which helps keep it fast
+
+## How `jq` Filters Work
+
+When you write a filter like `.users[].name`:
+
+1. `.` - Takes the entire input
+2. `.users` - Selects the "users" property
+3. `[]` - Iterates over the array elements
+4. `.name` - Extracts the "name" property from each
+
+### Memory Management
+How `jq` handles memory:
+- By default, each top-level JSON value is parsed fully into memory; a sequence of such values is processed one at a time
+- Very large documents can be parsed incrementally with `--stream`
+- Recursive operations benefit from tail-call optimization
+
+## Advanced Internal Concepts
+
+### 1. The jq Virtual Machine
+`jq` compiles your filters to bytecode that runs on a custom virtual machine. This:
+- Enables complex transformations
+- Restricts filters to the operations the `jq` language exposes
+- Allows optimization of common operations
+
+### 2. Path Expressions
+When you use path expressions like `.a.b.c`, `jq`:
+- Navigates the JSON tree structure
+- Returns `null` for missing keys instead of raising an error
+- Tracks the path taken, so that `path()` and update operators such as `|=` know which location to change
+
+### 3. Function Composition
+Many `jq` operations are built from primitive functions that:
+- Can be composed together
+- Follow consistent patterns
+- Maintain immutability (original JSON is never modified)
+
+## Example: What Happens When You Run `jq '.users[].name'`
+
+1. Input JSON is tokenized and parsed into memory as a tree structure
+2. The root (`.`) is identified
+3. The `users` property is located in the object
+4. The array iterator (`[]`) starts processing each element
+5. For each array element, the `name` property is extracted
+6. Each result is printed as it is produced, giving a stream of output values
+
+---
+
+# **Understanding JSON & jq: The Core Fundamentals**
+
+Let’s break this down into **clear, foundational concepts** so you truly *get* how JSON works and how `jq` processes it.
+
+---
+
+## **1. JSON Basics: The Building Blocks**
+JSON (**J**ava**S**cript **O**bject **N**otation) is a structured way to represent data. Think of it as a nested combination of **dictionaries** (key-value pairs) and **lists** (ordered sequences).
+
+### **Key JSON Structures**
+| Structure | Example | Description |
+|------------|---------|-------------|
+| **Object** (Dictionary) | `{"name": "Alice", "age": 30}` | Unordered `key:value` pairs (like a Python `dict` or JS object) |
+| **Array** (List) | `[1, 2, 3, "hello"]` | Ordered list of values (like a Python `list` or JS array) |
+| **String** | `"hello"` | Text in double quotes |
+| **Number** | `42`, `3.14` | Integers or decimals |
+| **Boolean** | `true`, `false` | Logical true/false |
+| **Null** | `null` | Represents "no value" |
+
+### **Example JSON Document**
+```json
+{
+  "name": "Alice",
+  "age": 30,
+  "is_student": false,
+  "courses": ["Math", "Science"],
+  "address": {
+    "street": "123 Main St",
+    "city": "Boston"
+  }
+}
+```
+- **Top-level object** (`{ ... }`) containing keys like `"name"`, `"age"`, etc.
+- **Nested structures**: `"address"` is an object inside the main object.
+- **Arrays**: `"courses"` holds a list of strings.
+
+---
+
+## **2. How `jq` Processes JSON**
+`jq` is a **filter** that takes JSON input, applies transformations, and produces JSON output.
+
+### **Core jq Concepts**
+1. **`.` (Dot Operator)** → Represents **the entire input**.
+   - `jq '.' file.json` → Pretty-prints the JSON.
+   - `jq '.name'` → Extracts the `"name"` field.
+
+2. **`[]` (Array/Iterator Operator)** → Unwraps arrays or objects.
+   - `jq '.courses[]'` → Gets each course: `"Math"`, `"Science"`.
+   - `jq '.address | .[]'` → Gets all values inside `address`: `"123 Main St"`, `"Boston"`.
+
+3. **`|` (Pipe Operator)** → Chains operations (like Unix pipes).
+   - `jq '.address | .city'` → Gets `"Boston"`.
+
+4. **`select()`** → Filters data conditionally.
+   - `jq '.users[] | select(.age > 30)'` → Only users over 30.
+
+5. **`map()`** → Applies a function to each element.
+   - `jq '.numbers | map(. * 2)'` → Doubles each number.
+
+---
+
+## **3. How `jq` Works Under the Hood**
+### **Step-by-Step Processing**
+1. **Input JSON is parsed** → Converted into an internal tree structure.
+2. **Your `jq` filter is compiled** → Turned into bytecode for efficiency.
+3. **The filter runs on the JSON tree** → Extracting/modifying data.
+4. **Results are output** → As JSON (or raw strings with `-r`).
+
+### **Key Takeaways**
+✅ `jq` **does not modify the original JSON**; it produces new output.
+✅ It produces results **lazily**, one at a time; each input document is still parsed fully unless you use `--stream`.
+✅ Uses **functional programming** concepts (like `map`, `select`).
+
+---
+
+## **4. Practical Examples to Solidify Understanding**
+
+### **Example 1: Extracting a Simple Value**
+```bash
+echo '{"name": "Alice", "age": 30}' | jq '.name'
+# Output: "Alice"
+```
+
+### **Example 2: Iterating Over an Array**
+```bash
+echo '{"users": ["Alice", "Bob", "Charlie"]}' | jq '.users[]'
+# Output:
+# "Alice"
+# "Bob"
+# "Charlie"
+```
+
+### **Example 3: Filtering Data**
+```bash
+# -c prints compact, single-line output (the default is pretty-printed over several lines)
+echo '{"people": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]}' | jq -c '.people[] | select(.age > 30)'
+# Output: {"name":"Bob","age":35}
+```
+
+### **Example 4: Modifying Structure**
+```bash
+# -c prints compact, single-line output
+echo '{"name": "Alice", "age": 30}' | jq -c '{username: .name, years_old: .age}'
+# Output: {"username":"Alice","years_old":30}
+```
+
+---
+
+## **5. Summary: The Absolute Basics**
+1. **JSON** = Structured data format (objects `{}`, arrays `[]`, values).
+2. **`jq`** = A tool to **query & transform JSON** from the command line.
+3. **Key Operators**:
+   - `.` → Current input
+   - `[]` → Iterate over arrays/objects
+   - `|` → Chain operations
+   - `select()` → Filter data
+4. **`jq` is non-destructive** → Always produces new output.
+
+---
+### **Next Steps**
+- Try running these examples in your terminal.
+- Experiment with real JSON files (e.g., API responses, configs).
+- Gradually move to more complex queries (`map`, `reduce`, custom functions).
+
+---
+
 # Working with `jq` on Debian
 
 ## Introduction to `jq`