2025-06-25 00:57:57 +00:00

# Understanding JSON and How `jq` Works Under the Hood
## What is JSON?
JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and easy for machines to parse and generate. It's built on two universal data structures:
1. **Collections of name/value pairs** (called objects, dictionaries, or hashes in various languages)
2. **Ordered lists of values** (called arrays or lists)
### JSON Syntax Basics:
```json
{
  "string": "value",
  "number": 42,
  "boolean": true,
  "null": null,
  "array": [1, 2, 3],
  "object": {
    "nested": "property"
  }
}
```
## How `jq` Processes JSON
### 1. Lexical Analysis (Tokenization)
When you run `jq`, it first breaks down the JSON input into tokens:
- Punctuation: `{ } [ ] , :`
- Strings (in quotes)
- Numbers
- Keywords: `true`, `false`, `null`
### 2. Parsing
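The tokens are then parsed into in-memory JSON values, a tree of objects, arrays, and scalars that mirrors the JSON structure. (The abstract syntax tree in `jq` belongs to the filter program, not to the data.) This tree maintains: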
- Object hierarchies
- Array orders
- Value types
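One way to see that parsed values keep their types is to ask `jq` for the type of each field. This is a minimal sketch, assuming the sample document from the syntax section is passed inline; the variable names `doc` and `types` are only for illustration.

```bash
# Sample document (same fields as the syntax example above).
doc='{"string":"value","number":42,"boolean":true,"null":null,"array":[1,2,3],"object":{"nested":"property"}}'

# map_values(type) replaces each value with the name of its JSON type.
types=$(printf '%s\n' "$doc" | jq -c 'map_values(type)')
echo "$types"
# {"string":"string","number":"number","boolean":"boolean","null":"null","array":"array","object":"object"}
```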
### 3. Processing Pipeline
`jq` works with a filter pipeline concept where:
- Input JSON is parsed into a stream of JSON values
- Each filter in your `jq` expression processes this stream
- The output of one filter becomes the input to the next
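The pipeline idea can be sketched with a small made-up order document (the `items` and `price` field names are assumptions for this example). Each stage receives whatever the previous stage produced.

```bash
input='{"items":[{"price":2},{"price":3}]}'

# .items      -> the array of item objects
# map(.price) -> an array holding just the prices: [2,3]
# add         -> the sum of that array
total=$(printf '%s\n' "$input" | jq '.items | map(.price) | add')
echo "$total"
# 5
```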
### 4. Key Components Under the Hood:
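- **Iterator Model**: `jq` reads its input as a sequence of JSON values and runs the filter on each value in turn; within a filter, operators such as `[]` emit their results one at a time
- **Lazy Evaluation**: Outputs are generated on demand, so a consumer such as `first` or `limit` can stop a generator early
- **Pattern Matching**: Destructuring (`. as [$a, $b]`) and regular-expression builtins such as `test` and `match` pick values out of JSON structures
- **C Implementation**: `jq` is written in portable C with no runtime dependencies, which keeps startup and processing overhead low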
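On-demand generation is easiest to observe with a generator that would be expensive to run to completion. In this sketch, `range` could produce close to a billion numbers, but `limit` asks for only three, so the command returns immediately.

```bash
# -n runs the filter without reading any input.
# limit(3; ...) stops the generator after its first three outputs.
firsts=$(jq -n -c '[limit(3; range(1; 1000000000))]')
echo "$firsts"
# [1,2,3]
```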
## How `jq` Filters Work
When you write a filter like `.users[].name`:
1. `.` - Takes the entire input
2. `.users` - Selects the "users" property
3. `[]` - Iterates over the array elements
4. `.name` - Extracts the "name" property from each
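Running the filter one step at a time makes each stage visible. A small sketch with a made-up two-user document:

```bash
input='{"users":[{"name":"Alice"},{"name":"Bob"}]}'

# Step 2: select the "users" property (still a single array value).
step_users=$(printf '%s\n' "$input" | jq -c '.users')
echo "$step_users"
# [{"name":"Alice"},{"name":"Bob"}]

# Step 3: iterate, which turns one array into two separate outputs.
step_each=$(printf '%s\n' "$input" | jq -c '.users[]')
echo "$step_each"
# {"name":"Alice"}
# {"name":"Bob"}

# Step 4: extract "name" from each of those outputs.
step_names=$(printf '%s\n' "$input" | jq -c '.users[].name')
echo "$step_names"
# "Alice"
# "Bob"
```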
### Memory Management
`jq` is designed to:
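- Keep values in memory only for as long as a filter still needs them
- Process a stream of separate JSON values one at a time, so the whole stream never has to be in memory at once (a single large document is still loaded in full unless `--stream` is used)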
- Use tail-call optimization for recursive operations
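The stream behaviour can be seen by feeding `jq` several top-level values at once. In this sketch the input is three newline-delimited objects, and the filter runs once per object.

```bash
# Three separate top-level JSON values on stdin.
ids=$(printf '%s\n' '{"id":1}' '{"id":2}' '{"id":3}' | jq '.id')
echo "$ids"
# 1
# 2
# 3
```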
## Advanced Internal Concepts
### 1. The jq Virtual Machine
`jq` actually compiles your filters to bytecode that runs on a custom virtual machine. This:
- Enables complex transformations
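- Limits what a filter can do: the core language has no builtins for writing files or running external commands, although this is not a formal security sandbox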
- Allows optimization of common operations
### 2. Path Expressions
When you use path expressions like `.a.b.c`, `jq`:
- Navigates the JSON tree structure
- Handles missing properties gracefully: a missing key yields `null` rather than an error
- Maintains context for relative paths
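A short sketch of how missing keys behave, using a made-up nested document. The `//` alternative operator and the `?` suffix are standard `jq` syntax; the variable names are only for illustration.

```bash
doc='{"a":{"b":{}}}'

# A missing key yields null instead of an error.
missing=$(printf '%s\n' "$doc" | jq '.a.b.c')
echo "$missing"    # null

# The alternative operator // substitutes a default for null or false.
fallback=$(printf '%s\n' "$doc" | jq -r '.a.b.c // "default"')
echo "$fallback"   # default

# Indexing a non-object (here a string) is an error;
# a trailing ? suppresses the error and emits nothing.
quiet=$(printf '%s\n' '"text"' | jq '.a?')
echo "[$quiet]"    # []
```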
### 3. Function Composition
Many `jq` operations are built from primitive functions that:
- Can be composed together
- Follow consistent patterns
- Maintain immutability (original JSON is never modified)
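Composition can be sketched by defining two small helpers and chaining them. The function names `adults` and `names` are hypothetical, defined only for this example, and the input data is made up.

```bash
people='[{"name":"Alice","age":30},{"name":"Bob","age":15}]'

# Each helper is built from primitives (map, select) and composes with |.
result=$(printf '%s\n' "$people" | jq -c '
  def adults: map(select(.age >= 18));
  def names: map(.name);
  adults | names
')
echo "$result"
# ["Alice"]
```

The `people` variable is unchanged afterwards; the filter only writes a new value to its output.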
## Example: What Happens When You Run `jq '.users[].name'`
1. Input JSON is tokenized and parsed into memory as a tree structure
2. The root (`.`) is identified
3. The `users` property is located in the object
4. The array iterator (`[]`) starts processing each element
5. For each array element, the `name` property is extracted
6. Results are collected and output as a stream of values
---
# **Understanding JSON & jq: The Core Fundamentals**
Let's break this down into **clear, foundational concepts** so you truly *get* how JSON works and how `jq` processes it.

---
## **1. JSON Basics: The Building Blocks**
JSON (**J**ava**S**cript **O**bject **N**otation) is a structured way to represent data. Think of it like a nested combination of **dictionaries** (key-value pairs) and **lists** (ordered sequences).
### **Key JSON Structures**
| Structure | Example | Description |
|------------|---------|-------------|
| **Object** (Dictionary) | `{"name": "Alice", "age": 30}` | Unordered `key:value` pairs (like a Python `dict` or JS object) |
| **Array** (List) | `[1, 2, 3, "hello"]` | Ordered list of values (like a Python `list` or JS array) |
| **String** | `"hello"` | Text in double quotes |
| **Number** | `42`, `3.14`, `1e3` | Integers, decimals, or exponent notation |
| **Boolean** | `true`, `false` | Logical true/false |
| **Null** | `null` | Represents "no value" |
### **Example JSON Document**
```json
{
  "name": "Alice",
  "age": 30,
  "is_student": false,
  "courses": ["Math", "Science"],
  "address": {
    "street": "123 Main St",
    "city": "Boston"
  }
}
```
- **Top-level object** (`{ ... }`) containing keys like `"name"`, `"age"`, etc.
- **Nested structures**: `"address"` is an object inside the main object.
- **Arrays**: `"courses"` holds a list of strings.
---
## **2. How `jq` Processes JSON**
`jq` is a **filter** that takes JSON input, applies transformations, and produces JSON output.
### **Core jq Concepts**
1. **`.` (Dot Operator)** → Represents **the entire input**.
   - `jq '.' file.json` → Pretty-prints the JSON.
   - `jq '.name'` → Extracts the `"name"` field.
2. **`[]` (Array/Iterator Operator)** → Unwraps arrays or objects.
   - `jq '.courses[]'` → Gets each course: `"Math"`, `"Science"`.
   - `jq '.address | .[]'` → Gets all values inside `address`: `"123 Main St"`, `"Boston"`.
3. **`|` (Pipe Operator)** → Chains operations (like Unix pipes).
   - `jq '.address | .city'` → Gets `"Boston"`.
4. **`select()`** → Filters data conditionally.
   - `jq '.users[] | select(.age > 30)'` → Only users over 30.
5. **`map()`** → Applies a function to each element.
   - `jq '.numbers | map(. * 2)'` → Doubles each number.
---
## **3. How `jq` Works Under the Hood**
### **Step-by-Step Processing**
1. **Input JSON is parsed** → Converted into an internal tree structure.
2. **Your `jq` filter is compiled** → Turned into bytecode for efficiency.
3. **The filter runs on the JSON tree** → Extracting/modifying data.
4. **Results are output** → As JSON (or raw text with `-r`).
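The difference between JSON output and raw output in the last step is easy to check. A minimal sketch:

```bash
# Default output is JSON, so strings keep their quotes.
json_out=$(printf '%s\n' '{"name":"Alice"}' | jq '.name')
echo "$json_out"   # "Alice"

# -r prints strings as raw text, which is usually what shell scripts want.
raw_out=$(printf '%s\n' '{"name":"Alice"}' | jq -r '.name')
echo "$raw_out"    # Alice
```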
### **Key Takeaways**
- ✅ `jq` **does not modify the original JSON**; it produces new output.
- ✅ It produces results **lazily**, one output at a time, although each input document is still parsed fully into memory unless `--stream` is used.
- ✅ It uses **functional programming** concepts (like `map`, `select`).
---
## **4. Practical Examples to Solidify Understanding**
### **Example 1: Extracting a Simple Value**
```bash
echo '{"name": "Alice", "age": 30}' | jq '.name'
# Output: "Alice"
```
### **Example 2: Iterating Over an Array**
```bash
echo '{"users": ["Alice", "Bob", "Charlie"]}' | jq '.users[]'
# Output:
# "Alice"
# "Bob"
# "Charlie"
```
### **Example 3: Filtering Data**
```bash
echo '{"people": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]}' | jq -c '.people[] | select(.age > 30)'
# Output (compact because of -c): {"name":"Bob","age":35}
```
### **Example 4: Modifying Structure**
```bash
echo '{"name": "Alice", "age": 30}' | jq -c '{username: .name, years_old: .age}'
# Output (compact because of -c): {"username":"Alice","years_old":30}
```
---
## **5. Summary: The Absolute Basics**
1. **JSON** = Structured data format (objects `{}`, arrays `[]`, values).
2. **`jq`** = A tool to **query & transform JSON** from the command line.
3. **Key Operators**:
   - `.` → Current input
   - `[]` → Iterate over arrays/objects
   - `|` → Chain operations
   - `select()` → Filter data
4. **`jq` is non-destructive** → Always produces new output.
---
### **Next Steps**
- Try running these examples in your terminal.
- Experiment with real JSON files (e.g., API responses, configs).
- Gradually move to more complex queries (`map`, `reduce`, custom functions).
---
# Working with `jq` on Debian
## Introduction to `jq`
`jq` is a powerful command-line tool used for parsing, filtering, transforming, and analyzing JSON data. It allows you to manipulate JSON in a similar way to how `sed`, `awk`, and `grep` handle text files. This guide will walk you through installing `jq`, basic usage, practical examples, and common use cases.
## Installation
To install `jq` on a Debian-based system, use the following commands:
```sh
sudo apt-get update
sudo apt-get install jq
```
## JSON Examples for Practice
Here are some sample JSON data structures to practice with:
### Example 1: Simple JSON Object
```json
{
  "name": "John Doe",
  "age": 30,
  "city": "New York"
}
```
### Example 2: JSON Array
```json
[
  {
    "name": "John Doe",
    "age": 30,
    "city": "New York"
  },
  {
    "name": "Jane Smith",
    "age": 25,
    "city": "Los Angeles"
  },
  {
    "name": "Sam Brown",
    "age": 20,
    "city": "Chicago"
  }
]
```
### Example 3: Nested JSON Object
```json
{
  "id": 1,
  "name": "Product Name",
  "price": 29.99,
  "tags": ["electronics", "gadget"],
  "stock": {
    "warehouse": 100,
    "retail": 50
  }
}
```
## Basic `jq` Commands
### Parsing and Pretty-Printing JSON
To pretty-print JSON, you can use the `.` filter:
```sh
cat example1.json | jq .
```
### Extracting a Value
To extract a specific value from a JSON object:
```sh
cat example1.json | jq '.name'
```
For a JSON array, you can extract a specific element by index:
```sh
cat example2.json | jq '.[0].name'
```
### Filtering JSON Arrays
To filter an array based on a condition:
```sh
cat example2.json | jq '.[] | select(.age > 25)'
```
### Modifying JSON
To modify a JSON object and add a new field:
```sh
cat example1.json | jq '. + {"country": "USA"}'
```
### Combining Filters
You can combine multiple filters to achieve more complex queries:
```sh
cat example3.json | jq '.stock | {total_stock: (.warehouse + .retail)}'
```
## Practical Exercises
### Exercise 1: Extract the Age of "Jane Smith"
```sh
cat example2.json | jq '.[] | select(.name == "Jane Smith") | .age'
```
### Exercise 2: List All Names
```sh
cat example2.json | jq '.[].name'
```
### Exercise 3: Calculate Total Stock
```sh
cat example3.json | jq '.stock | .warehouse + .retail'
```
### Exercise 4: Add a New Tag "sale" to the Tags Array
```sh
cat example3.json | jq '.tags += ["sale"]'
```
## Common Uses of `jq`
### Parsing API Responses
When interacting with web APIs, the responses are often in JSON format. `jq` allows you to parse, filter, and extract the necessary data from these responses.
```sh
curl -s https://api.example.com/data | jq '.items[] | {name: .name, id: .id}'
```
### Processing Configuration Files
Many modern applications use JSON for configuration. With `jq`, you can easily modify or extract values from these files.
```sh
jq '.settings.debug = true' config.json > new_config.json
```
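Redirecting output to the same file that `jq` is reading would truncate that file before `jq` opens it, so an in-place update needs a temporary file. The sketch below assumes a throwaway directory and a made-up `config.json`; the file names are only for illustration.

```sh
# Work in a scratch directory so the example does not touch real files.
workdir=$(mktemp -d)
printf '%s\n' '{"settings":{"debug":false}}' > "$workdir/config.json"

# Write to a temporary file first, then replace the original only if jq succeeded.
jq '.settings.debug = true' "$workdir/config.json" > "$workdir/config.json.tmp" \
  && mv "$workdir/config.json.tmp" "$workdir/config.json"

debug=$(jq '.settings.debug' "$workdir/config.json")
echo "$debug"   # true

rm -r "$workdir"
```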
### Log Analysis
If your logs are in JSON format, you can use `jq` to search for specific entries, aggregate data, or transform the logs into a more readable format.
```sh
cat logs.json | jq '.[] | select(.level == "error") | {timestamp: .timestamp, message: .message}'
```
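Many logging tools write one JSON object per line rather than a single array. In that case the leading `.[]` is not needed, because `jq` already runs the filter once per object. A sketch with two made-up log lines (the field names follow the example above):

```sh
logs='{"timestamp":"2024-01-01T00:00:00Z","level":"info","message":"started"}
{"timestamp":"2024-01-01T00:00:05Z","level":"error","message":"disk full"}'

# {timestamp, message} is shorthand for {timestamp: .timestamp, message: .message}.
errors=$(printf '%s\n' "$logs" | jq -c 'select(.level == "error") | {timestamp, message}')
echo "$errors"
# {"timestamp":"2024-01-01T00:00:05Z","message":"disk full"}
```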
### Data Transformation
Transforming JSON data into different structures or formats is straightforward with `jq`. This is useful for data pipelines or ETL (Extract, Transform, Load) processes.
```sh
cat data.json | jq '[.items[] | {name: .name, value: .value}]'
```
### Scripting and Automation
In shell scripts, `jq` can be used to parse and manipulate JSON data as part of automation tasks.
```sh
# Extracting a value from JSON in a script
response=$(curl -s https://api.example.com/data)
# Quote the variable so whitespace and glob characters in the JSON survive intact
id=$(printf '%s\n' "$response" | jq -r '.items[0].id')
echo "The ID is: $id"
```
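Two options are often useful in scripts: `--arg` passes a shell value into the filter as a string variable, and `-e` sets the exit status from the filter's last output. The sketch below uses made-up user data; the variable names `wanted`, `age`, and `verdict` are only for illustration.

```sh
users='[{"name":"John Doe","age":30},{"name":"Jane Smith","age":25}]'
wanted="Jane Smith"

# --arg makes $name available inside the filter,
# which avoids splicing shell text into the jq program.
age=$(printf '%s\n' "$users" | jq --arg name "$wanted" '.[] | select(.name == $name) | .age')
echo "$age"       # 25

# -e exits non-zero when the last output is false or null,
# so the result can drive an ordinary shell conditional.
if printf '%s\n' "$users" | jq -e 'any(.[]; .age > 40)' > /dev/null; then
  verdict="someone is over 40"
else
  verdict="nobody is over 40"
fi
echo "$verdict"   # nobody is over 40
```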
### Testing and Debugging
When developing applications that produce or consume JSON, `jq` helps in quickly inspecting the JSON output for correctness.
```sh
cat response.json | jq '.'
```
## Practical Scenarios
### Working with Kubernetes
Kubernetes uses JSON and YAML extensively. You can use `jq` to filter and extract information from the JSON output of `kubectl` commands.
```sh
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'
```
### CI/CD Pipelines
In continuous integration and deployment workflows, `jq` can parse and transform JSON data used in configuration files, reports, or environment variables.
```sh
# GITHUB_EVENT_PATH holds the path to the event payload file, so pass it to jq as a file argument
jq '.commits[] | {message: .message, author: .author.name}' "$GITHUB_EVENT_PATH"
```
### Web Development
When dealing with front-end and back-end integration, `jq` helps in simulating API responses or transforming data formats.
```sh
cat mock_response.json | jq '.users[] | {username: .login, email: .email}'
```
### Data Analysis
For quick analysis of JSON data files, `jq` provides a powerful way to query and aggregate data.
```sh
cat data.json | jq '[.records[] | select(.active == true) | .value] | add'
```
### DevOps and Infrastructure Management
Tools like Terraform and AWS CLI produce JSON output, and `jq` is perfect for extracting and processing this information.
```sh
aws ec2 describe-instances | jq '.Reservations[].Instances[] | {instanceId: .InstanceId, state: .State.Name}'
```
## Conclusion
`jq` is a versatile tool that can be integrated into various workflows to handle JSON data efficiently. Whether you're working with APIs, configuration files, logs, or automation scripts, `jq` helps you parse, filter, and transform JSON data with ease.
Feel free to modify these examples and try different commands. `jq` has a comprehensive manual that you can refer to for more advanced features:
```sh
man jq
```
---
# Learning `jq` for Command-Line JSON Processing
`jq` is a powerful command-line JSON processor that lets you parse, filter, and transform JSON data. Here's a comprehensive guide to get you started:
## Installation
Most Linux distributions, macOS, and Windows can install it via package managers:
```bash
# Ubuntu/Debian
sudo apt install jq
# CentOS/RHEL
sudo yum install jq
# macOS (using Homebrew)
brew install jq
# Windows (via Chocolatey)
choco install jq
```
## Basic Usage
```bash
# Basic pretty-printing
jq '.' file.json
# Read from stdin
curl -s https://api.example.com/data | jq '.'
```
## Selecting Data
```bash
# Get a specific field
jq '.field' file.json
# Get nested fields
jq '.parent.child.grandchild' file.json
# Get array elements
jq '.array[0]' file.json # First element
jq '.array[-1]' file.json # Last element
jq '.array[2:5]' file.json # Slice: indices 2, 3, and 4 (end index is exclusive)
```
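The index and slice forms can be checked against a small made-up array:

```bash
arr='[10,20,30,40,50,60]'

first=$(printf '%s\n' "$arr" | jq '.[0]')        # 10
last=$(printf '%s\n' "$arr" | jq '.[-1]')        # 60
# The slice covers indices 2, 3, and 4; the end index is exclusive.
slice=$(printf '%s\n' "$arr" | jq -c '.[2:5]')   # [30,40,50]

echo "$first $last $slice"
# 10 60 [30,40,50]
```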
## Common Operations
```bash
# Get multiple fields
jq '{name: .name, age: .age}' file.json
# Filter arrays
jq '.users[] | select(.age > 30)' file.json
# Map operations (this emits each doubled number separately; use '.numbers | map(. * 2)' to get an array)
jq '.numbers[] | . * 2' file.json
# Sort
jq '.users | sort_by(.age)' file.json
# Length/count
jq '.array | length' file.json
```
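The difference between iterating with `[]` and using `map` shows up in the shape of the output. A sketch with a made-up document (the field names match the examples above):

```bash
doc='{"numbers":[1,2,3],"users":[{"name":"Bob","age":35},{"name":"Alice","age":25}]}'

# Iterating produces three separate outputs, one per line.
streamed=$(printf '%s\n' "$doc" | jq -c '.numbers[] | . * 2')
echo "$streamed"
# 2
# 4
# 6

# map keeps the results together in one array.
mapped=$(printf '%s\n' "$doc" | jq -c '.numbers | map(. * 2)')
echo "$mapped"     # [2,4,6]

# sort_by orders ascending, so index 0 is the youngest user.
youngest=$(printf '%s\n' "$doc" | jq -r '.users | sort_by(.age) | .[0].name')
echo "$youngest"   # Alice
```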
## Advanced Features
```bash
# String interpolation
jq '"Hello, \(.name)!"' file.json
# Conditional logic
jq 'if .age > 18 then "Adult" else "Minor" end' file.json
# Variables
jq '. as $item | $item.name' file.json
# Custom functions
jq 'def add(x; y): x + y; add(5; 10)' <<< '{}'
```
## Practical Examples
```bash
# Extract all email addresses from JSON
jq -r '.users[].email' file.json
# Convert JSON to CSV
jq -r '.users[] | [.name, .email, .age] | @csv' file.json
# Sum all values in an array
jq '[.numbers[]] | add' file.json
# Find unique values
jq '.tags[]' file.json | sort | uniq
# Modify JSON structure
jq '{user: .name, contact: {email: .email, phone: .tel}}' file.json
```
## Tips
1. Use `-r` for raw output (no quotes around strings)
2. Combine with `curl` for API responses: `curl -s ... | jq ...`
3. Use `//` for default values: `jq '.name // "Anonymous"'`
4. For large files, use `--stream` for iterative parsing
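The first and third tips combine naturally. In this sketch the second user has no `name`, so `//` supplies a fallback, and `-r` prints the result without JSON quoting. The data and the `lines` variable are made up for illustration.

```bash
doc='{"users":[{"name":"Alice","email":"alice@example.com"},{"email":"bob@example.com"}]}'

# String interpolation builds one line per user; // fills in a missing name.
lines=$(printf '%s\n' "$doc" | jq -r '.users[] | "\(.name // "Anonymous") <\(.email)>"')
echo "$lines"
# Alice <alice@example.com>
# Anonymous <bob@example.com>
```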