# Understanding JSON and How `jq` Works Under the Hood

## What is JSON?

JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write, and easy for machines to parse and generate. It's built on two universal data structures:

1. **Collections of name/value pairs** (called objects, dictionaries, or hashes in various languages)
2. **Ordered lists of values** (called arrays or lists)

### JSON Syntax Basics:

```json
{
  "string": "value",
  "number": 42,
  "boolean": true,
  "null": null,
  "array": [1, 2, 3],
  "object": {
    "nested": "property"
  }
}
```
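
To see how `jq` classifies each of these values, the following sketch maps every top-level value of the sample document to its type name. It assumes `jq` is installed; `map_values` and `type` are jq built-ins, and the variable names are only for illustration.

```bash
# Sample document covering each JSON value type (same as above, on one line).
doc='{"string":"value","number":42,"boolean":true,"null":null,"array":[1,2,3],"object":{"nested":"property"}}'

# map_values(type) replaces every value with the name of its JSON type.
types=$(printf '%s\n' "$doc" | jq -c 'map_values(type)')
printf '%s\n' "$types"
# {"string":"string","number":"number","boolean":"boolean","null":"null","array":"array","object":"object"}
```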

## How `jq` Processes JSON

### 1. Lexical Analysis (Tokenization)
When you run `jq`, it first breaks down the JSON input into tokens:
- Punctuation: `{ } [ ] , :`
- Strings (in quotes)
- Numbers
- Keywords: `true`, `false`, `null`

### 2. Parsing
The tokens are then parsed into in-memory values arranged as a tree that mirrors the JSON structure. (This is a data tree; the term "abstract syntax tree" fits the parsed filter program better than the parsed input.) This tree maintains:
- Object hierarchies
- Array orders
- Value types
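
One visible consequence of the tokenize-and-parse step is that input which is not valid JSON is rejected before any filter runs. A minimal sketch, assuming `jq` is installed (the variable name is illustrative):

```bash
# A trailing comma is not valid JSON, so jq fails with a non-zero exit status.
if printf '%s\n' '{"a": 1,}' | jq '.' >/dev/null 2>&1; then
  parse_result="accepted"
else
  parse_result="rejected"
fi
printf '%s\n' "$parse_result"   # rejected
```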

### 3. Processing Pipeline
`jq` works with a filter pipeline concept where:
- Input JSON is parsed into a stream of JSON values
- Each filter in your `jq` expression processes this stream
- The output of one filter becomes the input to the next
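
The pipeline idea can be sketched with two small commands: one where the input itself is a stream of several JSON values, and one where `|` chains filters inside a single expression. The inputs are made up for the demonstration.

```bash
# Three top-level JSON values on stdin; the filter runs once per value.
doubled=$(printf '%s\n' '1 2 3' | jq -c '. * 2')
printf '%s\n' "$doubled"
# 2
# 4
# 6

# Inside a filter, `|` feeds every output of the left side into the right side.
piped=$(printf '%s\n' '[1,2,3]' | jq -c '.[] | . + 10')
printf '%s\n' "$piped"
# 11
# 12
# 13
```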

### 4. Key Components Under the Hood:
- **Iterator Model**: Every filter is a generator that can emit zero, one, or many outputs, and `jq` passes them downstream one at a time
- **Lazy Evaluation**: Outputs are produced on demand, so a consumer such as `first` or `limit` can stop a generator early
- **Pattern Matching**: Destructuring (`. as [$a, $b]`) binds parts of a value by shape, and functions such as `test` and `match` apply regular expressions to strings
- **C Implementation**: `jq` is written in C, which keeps startup and per-value overhead low
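
A short sketch of two of these ideas, assuming jq 1.5 or later: `limit` stops a very long generator after a few outputs, and destructuring binds names by shape. The numbers and variable names are arbitrary.

```bash
# range/2 could emit close to a billion numbers, but limit/2 stops the
# generator after three outputs, so the rest are never computed.
first_three=$(jq -n -c '[limit(3; range(10; 1000000000))]')
printf '%s\n' "$first_three"   # [10,11,12]

# Destructuring binds $a and $b to parts of the input by position and key.
pair=$(printf '%s\n' '[1, {"b": 2}]' | jq -c '. as [$a, {b: $b}] | {a: $a, b: $b}')
printf '%s\n' "$pair"          # {"a":1,"b":2}
```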

## How `jq` Filters Work

When you write a filter like `.users[].name`:

1. `.` - Takes the entire input
2. `.users` - Selects the "users" property
3. `[]` - Iterates over the array elements
4. `.name` - Extracts the "name" property from each
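
The four steps above can be observed by running the filter one stage at a time. The sample document is an assumption made up for this sketch.

```bash
doc='{"users":[{"name":"Alice"},{"name":"Bob"}]}'

# Each command adds one more step of the filter.
step_users=$(printf '%s\n' "$doc" | jq -c '.users')         # the array itself
step_each=$(printf '%s\n' "$doc" | jq -c '.users[]')        # one output per element
step_names=$(printf '%s\n' "$doc" | jq -r '.users[].name')  # one name per line

printf '%s\n' "$step_users" "$step_each" "$step_names"
# [{"name":"Alice"},{"name":"Bob"}]
# {"name":"Alice"}
# {"name":"Bob"}
# Alice
# Bob
```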

### Memory Management
`jq` is designed to:
- Handle large inputs efficiently by holding only the current top-level value in memory
- Process streams of JSON objects one at a time; a single huge document is still loaded whole unless you pass `--stream`
- Use tail-call optimization for recursive operations
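
For a single large document, `--stream` makes `jq` emit `[path, value]` events while it reads instead of building the whole tree first. A minimal sketch on a tiny input (real use would be on a file too big to load comfortably):

```bash
# Closing events have length 1, so select(length == 2) keeps only leaf values.
events=$(printf '%s\n' '{"a":[1,2]}' | jq -c --stream 'select(length == 2)')
printf '%s\n' "$events"
# [["a",0],1]
# [["a",1],2]
```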

## Advanced Internal Concepts

### 1. The jq Virtual Machine
`jq` actually compiles your filters to bytecode that runs on a custom virtual machine. This:
- Enables complex transformations
- Limits side effects: the language has no built-ins for writing files or running commands (a property of the language, not a formal security sandbox)
- Allows optimization of common operations

### 2. Path Expressions
When you use path expressions like `.a.b.c`, `jq`:
- Navigates the JSON tree structure
- Handles missing properties gracefully: a missing key yields `null` instead of an error
- Maintains context for relative paths
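
The behaviour for missing properties can be checked directly. The documents below are invented for the sketch; `//` supplies a fallback and the `?` suffix suppresses indexing errors.

```bash
full='{"a":{"b":{"c":1}}}'
partial='{"a":{}}'

found=$(printf '%s\n' "$full" | jq '.a.b.c')                    # 1
missing=$(printf '%s\n' "$partial" | jq '.a.b.c')               # null: .b is absent
fallback=$(printf '%s\n' "$partial" | jq -r '.a.b.c // "n/a"')  # n/a

# Indexing a string with a key is an error; ? suppresses it and emits nothing.
suppressed=$(printf '%s\n' '{"a":"text"}' | jq '.a.b?')

printf '%s\n' "$found" "$missing" "$fallback"
```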

### 3. Function Composition
Many `jq` operations are built from primitive functions that:
- Can be composed together
- Follow consistent patterns
- Maintain immutability (original JSON is never modified)
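
Immutability can be seen by binding the input to a variable before "updating" it. In this sketch (sample data made up), the update produces a new value while `$orig` still holds the original.

```bash
ages=$(printf '%s\n' '{"name":"Alice","age":30}' |
  jq -c '. as $orig | (.age += 1) | {before: $orig.age, after: .age}')
printf '%s\n' "$ages"   # {"before":30,"after":31}
```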

## Example: What Happens When You Run `jq '.users[].name'`

1. Input JSON is tokenized and parsed into memory as a tree structure
2. The root (`.`) is identified
3. The `users` property is located in the object
4. The array iterator (`[]`) starts processing each element
5. For each array element, the `name` property is extracted
6. Each result is written to the output as it is produced, forming a stream of values

---

# **Understanding JSON & jq: The Core Fundamentals**

Let’s break this down into **clear, foundational concepts** so you truly *get* how JSON works and how `jq` processes it.

---

## **1. JSON Basics: The Building Blocks**
JSON (**J**ava**S**cript **O**bject **N**otation) is a structured way to represent data. Think of it like a nested combination of **dictionaries** (key-value pairs) and **lists** (ordered sequences).

### **Key JSON Structures**
| Structure | Example | Description |
|------------|---------|-------------|
| **Object** (Dictionary) | `{"name": "Alice", "age": 30}` | Unordered `key:value` pairs (like a Python `dict` or JS object) |
| **Array** (List) | `[1, 2, 3, "hello"]` | Ordered list of values (like a Python `list` or JS array) |
| **String** | `"hello"` | Text in double quotes |
| **Number** | `42`, `3.14` | Integers or decimals |
| **Boolean** | `true`, `false` | Logical true/false |
| **Null** | `null` | Represents "no value" |

### **Example JSON Document**
```json
{
  "name": "Alice",
  "age": 30,
  "is_student": false,
  "courses": ["Math", "Science"],
  "address": {
    "street": "123 Main St",
    "city": "Boston"
  }
}
```
- **Top-level object** (`{ ... }`) containing keys like `"name"`, `"age"`, etc.
- **Nested structures**: `"address"` is an object inside the main object.
- **Arrays**: `"courses"` holds a list of strings.

---

## **2. How `jq` Processes JSON**
`jq` is a **filter** that takes JSON input, applies transformations, and produces JSON output.

### **Core jq Concepts**
1. **`.` (Dot Operator)** → Represents **the entire input**.
   - `jq '.' file.json` → Pretty-prints the JSON.
   - `jq '.name'` → Extracts the `"name"` field.

2. **`[]` (Array/Iterator Operator)** → Unwraps arrays or objects.
   - `jq '.courses[]'` → Gets each course: `"Math"`, `"Science"`.
   - `jq '.address | .[]'` → Gets all values inside `address`: `"123 Main St"`, `"Boston"`.

3. **`|` (Pipe Operator)** → Chains operations (like Unix pipes).
   - `jq '.address | .city'` → Gets `"Boston"`.

4. **`select()`** → Filters data conditionally.
   - `jq '.users[] | select(.age > 30)'` → Only users over 30.

5. **`map()`** → Applies a function to each element.
   - `jq '.numbers | map(. * 2)'` → Doubles each number.
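
The same operators can be tried against the Alice document from section 1. This is a sketch: the `select` and `map` conditions are chosen to fit that document (it has no `users` or `numbers` keys), and the variable names are illustrative.

```bash
doc='{"name":"Alice","age":30,"is_student":false,"courses":["Math","Science"],"address":{"street":"123 Main St","city":"Boston"}}'

# Pipe: narrow to .address, then pick .city.
city=$(printf '%s\n' "$doc" | jq -r '.address | .city')            # Boston
# Iterator on an object: emits each value; [...] collects them into an array.
values=$(printf '%s\n' "$doc" | jq -c '[.address | .[]]')          # ["123 Main St","Boston"]
# select: keep only courses whose name starts with "M".
m_courses=$(printf '%s\n' "$doc" | jq -c '[.courses[] | select(startswith("M"))]')  # ["Math"]
# map: apply a function to every array element.
upper=$(printf '%s\n' "$doc" | jq -c '.courses | map(ascii_upcase)')  # ["MATH","SCIENCE"]

printf '%s\n' "$city" "$values" "$m_courses" "$upper"
```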

---

## **3. How `jq` Works Under the Hood**
### **Step-by-Step Processing**
1. **Input JSON is parsed** → Converted into an internal tree structure.
2. **Your `jq` filter is compiled** → Turned into bytecode for efficiency.
3. **The filter runs on the JSON tree** → Extracting/modifying data.
4. **Results are output** → As JSON (or raw text with `-r`).

### **Key Takeaways**
- ✅ `jq` **does not modify the original JSON**: it produces new output.
- ✅ Filters produce results **lazily**, one at a time; each top-level input value is still parsed fully into memory unless you use `--stream`.
- ✅ Uses **functional programming** concepts (like `map`, `select`).

---

## **4. Practical Examples to Solidify Understanding**

### **Example 1: Extracting a Simple Value**
```bash
echo '{"name": "Alice", "age": 30}' | jq '.name'
# Output: "Alice"
```

### **Example 2: Iterating Over an Array**
```bash
echo '{"users": ["Alice", "Bob", "Charlie"]}' | jq '.users[]'
# Output:
# "Alice"
# "Bob"
# "Charlie"
```

### **Example 3: Filtering Data**
```bash
# -c prints compact one-line output; without it jq pretty-prints across several lines
echo '{"people": [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 35}]}' | jq -c '.people[] | select(.age > 30)'
# Output: {"name":"Bob","age":35}
```

### **Example 4: Modifying Structure**
```bash
echo '{"name": "Alice", "age": 30}' | jq -c '{username: .name, years_old: .age}'
# Output: {"username":"Alice","years_old":30}
```

---

## **5. Summary: The Absolute Basics**
1. **JSON** = Structured data format (objects `{}`, arrays `[]`, values).
2. **`jq`** = A tool to **query & transform JSON** from the command line.
3. **Key Operators**:
   - `.` → Current input
   - `[]` → Iterate over arrays/objects
   - `|` → Chain operations
   - `select()` → Filter data
4. **`jq` is non-destructive** → Always produces new output.

---
### **Next Steps**
- Try running these examples in your terminal.
- Experiment with real JSON files (e.g., API responses, configs).
- Gradually move to more complex queries (`map`, `reduce`, custom functions).

---

# Working with `jq` on Debian

## Introduction to `jq`

`jq` is a powerful command-line tool used for parsing, filtering, transforming, and analyzing JSON data. It allows you to manipulate JSON in a similar way to how `sed`, `awk`, and `grep` handle text files. This guide will walk you through installing `jq`, basic usage, practical examples, and common use cases.

## Installation

To install `jq` on a Debian-based system, use the following commands:

```sh
sudo apt-get update
sudo apt-get install jq
```

## JSON Examples for Practice

Here are some sample JSON data structures to practice with:

### Example 1: Simple JSON Object

```json
{
  "name": "John Doe",
  "age": 30,
  "city": "New York"
}
```

### Example 2: JSON Array

```json
[
  {
    "name": "John Doe",
    "age": 30,
    "city": "New York"
  },
  {
    "name": "Jane Smith",
    "age": 25,
    "city": "Los Angeles"
  },
  {
    "name": "Sam Brown",
    "age": 20,
    "city": "Chicago"
  }
]
```

### Example 3: Nested JSON Object

```json
{
  "id": 1,
  "name": "Product Name",
  "price": 29.99,
  "tags": ["electronics", "gadget"],
  "stock": {
    "warehouse": 100,
    "retail": 50
  }
}
```
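
The commands in the rest of this guide read `example1.json`, `example2.json`, and `example3.json`. One way to create them, sketched here with heredocs in the current directory (the compact one-line layout is a choice made for brevity), is:

```sh
# Write the three sample documents to the file names the later commands expect.
cat > example1.json <<'EOF'
{"name": "John Doe", "age": 30, "city": "New York"}
EOF

cat > example2.json <<'EOF'
[
  {"name": "John Doe", "age": 30, "city": "New York"},
  {"name": "Jane Smith", "age": 25, "city": "Los Angeles"},
  {"name": "Sam Brown", "age": 20, "city": "Chicago"}
]
EOF

cat > example3.json <<'EOF'
{"id": 1, "name": "Product Name", "price": 29.99, "tags": ["electronics", "gadget"], "stock": {"warehouse": 100, "retail": 50}}
EOF

# jq exits non-zero on invalid JSON, so this doubles as a syntax check.
jq empty example1.json example2.json example3.json && echo "all three files are valid JSON"
```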

## Basic `jq` Commands

### Parsing and Pretty-Printing JSON

To pretty-print JSON, you can use the `.` filter:

```sh
cat example1.json | jq .
```

### Extracting a Value

To extract a specific value from a JSON object:

```sh
cat example1.json | jq '.name'
```

For a JSON array, you can extract a specific element by index:

```sh
cat example2.json | jq '.[0].name'
```

### Filtering JSON Arrays

To filter an array based on a condition:

```sh
cat example2.json | jq '.[] | select(.age > 25)'
```

### Modifying JSON

To modify a JSON object and add a new field:

```sh
cat example1.json | jq '. + {"country": "USA"}'
```

### Combining Filters

You can combine multiple filters to achieve more complex queries:

```sh
cat example3.json | jq '.stock | {total_stock: (.warehouse + .retail)}'
```

## Practical Exercises

### Exercise 1: Extract the Age of "Jane Smith"

```sh
cat example2.json | jq '.[] | select(.name == "Jane Smith") | .age'
```

### Exercise 2: List All Names

```sh
cat example2.json | jq '.[].name'
```

### Exercise 3: Calculate Total Stock

```sh
cat example3.json | jq '.stock | .warehouse + .retail'
```

### Exercise 4: Add a New Tag "sale" to the Tags Array

```sh
cat example3.json | jq '.tags += ["sale"]'
```

## Common Uses of `jq`

### Parsing API Responses

When interacting with web APIs, the responses are often in JSON format. `jq` allows you to parse, filter, and extract the necessary data from these responses.

```sh
curl -s https://api.example.com/data | jq '.items[] | {name: .name, id: .id}'
```

### Processing Configuration Files

Many modern applications use JSON for configuration. With `jq`, you can easily modify or extract values from these files.

```sh
jq '.settings.debug = true' config.json > new_config.json
```
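
`jq` has no in-place editing option, and redirecting straight back to `config.json` would truncate the file before `jq` reads it. A common workaround, sketched here with a made-up `config.json` and assuming `mktemp` is available, is to write to a temporary file and replace the original only on success:

```sh
# Hypothetical config file for the demo.
printf '%s\n' '{"settings": {"debug": false}}' > config.json

# Write to a temporary file first; replace the original only if jq succeeded.
tmp=$(mktemp)
if jq '.settings.debug = true' config.json > "$tmp"; then
  mv "$tmp" config.json
else
  rm -f "$tmp"
fi

jq -c '.' config.json   # {"settings":{"debug":true}}
```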

### Log Analysis

If your logs are in JSON format, you can use `jq` to search for specific entries, aggregate data, or transform the logs into a more readable format.

```sh
cat logs.json | jq '.[] | select(.level == "error") | {timestamp: .timestamp, message: .message}'
```
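
Logs are often written as JSON Lines (one object per line) rather than as a single array. In that case the leading `.[]` is unnecessary, and `-s` (slurp) can gather the lines for aggregation. The log file below is hypothetical sample data:

```sh
# Hypothetical log in JSON Lines form: one object per line, no enclosing array.
cat > logs.jsonl <<'EOF'
{"timestamp": "2024-01-01T10:00:00Z", "level": "info", "message": "started"}
{"timestamp": "2024-01-01T10:00:05Z", "level": "error", "message": "disk full"}
{"timestamp": "2024-01-01T10:00:09Z", "level": "error", "message": "retry failed"}
EOF

# The filter runs once per line, so select works directly on each entry.
errors=$(jq -r 'select(.level == "error") | .message' logs.jsonl)
printf '%s\n' "$errors"
# disk full
# retry failed

# -s collects all lines into one array so they can be grouped and counted.
counts=$(jq -s -c 'group_by(.level) | map({level: .[0].level, count: length})' logs.jsonl)
printf '%s\n' "$counts"
# [{"level":"error","count":2},{"level":"info","count":1}]
```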

### Data Transformation

Transforming JSON data into different structures or formats is straightforward with `jq`. This is useful for data pipelines or ETL (Extract, Transform, Load) processes.

```sh
cat data.json | jq '[.items[] | {name: .name, value: .value}]'
```

### Scripting and Automation

In shell scripts, `jq` can be used to parse and manipulate JSON data as part of automation tasks.

```sh
# Extracting a value from JSON in a script
response=$(curl -s https://api.example.com/data)
# Quote the variable so the shell does not split or glob the JSON text
id=$(printf '%s\n' "$response" | jq -r '.items[0].id')
echo "The ID is: $id"
```
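
Going the other way, from shell variables into a filter, is safer with `--arg` and `--argjson` than with string interpolation, because `jq` handles the escaping. A brief sketch with made-up values:

```sh
# Hypothetical value containing quotes that would break naive interpolation.
user='Alice "Al" Smith'

# --arg binds the shell value to $name as a JSON string.
payload=$(jq -n -c --arg name "$user" '{name: $name}')
printf '%s\n' "$payload"   # {"name":"Alice \"Al\" Smith"}

# --argjson binds a value that is already JSON (here a number).
older=$(printf '%s\n' '[{"name":"A","age":25},{"name":"B","age":35}]' |
  jq -c --argjson min 30 '[.[] | select(.age > $min) | .name]')
printf '%s\n' "$older"     # ["B"]
```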

### Testing and Debugging

When developing applications that produce or consume JSON, `jq` helps in quickly inspecting the JSON output for correctness.

```sh
cat response.json | jq '.'
```

## Practical Scenarios

### Working with Kubernetes

Kubernetes uses JSON and YAML extensively. You can use `jq` to filter and extract information from the JSON output of `kubectl` commands.

```sh
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, status: .status.phase}'
```

### CI/CD Pipelines

In continuous integration and deployment workflows, `jq` can parse and transform JSON data used in configuration files, reports, or environment variables.

```sh
# GITHUB_EVENT_PATH holds the path to the event payload file, not the JSON itself
jq '.commits[] | {message: .message, author: .author.name}' "$GITHUB_EVENT_PATH"
```

### Web Development

When dealing with front-end and back-end integration, `jq` helps in simulating API responses or transforming data formats.

```sh
cat mock_response.json | jq '.users[] | {username: .login, email: .email}'
```

### Data Analysis

For quick analysis of JSON data files, `jq` provides a powerful way to query and aggregate data.

```sh
cat data.json | jq '[.records[] | select(.active == true) | .value] | add'
```

### DevOps and Infrastructure Management

Tools like Terraform and AWS CLI produce JSON output, and `jq` is perfect for extracting and processing this information.

```sh
aws ec2 describe-instances | jq '.Reservations[].Instances[] | {instanceId: .InstanceId, state: .State.Name}'
```

## Conclusion

`jq` is a versatile tool that can be integrated into various workflows to handle JSON data efficiently. Whether you're working with APIs, configuration files, logs, or automation scripts, `jq` helps you parse, filter, and transform JSON data with ease.

Feel free to modify these examples and try different commands. `jq` has a comprehensive manual that you can refer to for more advanced features:

```sh
man jq
```

---

# Learning `jq` for Command-Line JSON Processing

`jq` is a powerful command-line JSON processor that lets you parse, filter, and transform JSON data. Here's a comprehensive guide to get you started:

## Installation

Most Linux distributions and macOS can install it via package managers:

```bash
# Ubuntu/Debian
sudo apt install jq

# CentOS/RHEL
sudo yum install jq

# macOS (using Homebrew)
brew install jq

# Windows (via Chocolatey)
choco install jq
```

## Basic Usage

```bash
# Basic pretty-printing
jq '.' file.json

# Read from stdin
curl -s https://api.example.com/data | jq '.'
```

## Selecting Data

```bash
# Get a specific field
jq '.field' file.json

# Get nested fields
jq '.parent.child.grandchild' file.json

# Get array elements
jq '.array[0]' file.json    # First element
jq '.array[-1]' file.json   # Last element
jq '.array[2:5]' file.json  # Slice: indices 2, 3, and 4 (the end index 5 is excluded)
```
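
The indexing rules are easy to confirm on a small array. The array below is sample data chosen so that each value reveals its position:

```bash
arr='[10,20,30,40,50,60]'

first=$(printf '%s\n' "$arr" | jq '.[0]')       # 10
last=$(printf '%s\n' "$arr" | jq '.[-1]')       # 60
slice=$(printf '%s\n' "$arr" | jq -c '.[2:5]')  # [30,40,50]: index 5 is excluded

printf '%s\n' "$first" "$last" "$slice"
```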

## Common Operations

```bash
# Get multiple fields
jq '{name: .name, age: .age}' file.json

# Filter arrays
jq '.users[] | select(.age > 30)' file.json

# Map operations (returns a new array with each number doubled)
jq '.numbers | map(. * 2)' file.json

# Sort
jq '.users | sort_by(.age)' file.json

# Length/count
jq '.array | length' file.json
```

## Advanced Features

```bash
# String interpolation
jq '"Hello, \(.name)!"' file.json

# Conditional logic
jq 'if .age > 18 then "Adult" else "Minor" end' file.json

# Variables
jq '. as $item | $item.name' file.json

# Custom functions
jq 'def add(x; y): x + y; add(5; 10)' <<< '{}'
```

## Practical Examples

```bash
# Extract all email addresses from JSON
jq -r '.users[].email' file.json

# Convert JSON to CSV
jq -r '.users[] | [.name, .email, .age] | @csv' file.json

# Sum all values in an array
jq '[.numbers[]] | add' file.json

# Find unique values
jq '.tags[]' file.json | sort | uniq

# Modify JSON structure
jq '{user: .name, contact: {email: .email, phone: .tel}}' file.json
```
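
For the CSV conversion, a header row can be emitted by placing a literal array before the per-user rows. This sketch uses invented users; `@csv` quotes strings and leaves numbers bare.

```bash
doc='{"users":[{"name":"Alice","email":"alice@example.com","age":30},{"name":"Bob","email":"bob@example.com","age":25}]}'

# The comma operator emits the header array first, then one array per user;
# every array is then formatted as a CSV row.
csv=$(printf '%s\n' "$doc" |
  jq -r '["name","email","age"], (.users[] | [.name, .email, .age]) | @csv')
printf '%s\n' "$csv"
# "name","email","age"
# "Alice","alice@example.com",30
# "Bob","bob@example.com",25
```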

## Tips

1. Use `-r` for raw output (no quotes around strings)
2. Combine with `curl` for API responses: `curl -s ... | jq ...`
3. Use `//` for default values: `jq '.name // "Anonymous"'`
4. For large files, use `--stream` for iterative parsing
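
The `//` operator from tip 3 has one behaviour worth knowing: it falls back when the left side is `false` as well as when it is `null` or missing. A small sketch with made-up inputs:

```bash
anon=$(printf '%s\n' '{}' | jq -r '.name // "Anonymous"')                 # Anonymous
named=$(printf '%s\n' '{"name":"Alice"}' | jq -r '.name // "Anonymous"')  # Alice

# false also triggers the fallback, so // is not suitable for boolean defaults.
flag=$(printf '%s\n' '{"enabled":false}' | jq '.enabled // true')         # true

printf '%s\n' "$anon" "$named" "$flag"
```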