Update tech_docs/python/json_python.md

2024-06-26 06:24:51 +00:00
parent 61da81f226
commit f92a9db403

@@ -2,6 +2,287 @@
## Introduction
JSON (JavaScript Object Notation) is a widely used data interchange format that is easy for humans to read and write and easy for machines to parse and generate. Python provides several ways to work with JSON, from its built-in `json` library to external libraries for specific use cases such as data analysis, schema validation, and HTTP APIs.
## Built-in `json` Library
### Basic Operations
#### Encoding (Serialization)
Serialization converts a Python object into a JSON string.
```python
import json

data = {
    "name": "John Doe",
    "age": 30,
    "isEmployed": True,
    "skills": ["Python", "Machine Learning", "Web Development"]
}

# Convert Python object to JSON string
json_string = json.dumps(data, indent=4)
print(json_string)

# Write JSON data to a file
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)
```
#### Decoding (Deserialization)
Deserialization converts a JSON string back into a Python object.
```python
import json

# Convert JSON string to Python object
json_string = '{"name": "Jane Doe", "age": 25, "isEmployed": false}'
python_object = json.loads(json_string)
print(python_object)

# Read JSON data from a file
with open('data.json', 'r') as file:
    python_object = json.load(file)
print(python_object)
```
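A round trip through JSON does not always return the exact Python types that went in. The sketch below uses made-up sample values (the `original` dictionary is purely illustrative) to show two common surprises with the standard `json` module: tuples come back as lists, and non-string dictionary keys come back as strings.

```python
import json

# Hypothetical sample data chosen to expose type changes
original = {
    "point": (1, 2),                  # tuple
    "scores": {1: "low", 2: "high"},  # integer keys
    "missing": None,
}

restored = json.loads(json.dumps(original))

print(restored["point"])    # the tuple was encoded as a JSON array, so it returns as a list
print(restored["scores"])   # the integer keys were encoded as strings
print(restored["missing"])  # JSON null maps back to None
```

If the exact types matter, convert them back explicitly after loading rather than relying on the round trip.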
### Custom Object Encoding and Decoding
The `json` library can be extended to encode and decode custom Python objects.
#### Encoding Custom Objects
```python
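import json

class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def encode_user(obj):
    if isinstance(obj, User):
        return {"name": obj.name, "age": obj.age, "__User__": True}
    # `default` must raise TypeError for objects it cannot serialize
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

user = User("John Doe", 30)
json_string = json.dumps(user, default=encode_user)
print(json_string)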
```
#### Decoding JSON into Custom Python Objects
```python
# Reuses `User` and `json_string` from the encoding example above
def decode_user(dct):
    if "__User__" in dct:
        return User(dct["name"], dct["age"])
    return dct

user = json.loads(json_string, object_hook=decode_user)
print(user.name, user.age)
```
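An alternative to passing `default=` on every call is to subclass `json.JSONEncoder`. The following is a minimal, self-contained sketch of a full round trip; the class name `UserEncoder` and the sample values are illustrative assumptions, not part of the original examples.

```python
import json


class User:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class UserEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes User and defers everything else."""

    def default(self, obj):
        if isinstance(obj, User):
            return {"__User__": True, "name": obj.name, "age": obj.age}
        # The base class raises TypeError for unsupported types
        return super().default(obj)


def decode_user(dct):
    # Rebuild a User only when the marker key is present
    if dct.get("__User__"):
        return User(dct["name"], dct["age"])
    return dct


encoded = json.dumps([User("John Doe", 30), {"plain": 1}], cls=UserEncoder)
decoded = json.loads(encoded, object_hook=decode_user)

print(type(decoded[0]).__name__, decoded[0].name, decoded[0].age)
print(decoded[1])
```

Because `object_hook` runs for every JSON object in the input, dictionaries without the marker key pass through unchanged.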
### Querying JSON
#### Accessing Nested Elements
```python
json_data = '''
{
    "person": {
        "name": "John Doe",
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "zipcode": "12345"
        },
        "phone_numbers": [
            {"type": "home", "number": "555-1234"},
            {"type": "work", "number": "555-5678"}
        ]
    }
}
'''
data = json.loads(json_data)

print(data['person']['name'])  # Output: John Doe
print(data['person']['address']['city'])  # Output: Anytown
print(data['person']['phone_numbers'][0]['number'])  # Output: 555-1234
```
#### Iterating Over Lists
```python
for phone in data['person']['phone_numbers']:
    print(f"{phone['type'].capitalize()} phone: {phone['number']}")
```
#### Filtering Data
```python
json_data = '''
{
    "products": [
        {"name": "Apple", "price": 0.5, "category": "Fruit"},
        {"name": "Bread", "price": 2.5, "category": "Bakery"},
        {"name": "Cheese", "price": 5.0, "category": "Dairy"},
        {"name": "Milk", "price": 3.0, "category": "Dairy"}
    ]
}
'''
data = json.loads(json_data)

# Keep only products priced above 2
expensive_products = [product for product in data['products'] if product['price'] > 2]
print("Expensive products:", [product['name'] for product in expensive_products])
```
#### Transforming Data
```python
# Copy each product and add a 10% discounted price
discounted_products = [
    {**product, 'discounted_price': product['price'] * 0.9}
    for product in data['products']
]
print(json.dumps(discounted_products, indent=2))
```
#### Aggregating Data
```python
total_value = sum(product['price'] for product in data['products'])
print(f"Total value of all products: ${total_value:.2f}")
```
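Aggregation often needs to be broken down by a field rather than computed over the whole list. As a rough sketch, the block below groups the same product data by `category` using `collections.defaultdict`; the variable name `totals` is an illustrative choice.

```python
import json
from collections import defaultdict

json_data = '''
{
    "products": [
        {"name": "Apple", "price": 0.5, "category": "Fruit"},
        {"name": "Bread", "price": 2.5, "category": "Bakery"},
        {"name": "Cheese", "price": 5.0, "category": "Dairy"},
        {"name": "Milk", "price": 3.0, "category": "Dairy"}
    ]
}
'''
data = json.loads(json_data)

# Sum prices per category; missing categories start at 0.0
totals = defaultdict(float)
for product in data["products"]:
    totals[product["category"]] += product["price"]

for category, total in sorted(totals.items()):
    print(f"{category}: ${total:.2f}")
```

For larger datasets, the same grouping is usually more convenient in `pandas`, which is covered later in this guide.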
#### Searching for Specific Items
```python
product_3_dollars = next((product for product in data['products'] if product['price'] == 3.0), None)
print("First $3 product:", product_3_dollars['name'] if product_3_dollars else "Not found")
```
#### Handling Missing Keys
```python
for product in data['products']:
    print(f"Product: {product.get('name', 'Unnamed')}, "
          f"Price: ${product.get('price', 0):.2f}, "
          f"Stock: {product.get('stock', 'Unknown')}")
```
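`dict.get()` handles one level at a time, so deeply nested lookups can become verbose. One possible approach is a small helper that walks a path of keys and list indexes and falls back to a default on any miss. The function `get_nested` below is a hypothetical helper written for this guide, not part of the `json` module.

```python
import json


def get_nested(obj, *keys, default=None):
    """Hypothetical helper: follow dict keys / list indexes, returning default on any miss."""
    current = obj
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


data = json.loads(
    '{"person": {"name": "John Doe", '
    '"phone_numbers": [{"type": "home", "number": "555-1234"}]}}'
)

# Path exists: returns the stored value
print(get_nested(data, "person", "phone_numbers", 0, "number"))
# Missing key: returns the default instead of raising KeyError
print(get_nested(data, "person", "address", "city", default="Unknown"))
# Index out of range: returns the default instead of raising IndexError
print(get_nested(data, "person", "phone_numbers", 5, "number", default="Unknown"))
```

The trade-off is that a silent default can hide genuinely malformed data, so this pattern suits optional fields better than required ones.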
## External Libraries
### `pandas`
`pandas` is a powerful data manipulation library that can read JSON into DataFrames, making it easier to manipulate and analyze large datasets.
#### Reading JSON into a DataFrame
```python
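from io import StringIO

import pandas as pd

json_data = '''
[
    {"name": "Alice", "age": 30, "city": "New York"},
    {"name": "Bob", "age": 25, "city": "Los Angeles"},
    {"name": "Charlie", "age": 35, "city": "Chicago"}
]
'''
# Wrap the literal in StringIO: passing a raw JSON string to
# read_json is deprecated in recent pandas versions
df = pd.read_json(StringIO(json_data))
print(df)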
```
#### Writing DataFrame to JSON
```python
df.to_json('output.json', orient='records', indent=4)
```
### `jsonschema`
`jsonschema` is used for validating JSON data against a schema.
#### Validating JSON Data
```python
from jsonschema import validate
from jsonschema.exceptions import ValidationError

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "city": {"type": "string"}
    },
    "required": ["name", "age"]
}

data = {
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

try:
    validate(instance=data, schema=schema)
    print("Valid JSON data")
except ValidationError as e:
    print(f"Invalid JSON data: {e}")
```
### `requests`
`requests` is a library for making HTTP requests, commonly used to fetch JSON data from APIs.
#### Fetching JSON Data from an API
```python
import requests

# A timeout keeps the call from hanging indefinitely if the server does not respond
response = requests.get('https://api.example.com/data', timeout=10)
if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Failed to retrieve data: {response.status_code}")
```
## Use Cases
- **Configuration Files**: JSON is often used to store configuration settings for applications. Its human-readable format makes it easy to update and manage settings.
- **Data Interchange**: JSON is a common format for data exchange between servers and web applications, especially in RESTful APIs.
- **Storing and Retrieving Data**: JSON can be used to store data persistently in files, which can later be retrieved and processed.
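
As an illustration of the configuration-file use case above, here is a minimal sketch that merges settings from a JSON file over built-in defaults. The setting names in `DEFAULTS`, the function `load_config`, and the file names are all assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical default settings
DEFAULTS = {"debug": False, "port": 8000, "log_level": "INFO"}


def load_config(path):
    """Return DEFAULTS overridden by whatever the JSON file at `path` provides."""
    path = Path(path)
    if not path.exists():
        return dict(DEFAULTS)
    with path.open("r", encoding="utf-8") as file:
        overrides = json.load(file)
    if not isinstance(overrides, dict):
        raise ValueError("config file must contain a JSON object")
    return {**DEFAULTS, **overrides}


# Demonstration in a temporary directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps({"port": 9000}), encoding="utf-8")
    config = load_config(config_path)
    missing = load_config(Path(tmp) / "absent.json")

print(config)   # defaults with "port" overridden by the file
print(missing)  # plain defaults, because the file does not exist
```

Keeping the defaults in code means a partial or absent config file still yields a complete set of settings.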
## Best Practices
- **Handling Exceptions**: Always handle exceptions when parsing JSON to manage malformed data gracefully.
```python
import json

malformed_json_string = '{"name": "Alice", "age": }'  # deliberately invalid

try:
    data = json.loads(malformed_json_string)
except json.JSONDecodeError as e:
    print(f"Error decoding JSON: {e}")
```
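- **Security Considerations**: Treat JSON from untrusted sources as unvalidated input. `json.loads()` only builds basic Python types and does not execute code, but oversized or deeply nested payloads can still exhaust resources, and an `object_hook` that instantiates classes should check the fields it receives.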
- **Pretty Printing**: Use the `indent` parameter in `json.dumps()` or `json.dump()` for pretty printing, making JSON data easier to read and debug.
```python
json_string = json.dumps(data, indent=4)
print(json_string)
```
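
To make the security point above more concrete, the following sketch wraps `json.loads()` with a few defensive checks. The size limit `MAX_BYTES` and the function name `parse_untrusted` are illustrative assumptions; suitable limits depend on the application.

```python
import json

MAX_BYTES = 1_000_000  # hypothetical size limit for incoming payloads


def parse_untrusted(text):
    """Parse JSON text defensively; return a dict or raise ValueError."""
    if len(text.encode("utf-8")) > MAX_BYTES:
        raise ValueError("payload too large")
    try:
        value = json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(f"malformed JSON: {e}") from e
    except RecursionError as e:
        # Extremely deep nesting can hit the interpreter's recursion limit
        raise ValueError("JSON nested too deeply") from e
    if not isinstance(value, dict):
        raise ValueError("expected a JSON object at the top level")
    return value


print(parse_untrusted('{"name": "Alice"}'))

for bad_input in ['[1, 2, 3]', '{"name": ']:
    try:
        parse_untrusted(bad_input)
    except ValueError as e:
        print(f"Rejected: {e}")
```

Checks like these complement, rather than replace, schema validation with a library such as `jsonschema`.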
---
Together, the built-in `json` library and the external libraries above cover most JSON work in Python, from basic serialization and deserialization to data manipulation, validation, and fetching data from APIs.
---
# Comprehensive Guide to Working with JSON in Python
## Introduction
JSON (JavaScript Object Notation) is a lightweight data interchange format that's easy for humans to read and write and easy for machines to parse and generate. Python's built-in `json` library provides tools to work with JSON data, allowing you to serialize and deserialize Python objects.
## Basic Operations