Update tech_docs/database/sql_getting_started.md

2025-06-18 04:41:48 +00:00
parent ae21e7227f
commit 02b1de949e

@@ -1,3 +1,203 @@
Here's a structured **Technical Guide & Roadmap for Forex Tick Data Analysis with SQL**, designed as a progressive learning path with clear milestones and reference examples:
---
# **Forex Tick Data Analysis: SQL Learning Roadmap**
*A step-by-step guide from beginner to advanced techniques*
## **Phase 1: Foundations**
### **1.1 Understanding Your Data**
- **Structure**: Forex ticks typically contain:
```sql
symbol (e.g., 'EUR/USD'),
timestamp (precision to milliseconds),
bid price,
ask price,
volume
```
- **Key Metrics**:
- **Spread**: `ask - bid` (liquidity measure)
- **Mid-price**: `(bid + ask) / 2` (reference price)
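As a concrete reference, the fields above might map onto a table like the one sketched below. The table name `forex_ticks` matches the queries used throughout this guide, but the column types are assumptions and should be adapted to your data source.

```sql
-- Hypothetical schema: column types are assumptions, not taken from a real feed
CREATE TABLE IF NOT EXISTS forex_ticks (
    symbol    TEXT           NOT NULL,  -- e.g., 'EUR/USD'
    timestamp TIMESTAMPTZ    NOT NULL,  -- stores sub-millisecond precision
    bid       NUMERIC(12, 6) NOT NULL,
    ask       NUMERIC(12, 6) NOT NULL,
    volume    NUMERIC
);

-- Per-tick spread and mid-price, using the definitions above
SELECT
    symbol,
    timestamp,
    ask - bid       AS spread,
    (bid + ask) / 2 AS mid_price
FROM forex_ticks
LIMIT 10;
```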
### **1.2 Basic SQL Operations**
```sql
-- Sample data inspection
SELECT * FROM forex_ticks
WHERE symbol = 'EUR/USD'
LIMIT 100;
-- Count ticks per pair
SELECT symbol, COUNT(*)
FROM forex_ticks
GROUP BY symbol;
-- Time range covered by the data
SELECT MIN(timestamp), MAX(timestamp)
FROM forex_ticks;
```
---
## **Phase 2: Core Analysis**
### **2.1 Spread Analysis**
```sql
-- Basic spread stats
SELECT
symbol,
AVG(ask - bid) AS avg_spread,
MAX(ask - bid) AS max_spread
FROM forex_ticks
GROUP BY symbol;
```
### **2.2 Time Bucketing**
```sql
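-- 5-minute candlesticks on the mid-price
-- DATE_TRUNC only accepts whole units ('minute', 'hour', ...), so 5-minute
-- buckets need DATE_BIN (PostgreSQL 14+) or TimescaleDB's time_bucket
SELECT
symbol,
DATE_BIN('5 minutes', timestamp, TIMESTAMP '2001-01-01') AS time_bucket,
(ARRAY_AGG((bid+ask)/2 ORDER BY timestamp ASC))[1] AS open,
MAX((bid+ask)/2) AS high,
MIN((bid+ask)/2) AS low,
(ARRAY_AGG((bid+ask)/2 ORDER BY timestamp DESC))[1] AS close
FROM forex_ticks
GROUP BY symbol, time_bucket
ORDER BY symbol, time_bucket;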
```
### **2.3 Session Analysis**
```sql
-- Average tick volume by hour of day (assumes timestamps are stored in GMT/UTC)
SELECT
EXTRACT(HOUR FROM timestamp) AS hour,
AVG(volume) AS avg_volume
FROM forex_ticks
WHERE symbol = 'GBP/USD'
GROUP BY hour
ORDER BY hour;
```
---
## **Phase 3: Intermediate Techniques**
### **3.1 Rolling Calculations**
```sql
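-- 30-minute moving average of the mid-price
-- A RANGE frame spans 30 minutes of time (PostgreSQL 11+);
-- ROWS BETWEEN 29 PRECEDING would average the last 30 ticks instead
SELECT
timestamp,
symbol,
AVG((bid+ask)/2) OVER (
PARTITION BY symbol
ORDER BY timestamp
RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
) AS ma_30min
FROM forex_ticks;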
```
### **3.2 Pair Correlation**
```sql
WITH hourly_prices AS (
SELECT
DATE_TRUNC('hour', timestamp) AS hour,
symbol,
AVG((bid+ask)/2) AS mid_price
FROM forex_ticks
GROUP BY hour, symbol
)
SELECT
a.symbol AS pair1,
b.symbol AS pair2,
CORR(a.mid_price, b.mid_price) AS correlation
FROM hourly_prices a
JOIN hourly_prices b ON a.hour = b.hour
WHERE a.symbol < b.symbol
GROUP BY pair1, pair2;
```
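Correlating raw price levels can overstate the relationship when two pairs simply trend over the same period, so a common alternative is to correlate returns instead. The following is a minimal sketch of that variant, assuming the same `forex_ticks` table; the names `hourly_returns` and `ret` are illustrative only.

```sql
WITH hourly_prices AS (
    SELECT
        DATE_TRUNC('hour', timestamp) AS hour,
        symbol,
        AVG((bid + ask) / 2) AS mid_price
    FROM forex_ticks
    GROUP BY 1, 2
),
hourly_returns AS (
    -- Hour-over-hour simple return; NULL for the first hour of each pair
    SELECT
        hour,
        symbol,
        mid_price / LAG(mid_price) OVER (PARTITION BY symbol ORDER BY hour) - 1 AS ret
    FROM hourly_prices
)
SELECT
    a.symbol AS pair1,
    b.symbol AS pair2,
    CORR(a.ret, b.ret) AS return_correlation  -- CORR skips rows with a NULL input
FROM hourly_returns a
JOIN hourly_returns b
  ON a.hour = b.hour
 AND a.symbol < b.symbol
GROUP BY a.symbol, b.symbol;
```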
---
## **Phase 4: Advanced Topics**
### **4.1 Volatility Measurement**
```sql
-- Standard deviation of tick-to-tick returns per pair
-- (a per-tick figure: no hourly bucketing is applied here)
WITH returns AS (
SELECT
symbol,
timestamp,
(ask - LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp)) /
LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp) AS tick_return
FROM forex_ticks
)
SELECT
symbol,
STDDEV(tick_return) AS tick_volatility
FROM returns
GROUP BY symbol;
```
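The query above pools every tick-to-tick return for a pair into a single figure. For a per-hour view, one option is realized volatility: the square root of the sum of squared log returns within each hour. This is a sketch under that assumption, not the only way to define hourly volatility; `log_return` and `realized_vol` are illustrative names.

```sql
WITH returns AS (
    SELECT
        symbol,
        timestamp,
        LN(ask / LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp)) AS log_return
    FROM forex_ticks
)
SELECT
    symbol,
    DATE_TRUNC('hour', timestamp) AS hour,
    SQRT(SUM(log_return * log_return)) AS realized_vol
FROM returns
WHERE log_return IS NOT NULL  -- drops the first tick of each pair
GROUP BY symbol, DATE_TRUNC('hour', timestamp)
ORDER BY symbol, hour;
```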
### **4.2 Event Impact Analysis**
```sql
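-- Compare the 15 minutes before and after an NFP release
-- (November 2023 report, released 2023-12-08 at 13:30 UTC; assumes UTC timestamps)
-- Half-open ranges keep a tick at exactly 13:30 out of the pre-release window
SELECT
AVG(CASE WHEN timestamp >= '2023-12-08 13:30' AND timestamp < '2023-12-08 13:45'
THEN (bid+ask)/2 END) AS post_NFP,
AVG(CASE WHEN timestamp >= '2023-12-08 13:15' AND timestamp < '2023-12-08 13:30'
THEN (bid+ask)/2 END) AS pre_NFP
FROM forex_ticks
WHERE symbol = 'EUR/USD'
AND timestamp >= '2023-12-08 13:15' AND timestamp < '2023-12-08 13:45';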
```
---
## **Study Roadmap**
### **Weekly Learning Plan**
| Week | Focus Area | Key Skills |
|------|-----------|------------|
| 1 | SQL Basics | `SELECT`, `WHERE`, `GROUP BY` |
| 2 | Time Handling | `DATE_TRUNC`, `EXTRACT`, timezones |
| 3 | Aggregations | `AVG`, `STDDEV`, `CORR` |
| 4 | Window Functions | `OVER`, `PARTITION BY`, rolling calcs |
| 5 | Optimization | Indexes, query planning |
| 6 | Advanced Patterns | Volatility modeling, microstructure |
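For the Week 5 optimization topics, a typical starting point is a composite index on symbol and time, followed by a look at the query plan. The sketch below assumes the `forex_ticks` table used in this guide; the index name and the date range are hypothetical.

```sql
-- Hypothetical index name; supports "one symbol, one time range" lookups
CREATE INDEX IF NOT EXISTS idx_forex_ticks_symbol_time
    ON forex_ticks (symbol, timestamp DESC);

-- Inspect the plan to check whether the index is actually used
EXPLAIN (ANALYZE, BUFFERS)
SELECT AVG(ask - bid) AS avg_spread
FROM forex_ticks
WHERE symbol = 'EUR/USD'
  AND timestamp >= '2024-01-02 08:00'
  AND timestamp <  '2024-01-02 09:00';
```

If the plan still shows a sequential scan on a large table, running `ANALYZE forex_ticks;` to refresh statistics is a reasonable first check.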
---
## **Cheat Sheet**
### **Essential Functions**
| Function | Purpose | Example |
|----------|---------|---------|
| `DATE_TRUNC` | Bucket timestamps | `DATE_TRUNC('hour', timestamp)` |
| `EXTRACT` | Get time parts | `EXTRACT(HOUR FROM timestamp)` |
| `CORR` | Correlation | `CORR(price1, price2)` |
| `AVG() OVER` | Moving average (last 30 rows) | `AVG(price) OVER (ORDER BY time ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)` |
### **Common Patterns**
```sql
-- Get latest price per pair
SELECT DISTINCT ON (symbol) symbol, bid, ask
FROM forex_ticks
ORDER BY symbol, timestamp DESC;
-- Detect stale data
SELECT symbol, MAX(timestamp) AS last_update
FROM forex_ticks
GROUP BY symbol
HAVING MAX(timestamp) < NOW() - INTERVAL '5 minutes';
```
---
## **Next Steps**
1. **Practice Dataset**: Download free forex tick data from [Dukascopy Bank](https://www.dukascopy.com/)
2. **Sandbox Setup**: Install PostgreSQL + TimescaleDB for time-series optimizations
3. **Projects**:
- Build a volatility dashboard
- Analyze London vs. NY session spreads
- Track correlation breakdowns during crises
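For the sandbox setup in step 2, the TimescaleDB side might look roughly like the sketch below. It assumes the extension is already installed on the server and that a `forex_ticks` table with a `timestamp` column exists. Recent TimescaleDB releases also offer a newer `create_hypertable` form, so check the documentation for your version.

```sql
-- Enable the extension in the current database
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Convert the (assumed) forex_ticks table into a hypertable partitioned on time;
-- migrate_data is only needed when the table already contains rows
SELECT create_hypertable('forex_ticks', 'timestamp', migrate_data => TRUE);

-- 5-minute average spread using TimescaleDB's time_bucket
SELECT
    symbol,
    time_bucket('5 minutes', timestamp) AS bucket,
    AVG(ask - bid) AS avg_spread
FROM forex_ticks
GROUP BY symbol, bucket
ORDER BY symbol, bucket;
```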
---
**Pro Tip**: Bookmark this guide and revisit each phase as your skills progress. Start with Phase 1 queries, then gradually incorporate more complex techniques.
---
# **The Ultimate SQL Getting Started Guide**
This guide will take you from absolute beginner to SQL proficiency, with a focus on practical data analysis and EDA applications.