Update tech_docs/database/sql_getting_started.md
@@ -1,3 +1,203 @@
A structured **Technical Guide & Roadmap for Forex Tick Data Analysis with SQL**, designed as a progressive learning path with clear milestones and reference examples. The queries use PostgreSQL-flavored SQL (for example `DATE_TRUNC` and `DISTINCT ON`).

---

# **Forex Tick Data Analysis: SQL Learning Roadmap**
*A step-by-step guide from beginner to advanced techniques*

## **Phase 1: Foundations**
### **1.1 Understanding Your Data**
- **Structure**: Forex ticks typically contain:
  ```sql
  -- Column types are assumptions; adjust them to your data source
  CREATE TABLE forex_ticks (
      symbol    TEXT        NOT NULL,  -- e.g., 'EUR/USD'
      timestamp TIMESTAMPTZ NOT NULL,  -- precision to milliseconds
      bid       NUMERIC     NOT NULL,  -- bid price
      ask       NUMERIC     NOT NULL,  -- ask price
      volume    NUMERIC                -- tick volume
  );
  ```
- **Key Metrics**:
  - **Spread**: `ask - bid` (liquidity measure)
  - **Mid-price**: `(bid + ask) / 2` (reference price)
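
Both metrics can be computed per tick in a single query. The sketch below uses two made-up rows (the `sample_ticks` name and the prices are assumptions for illustration) and also expresses the spread in pips, assuming the usual pip size of 0.0001 for non-JPY pairs.

```sql
-- Made-up sample rows; replace sample_ticks with your own tick table
WITH sample_ticks (symbol, ts, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00.125+00', 1.10480, 1.10492),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00.310+00', 1.10481, 1.10495)
)
SELECT
    symbol,
    ts,
    ask - bid            AS spread,      -- liquidity measure
    (bid + ask) / 2      AS mid_price,   -- reference price
    (ask - bid) / 0.0001 AS spread_pips  -- assumes a pip size of 0.0001
FROM sample_ticks;
```

For the first row the spread is 0.00012, which is 1.2 pips.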

### **1.2 Basic SQL Operations**
```sql
-- Sample data inspection (most recent ticks first)
SELECT *
FROM forex_ticks
WHERE symbol = 'EUR/USD'
ORDER BY timestamp DESC
LIMIT 100;

-- Count ticks per pair
SELECT symbol, COUNT(*) AS tick_count
FROM forex_ticks
GROUP BY symbol;

-- Time range covered by the data
SELECT MIN(timestamp) AS first_tick, MAX(timestamp) AS last_tick
FROM forex_ticks;
```
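
Once the covered range is known, queries are usually restricted to a window inside it. A minimal sketch, assuming a half-open interval (inclusive start, exclusive end) so that adjacent windows never count the same tick twice. The inline `forex_ticks` rows are made up and only make the example runnable on its own.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask, volume) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 08:59:59+00', 1.1047, 1.1048, 1.0),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00+00', 1.1048, 1.1049, 2.0),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 10:00:00+00', 1.1050, 1.1051, 1.5)
)
-- With the sample rows, only the 09:00:00 tick falls inside the window
SELECT symbol, timestamp, bid, ask
FROM forex_ticks
WHERE symbol = 'EUR/USD'
  AND timestamp >= TIMESTAMPTZ '2024-01-02 09:00:00+00'  -- inclusive start
  AND timestamp <  TIMESTAMPTZ '2024-01-02 10:00:00+00'  -- exclusive end
ORDER BY timestamp;
```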

---

## **Phase 2: Core Analysis**
### **2.1 Spread Analysis**
```sql
-- Basic spread stats per pair
SELECT
    symbol,
    AVG(ask - bid) AS avg_spread,
    MAX(ask - bid) AS max_spread
FROM forex_ticks
GROUP BY symbol;
```
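
Averages and maxima are sensitive to a handful of spread spikes, so percentiles are a useful complement. A minimal sketch using PostgreSQL's `PERCENTILE_CONT` ordered-set aggregate. The inline rows are made up, and the third one simulates a spike.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00+00', 1.10480, 1.10490),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:01+00', 1.10481, 1.10492),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:02+00', 1.10470, 1.10530)  -- spike
)
SELECT
    symbol,
    AVG(ask - bid)                                          AS avg_spread,
    PERCENTILE_CONT(0.5)  WITHIN GROUP (ORDER BY ask - bid) AS median_spread,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY ask - bid) AS p95_spread
FROM forex_ticks
GROUP BY symbol;
```

With these rows the average spread is 0.00027, while the median stays at about 0.00011.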
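
### **2.2 Time Bucketing**
```sql
-- 5-minute candlesticks built from the mid-price
-- date_bin needs PostgreSQL 14+; DATE_TRUNC accepts only single units such as 'minute' or 'hour'
SELECT
    symbol,
    date_bin('5 minutes', timestamp, '2000-01-01') AS time_bucket,
    (ARRAY_AGG((bid + ask) / 2 ORDER BY timestamp))[1]      AS open,   -- first mid-price in the bucket
    MAX((bid + ask) / 2)                                    AS high,
    MIN((bid + ask) / 2)                                    AS low,
    (ARRAY_AGG((bid + ask) / 2 ORDER BY timestamp DESC))[1] AS close   -- last mid-price in the bucket
FROM forex_ticks
GROUP BY symbol, time_bucket;
```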
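
`DATE_TRUNC` only truncates to a single named unit such as `'minute'` or `'hour'`, so multi-minute buckets need another approach. One option that does not depend on the PostgreSQL version, sketched here with made-up rows, floors the Unix epoch to a multiple of 300 seconds.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:01:10+00', 1.1048, 1.1049),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:04:59+00', 1.1049, 1.1050),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:05:00+00', 1.1051, 1.1052)
)
SELECT
    symbol,
    -- Floor the epoch seconds to a multiple of 300 (5 minutes)
    to_timestamp(floor(EXTRACT(EPOCH FROM timestamp) / 300) * 300) AS time_bucket,
    COUNT(*) AS tick_count
FROM forex_ticks
GROUP BY symbol, time_bucket
ORDER BY time_bucket;
```

The first two sample ticks land in the 09:00 bucket and the third starts the 09:05 bucket.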

### **2.3 Session Analysis**
```sql
-- Average tick volume by hour of day (UTC)
-- AT TIME ZONE 'UTC' assumes a timestamptz column; drop it if timestamps are stored as UTC without a time zone
SELECT
    EXTRACT(HOUR FROM timestamp AT TIME ZONE 'UTC') AS hour,
    AVG(volume) AS avg_volume
FROM forex_ticks
WHERE symbol = 'GBP/USD'
GROUP BY hour
ORDER BY hour;
```
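
Hourly buckets can be rolled up into trading sessions. The hour boundaries below are illustrative assumptions: sessions are a market convention, they shift with daylight saving time, and they overlap in practice. The inline rows are made up.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask, volume) AS (
    VALUES
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 03:00:00+00', 1.2700, 1.2703, 1.0),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 09:30:00+00', 1.2710, 1.2711, 3.0),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 14:00:00+00', 1.2720, 1.2721, 4.0)
),
tick_hours AS (
    SELECT
        ask - bid AS spread,
        EXTRACT(HOUR FROM timestamp AT TIME ZONE 'UTC') AS hour_utc
    FROM forex_ticks
    WHERE symbol = 'GBP/USD'
)
SELECT
    CASE
        WHEN hour_utc BETWEEN 8 AND 12  THEN 'London'             -- assumed boundary
        WHEN hour_utc BETWEEN 13 AND 16 THEN 'London/NY overlap'  -- assumed boundary
        WHEN hour_utc BETWEEN 17 AND 21 THEN 'New York'           -- assumed boundary
        ELSE 'Other'
    END AS trading_session,
    AVG(spread) AS avg_spread,
    COUNT(*)    AS tick_count
FROM tick_hours
GROUP BY trading_session
ORDER BY trading_session;
```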

---

## **Phase 3: Intermediate Techniques**
### **3.1 Rolling Calculations**
```sql
-- 30-minute moving average of the mid-price
-- RANGE with an interval offset (PostgreSQL 11+) covers 30 minutes of time;
-- ROWS BETWEEN 29 PRECEDING AND CURRENT ROW would cover the last 30 ticks instead
SELECT
    timestamp,
    symbol,
    AVG((bid + ask) / 2) OVER (
        PARTITION BY symbol
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
    ) AS ma_30min
FROM forex_ticks;
```
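
Because ticks arrive at irregular intervals, a frame counted in rows and a frame measured in time can cover very different spans. The sketch below, on made-up rows, counts how many ticks each kind of frame contains. Interval offsets in `RANGE` frames require PostgreSQL 11 or later.

```sql
-- Made-up mid-prices at irregular times
WITH sample (ts, mid) AS (
    VALUES
        (TIMESTAMP '2024-01-02 09:00:00', 1.1000),
        (TIMESTAMP '2024-01-02 09:00:10', 1.1002),
        (TIMESTAMP '2024-01-02 09:20:00', 1.1010),
        (TIMESTAMP '2024-01-02 09:45:00', 1.1004)
)
SELECT
    ts,
    mid,
    -- Frame of up to 30 rows, regardless of how much time they span
    COUNT(*) OVER (ORDER BY ts ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS ticks_in_rows_frame,
    -- Frame of the preceding 30 minutes, regardless of how many rows it holds
    COUNT(*) OVER (ORDER BY ts RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW) AS ticks_in_range_frame
FROM sample
ORDER BY ts;
```

For the 09:45:00 row the row-based frame holds all 4 ticks, while the time-based frame holds only 2 (09:20:00 and 09:45:00).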

### **3.2 Pair Correlation**
```sql
-- Average mid-price per pair and hour
WITH hourly_prices AS (
    SELECT
        DATE_TRUNC('hour', timestamp) AS hour,
        symbol,
        AVG((bid + ask) / 2) AS mid_price
    FROM forex_ticks
    GROUP BY hour, symbol
)
-- Correlate every pair combination over the hours they share
SELECT
    a.symbol AS pair1,
    b.symbol AS pair2,
    CORR(a.mid_price, b.mid_price) AS correlation
FROM hourly_prices a
JOIN hourly_prices b ON a.hour = b.hour
WHERE a.symbol < b.symbol  -- keep each combination once and skip self-pairs
GROUP BY pair1, pair2;
```
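
Price levels of trending series can correlate strongly even when their moves are unrelated, so a common variation is to correlate hour-over-hour changes instead. This is a sketch of that variation, not part of the query above. The inline rows are made up.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:10:00+00', 1.1000, 1.1002),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 10:10:00+00', 1.1010, 1.1012),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 11:10:00+00', 1.1005, 1.1007),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 12:10:00+00', 1.1020, 1.1022),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 09:20:00+00', 1.2700, 1.2702),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 10:20:00+00', 1.2715, 1.2717),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 11:20:00+00', 1.2705, 1.2707),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 12:20:00+00', 1.2730, 1.2732)
),
hourly_prices AS (
    SELECT
        DATE_TRUNC('hour', timestamp) AS hour,
        symbol,
        AVG((bid + ask) / 2) AS mid_price
    FROM forex_ticks
    GROUP BY hour, symbol
),
hourly_changes AS (
    SELECT
        hour,
        symbol,
        -- Relative change versus the previous hour; NULL for the first hour of each pair
        mid_price / LAG(mid_price) OVER (PARTITION BY symbol ORDER BY hour) - 1 AS pct_change
    FROM hourly_prices
)
SELECT
    a.symbol AS pair1,
    b.symbol AS pair2,
    CORR(a.pct_change, b.pct_change) AS change_correlation  -- NULL pairs are ignored
FROM hourly_changes a
JOIN hourly_changes b ON a.hour = b.hour
WHERE a.symbol < b.symbol
GROUP BY pair1, pair2;
```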

---

## **Phase 4: Advanced Topics**
### **4.1 Volatility Measurement**
```sql
-- Tick-to-tick returns on the ask price
WITH returns AS (
    SELECT
        symbol,
        timestamp,
        (ask - LAG(ask) OVER w) / LAG(ask) OVER w AS tick_return
    FROM forex_ticks
    WINDOW w AS (PARTITION BY symbol ORDER BY timestamp)
)
-- Standard deviation of tick returns per pair (this is per tick, not per hour)
SELECT
    symbol,
    STDDEV(tick_return) AS tick_volatility
FROM returns
GROUP BY symbol;
```
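
To get a per-hour figure, tick returns can be aggregated inside hourly buckets. The sketch below uses realized volatility (the square root of the sum of squared log returns in each hour), which is one common definition and an assumption here. The inline rows are made up.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00+00', 1.1000, 1.1002),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:20:00+00', 1.1004, 1.1006),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:40:00+00', 1.1001, 1.1003),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 10:10:00+00', 1.1008, 1.1010)
),
log_returns AS (
    SELECT
        symbol,
        timestamp,
        -- Log return versus the previous tick; NULL for the first tick of each pair
        LN(ask / LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp)) AS log_return
    FROM forex_ticks
)
SELECT
    symbol,
    DATE_TRUNC('hour', timestamp) AS hour,
    SQRT(SUM(log_return * log_return)) AS realized_volatility
FROM log_returns
WHERE log_return IS NOT NULL
GROUP BY symbol, hour
ORDER BY symbol, hour;
```

Each return is assigned to the hour of its later tick, so a return that spans an hour boundary counts toward the new hour.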

### **4.2 Event Impact Analysis**
```sql
-- Compare the 15 minutes before and after an NFP release (13:30 UTC)
-- Example: the November 2023 report, released on 2023-12-08
-- Half-open windows keep a tick at exactly 13:30 from being counted in both averages
SELECT
    AVG(CASE WHEN timestamp >= '2023-12-08 13:30+00' AND timestamp < '2023-12-08 13:45+00'
             THEN (bid + ask) / 2 END) AS post_NFP,
    AVG(CASE WHEN timestamp >= '2023-12-08 13:15+00' AND timestamp < '2023-12-08 13:30+00'
             THEN (bid + ask) / 2 END) AS pre_NFP
FROM forex_ticks
WHERE symbol = 'EUR/USD'
  AND timestamp >= '2023-12-08 13:15+00'
  AND timestamp <  '2023-12-08 13:45+00';
```
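
The same comparison can be parameterized so that one query covers several releases. This sketch assumes a hypothetical `events` list (here an inline CTE with a single illustrative entry) and made-up tick rows. It reports the average mid-price in the 15 minutes on each side of the release.

```sql
-- Both CTEs hold made-up data; events is a hypothetical table of release times
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-05 13:20:00+00', 1.0930, 1.0932),
        ('EUR/USD', TIMESTAMPTZ '2024-01-05 13:30:00+00', 1.0900, 1.0904),
        ('EUR/USD', TIMESTAMPTZ '2024-01-05 13:40:00+00', 1.0910, 1.0912)
),
events (event_name, released_at) AS (
    VALUES ('NFP', TIMESTAMPTZ '2024-01-05 13:30:00+00')
)
SELECT
    e.event_name,
    e.released_at,
    AVG((t.bid + t.ask) / 2) FILTER (WHERE t.timestamp <  e.released_at) AS pre_mid,
    AVG((t.bid + t.ask) / 2) FILTER (WHERE t.timestamp >= e.released_at) AS post_mid
FROM events e
JOIN forex_ticks t
  ON t.timestamp >= e.released_at - INTERVAL '15 minutes'  -- inclusive start
 AND t.timestamp <  e.released_at + INTERVAL '15 minutes'  -- exclusive end
WHERE t.symbol = 'EUR/USD'
GROUP BY e.event_name, e.released_at;
```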

---

## **Study Roadmap**
### **Weekly Learning Plan**
| Week | Focus Area | Key Skills |
|------|-----------|------------|
| 1 | SQL Basics | `SELECT`, `WHERE`, `GROUP BY` |
| 2 | Time Handling | `DATE_TRUNC`, `EXTRACT`, timezones |
| 3 | Aggregations | `AVG`, `STDDEV`, `CORR` |
| 4 | Window Functions | `OVER`, `PARTITION BY`, rolling calcs |
| 5 | Optimization | Indexes, query planning |
| 6 | Advanced Patterns | Volatility modeling, microstructure |
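
For the Week 5 optimization topic, most queries in this guide filter on one pair and a time range, so a composite index on `(symbol, timestamp)` is a natural starting point. A minimal sketch on a hypothetical `ticks_demo` table whose name and column types are assumptions. On a tiny or empty table the planner may still choose a sequential scan.

```sql
-- Hypothetical demo table; a real tick table would be indexed the same way
CREATE TEMP TABLE ticks_demo (
    symbol    TEXT        NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    bid       NUMERIC     NOT NULL,
    ask       NUMERIC     NOT NULL
);

-- Composite index: equality column first, range column second
CREATE INDEX ticks_demo_symbol_ts_idx ON ticks_demo (symbol, timestamp);

-- EXPLAIN prints the chosen plan without running the query
EXPLAIN
SELECT bid, ask
FROM ticks_demo
WHERE symbol = 'EUR/USD'
  AND timestamp >= TIMESTAMPTZ '2024-01-02 09:00:00+00'
  AND timestamp <  TIMESTAMPTZ '2024-01-02 10:00:00+00';
```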

---

## **Cheat Sheet**
### **Essential Functions**
| Function | Purpose | Example |
|----------|---------|---------|
| `DATE_TRUNC` | Bucket timestamps | `DATE_TRUNC('hour', timestamp)` |
| `EXTRACT` | Get time parts | `EXTRACT(HOUR FROM timestamp)` |
| `CORR` | Correlation | `CORR(price1, price2)` |
| `AVG() OVER` | Moving average (30-row window) | `AVG(price) OVER (ORDER BY time ROWS 29 PRECEDING)` |

### **Common Patterns**
```sql
-- Get latest price per pair
SELECT DISTINCT ON (symbol) symbol, bid, ask
FROM forex_ticks
ORDER BY symbol, timestamp DESC;

-- Detect stale data
SELECT symbol, MAX(timestamp) AS last_update
FROM forex_ticks
GROUP BY symbol
HAVING MAX(timestamp) < NOW() - INTERVAL '5 minutes';
```
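
`DISTINCT ON` is a PostgreSQL extension. A more portable way to get the latest row per pair, sketched here with made-up rows, ranks the ticks of each pair with `ROW_NUMBER()` and keeps the first one.

```sql
-- The CTE shadows the real table with made-up rows; remove it to query your data
WITH forex_ticks (symbol, timestamp, bid, ask) AS (
    VALUES
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:00+00', 1.1048, 1.1049),
        ('EUR/USD', TIMESTAMPTZ '2024-01-02 09:00:05+00', 1.1050, 1.1051),
        ('GBP/USD', TIMESTAMPTZ '2024-01-02 09:00:02+00', 1.2710, 1.2712)
),
ranked AS (
    SELECT
        symbol,
        timestamp,
        bid,
        ask,
        -- rn = 1 marks the most recent tick of each pair
        ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY timestamp DESC) AS rn
    FROM forex_ticks
)
SELECT symbol, timestamp, bid, ask
FROM ranked
WHERE rn = 1
ORDER BY symbol;
```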

---

## **Next Steps**
1. **Practice Dataset**: Download free forex tick data from [Dukascopy Bank](https://www.dukascopy.com/)
2. **Sandbox Setup**: Install PostgreSQL + TimescaleDB for time-series optimizations
3. **Projects**:
   - Build a volatility dashboard
   - Analyze London vs. NY session spreads
   - Track correlation breakdowns during crises
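
For the sandbox step, a minimal TimescaleDB sketch might look like the following. It assumes the TimescaleDB extension is installed on the server. The `forex_ticks_ts` table name and the column types are hypothetical. `create_hypertable` and `time_bucket` are TimescaleDB functions, not core PostgreSQL.

```sql
-- Requires the TimescaleDB extension to be available on the server
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Hypothetical table; column types are assumptions
CREATE TABLE forex_ticks_ts (
    symbol    TEXT        NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    bid       NUMERIC     NOT NULL,
    ask       NUMERIC     NOT NULL,
    volume    NUMERIC
);

-- Turn the table into a hypertable partitioned on the timestamp column
SELECT create_hypertable('forex_ticks_ts', 'timestamp');

-- time_bucket accepts arbitrary intervals such as '5 minutes'
SELECT
    symbol,
    time_bucket('5 minutes', timestamp) AS bucket,
    AVG(ask - bid) AS avg_spread
FROM forex_ticks_ts
GROUP BY symbol, bucket
ORDER BY symbol, bucket;
```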

---

**Pro Tip**: Bookmark this guide and revisit each phase as your skills progress. Start with the Phase 1 queries, then gradually incorporate more complex techniques.

---

# **The Ultimate SQL Getting Started Guide**

This guide will take you from absolute beginner to SQL proficiency, with a focus on practical data analysis and exploratory data analysis (EDA) applications.