Update tech_docs/database/sql_getting_started.md
@@ -1,3 +1,203 @@
This is a structured **Technical Guide & Roadmap for Forex Tick Data Analysis with SQL**, designed as a progressive learning path with clear milestones and reference examples.

---

# **Forex Tick Data Analysis: SQL Learning Roadmap**
*A step-by-step guide from beginner to advanced techniques*

## **Phase 1: Foundations**
### **1.1 Understanding Your Data**
- **Structure**: Forex ticks typically contain:
  ```sql
  symbol (e.g., 'EUR/USD'),
  timestamp (precision to milliseconds),
  bid price,
  ask price,
  volume
  ```
- **Key Metrics**:
  - **Spread**: `ask - bid` (liquidity measure)
  - **Mid-price**: `(bid + ask) / 2` (reference price)
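To make the structure and metrics above concrete, here is a minimal sketch of a matching table with two sample ticks. The table name `forex_ticks` and the column names follow the queries in this guide; the column types and the sample values are assumptions chosen for illustration, not a required schema.

```sql
-- Hypothetical schema: names match the rest of this guide, types are assumptions
CREATE TABLE forex_ticks (
    symbol    TEXT           NOT NULL,  -- currency pair, e.g. 'EUR/USD'
    timestamp TIMESTAMPTZ(3) NOT NULL,  -- quote time, millisecond precision
    bid       NUMERIC(12, 6) NOT NULL,  -- best bid price
    ask       NUMERIC(12, 6) NOT NULL,  -- best ask price
    volume    NUMERIC        NOT NULL   -- tick volume
);

-- Made-up sample rows, for illustration only
INSERT INTO forex_ticks (symbol, timestamp, bid, ask, volume) VALUES
    ('EUR/USD', '2023-12-01 13:29:59.250+00', 1.08810, 1.08822, 1.5),
    ('EUR/USD', '2023-12-01 13:30:00.125+00', 1.08790, 1.08815, 3.0);

-- Spread and mid-price for every tick
SELECT
    symbol,
    timestamp,
    ask - bid       AS spread,
    (bid + ask) / 2 AS mid_price
FROM forex_ticks
ORDER BY symbol, timestamp;
```

Storing prices as `NUMERIC` keeps the spread arithmetic exact; `DOUBLE PRECISION` is a reasonable alternative when speed matters more than exact decimals.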

### **1.2 Basic SQL Operations**
```sql
-- Sample data inspection
SELECT * FROM forex_ticks
WHERE symbol = 'EUR/USD'
LIMIT 100;

-- Count ticks per pair
SELECT symbol, COUNT(*) AS tick_count
FROM forex_ticks
GROUP BY symbol;

-- Time range covered by the data
SELECT MIN(timestamp) AS first_tick, MAX(timestamp) AS last_tick
FROM forex_ticks;
```
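The queries above look at the whole table. To restrict a query to a specific window, a half-open interval (start inclusive, end exclusive) avoids counting boundary ticks twice when adjacent windows are compared. The sketch below assumes the `forex_ticks` table from this guide with a time-zone-aware `timestamp` column; the dates are placeholders.

```sql
-- Ticks for one pair within a one-hour window
SELECT timestamp, bid, ask
FROM forex_ticks
WHERE symbol = 'EUR/USD'
  AND timestamp >= TIMESTAMPTZ '2023-12-01 13:00:00+00'  -- inclusive start
  AND timestamp <  TIMESTAMPTZ '2023-12-01 14:00:00+00'  -- exclusive end
ORDER BY timestamp;
```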

---

## **Phase 2: Core Analysis**
### **2.1 Spread Analysis**
```sql
-- Basic spread stats
SELECT
  symbol,
  AVG(ask - bid) AS avg_spread,
  MAX(ask - bid) AS max_spread
FROM forex_ticks
GROUP BY symbol;
```
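Raw price differences are hard to compare across pairs, so spreads are often quoted in pips. The sketch below assumes the usual convention of a 0.01 pip for JPY-quoted pairs and 0.0001 for everything else, and it assumes symbols are written like `'USD/JPY'`; adjust the `CASE` expression if your feed uses other pairs or naming. It also adds the median, which is less sensitive to occasional spread spikes than the average.

```sql
-- Average and median spread in pips (pip sizes are an assumption, see above)
SELECT
    symbol,
    AVG((ask - bid) / pip_size) AS avg_spread_pips,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY (ask - bid) / pip_size) AS median_spread_pips
FROM (
    SELECT
        symbol,
        bid,
        ask,
        CASE WHEN symbol LIKE '%/JPY' THEN 0.01 ELSE 0.0001 END AS pip_size
    FROM forex_ticks
) AS t
GROUP BY symbol
ORDER BY symbol;
```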
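
### **2.2 Time Bucketing**
```sql
-- 5-minute candlesticks on the mid-price.
-- DATE_TRUNC only accepts whole units ('minute', 'hour', ...), so 5-minute
-- buckets use date_bin, which requires PostgreSQL 14 or later.
SELECT
  symbol,
  date_bin('5 minutes', timestamp, TIMESTAMP '2000-01-01') AS time_bucket,
  (ARRAY_AGG((bid+ask)/2 ORDER BY timestamp))[1] AS open,       -- first tick in bucket
  MAX((bid+ask)/2) AS high,
  MIN((bid+ask)/2) AS low,
  (ARRAY_AGG((bid+ask)/2 ORDER BY timestamp DESC))[1] AS close  -- last tick in bucket
FROM forex_ticks
GROUP BY symbol, time_bucket;
```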

### **2.3 Session Analysis**
```sql
-- Average tick volume by hour of day (assumes timestamps are stored in GMT/UTC)
SELECT
  EXTRACT(HOUR FROM timestamp) AS hour,
  AVG(volume) AS avg_volume
FROM forex_ticks
WHERE symbol = 'GBP/USD'
GROUP BY hour
ORDER BY hour;
```
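Hour-of-day buckets can also be grouped into trading sessions. The sketch below compares average spreads in the London and New York sessions. The session hours (08:00 to 16:00 London time, 08:00 to 17:00 New York time) are assumptions rather than official definitions, and the query assumes `timestamp` is a `TIMESTAMPTZ` column so that `AT TIME ZONE` converts to local wall-clock time and follows daylight saving changes.

```sql
-- Average spread per session; the two sessions overlap, so a tick can count in both
SELECT
    t.symbol,
    AVG(t.ask - t.bid) FILTER (
        WHERE EXTRACT(HOUR FROM t.timestamp AT TIME ZONE 'Europe/London') BETWEEN 8 AND 15
    ) AS london_avg_spread,
    AVG(t.ask - t.bid) FILTER (
        WHERE EXTRACT(HOUR FROM t.timestamp AT TIME ZONE 'America/New_York') BETWEEN 8 AND 16
    ) AS new_york_avg_spread
FROM forex_ticks AS t
GROUP BY t.symbol
ORDER BY t.symbol;
```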

---

## **Phase 3: Intermediate Techniques**
### **3.1 Rolling Calculations**
```sql
-- 30-tick moving average of the mid-price
-- (a ROWS frame counts ticks, not minutes)
SELECT
  timestamp,
  symbol,
  AVG((bid+ask)/2) OVER (
    PARTITION BY symbol
    ORDER BY timestamp
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS ma_30_ticks  -- unquoted identifiers cannot start with a digit
FROM forex_ticks;
```
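A `ROWS` frame covers a fixed number of ticks, which can span seconds in a busy market or hours in a quiet one. For a window defined by elapsed time, PostgreSQL 11 and later accept a `RANGE` frame with an interval offset. The following is a sketch under that assumption, using the `forex_ticks` table from this guide; the alias `ma_30min` is illustrative.

```sql
-- Time-based 30-minute moving average of the mid-price
SELECT
    timestamp,
    symbol,
    AVG((bid + ask) / 2) OVER (
        PARTITION BY symbol
        ORDER BY timestamp
        RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
    ) AS ma_30min
FROM forex_ticks;
```

Each tick in the window still carries equal weight, so this is a tick-weighted average over the last 30 minutes rather than a time-weighted one.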

### **3.2 Pair Correlation**
```sql
WITH hourly_prices AS (
  SELECT
    DATE_TRUNC('hour', timestamp) AS hour,
    symbol,
    AVG((bid+ask)/2) AS mid_price
  FROM forex_ticks
  GROUP BY hour, symbol
)
SELECT
  a.symbol AS pair1,
  b.symbol AS pair2,
  CORR(a.mid_price, b.mid_price) AS correlation
FROM hourly_prices a
JOIN hourly_prices b ON a.hour = b.hour
WHERE a.symbol < b.symbol
GROUP BY pair1, pair2;
```
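Correlating price levels, as the query above does, can overstate co-movement when both series trend in the same direction. A common alternative is to correlate returns instead. The sketch below reuses the hourly mid-prices and correlates hourly log returns; `hourly_returns` and `log_return` are illustrative names, and hours with no ticks are simply skipped, so a return may occasionally span more than one hour.

```sql
WITH hourly_prices AS (
    SELECT
        DATE_TRUNC('hour', timestamp) AS hour,
        symbol,
        AVG((bid + ask) / 2) AS mid_price
    FROM forex_ticks
    GROUP BY 1, 2
),
hourly_returns AS (
    SELECT
        hour,
        symbol,
        -- log return versus the previous available hour for the same pair
        LN(mid_price / LAG(mid_price) OVER (PARTITION BY symbol ORDER BY hour)) AS log_return
    FROM hourly_prices
)
SELECT
    a.symbol AS pair1,
    b.symbol AS pair2,
    CORR(a.log_return, b.log_return) AS return_correlation
FROM hourly_returns a
JOIN hourly_returns b
  ON a.hour = b.hour
 AND a.symbol < b.symbol   -- each unordered pair once
GROUP BY a.symbol, b.symbol;
```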

---

## **Phase 4: Advanced Topics**
### **4.1 Volatility Measurement**
```sql
-- Standard deviation of tick-to-tick returns on the ask price,
-- computed over the whole table (not per hour)
WITH tick_returns AS (
  SELECT
    symbol,
    timestamp,
    (ask - LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp)) /
      LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp) AS tick_return
  FROM forex_ticks
)
SELECT
  symbol,
  STDDEV(tick_return) AS tick_return_volatility
FROM tick_returns
GROUP BY symbol;
```
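To get a volatility figure per hour rather than one number per pair, the returns can be grouped into hourly buckets. The sketch below computes realized volatility as the square root of the sum of squared log returns of the mid-price within each hour. That estimator is a common choice, not the only one, and the names `log_return` and `realized_volatility` are illustrative.

```sql
WITH tick_returns AS (
    SELECT
        symbol,
        timestamp,
        LN(((bid + ask) / 2) /
           LAG((bid + ask) / 2) OVER (PARTITION BY symbol ORDER BY timestamp)) AS log_return
    FROM forex_ticks
)
SELECT
    symbol,
    DATE_TRUNC('hour', timestamp)       AS hour,
    SQRT(SUM(log_return * log_return))  AS realized_volatility,
    COUNT(log_return)                   AS n_returns  -- ticks contributing to the estimate
FROM tick_returns
GROUP BY symbol, DATE_TRUNC('hour', timestamp)
ORDER BY symbol, hour;
```

Hours with very few ticks produce noisy estimates, so `n_returns` is worth checking before comparing hours.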

### **4.2 Event Impact Analysis**
```sql
-- Compare 15-min pre/post NFP release
-- (half-open windows, so a tick at exactly 13:30 is counted only once)
SELECT
  AVG(CASE WHEN timestamp >= '2023-12-01 13:30' AND timestamp < '2023-12-01 13:45'
           THEN (bid+ask)/2 END) AS post_NFP,
  AVG(CASE WHEN timestamp >= '2023-12-01 13:15' AND timestamp < '2023-12-01 13:30'
           THEN (bid+ask)/2 END) AS pre_NFP
FROM forex_ticks
WHERE symbol = 'EUR/USD';
```

---

## **Study Roadmap**
### **Weekly Learning Plan**
| Week | Focus Area | Key Skills |
|------|-----------|------------|
| 1 | SQL Basics | `SELECT`, `WHERE`, `GROUP BY` |
| 2 | Time Handling | `DATE_TRUNC`, `EXTRACT`, timezones |
| 3 | Aggregations | `AVG`, `STDDEV`, `CORR` |
| 4 | Window Functions | `OVER`, `PARTITION BY`, rolling calcs |
| 5 | Optimization | Indexes, query planning |
| 6 | Advanced Patterns | Volatility modeling, microstructure |

---

## **Cheat Sheet**
### **Essential Functions**
| Function | Purpose | Example |
|----------|---------|---------|
| `DATE_TRUNC` | Bucket timestamps | `DATE_TRUNC('hour', timestamp)` |
| `EXTRACT` | Get time parts | `EXTRACT(HOUR FROM timestamp)` |
| `CORR` | Correlation | `CORR(price1, price2)` |
| `AVG() OVER` | Moving average (30 rows) | `AVG(price) OVER (ORDER BY time ROWS 29 PRECEDING)` |

### **Common Patterns**
```sql
-- Get latest price per pair (DISTINCT ON is PostgreSQL-specific)
SELECT DISTINCT ON (symbol) symbol, bid, ask
FROM forex_ticks
ORDER BY symbol, timestamp DESC;

-- Detect stale data: pairs with no tick in the last 5 minutes
SELECT symbol, MAX(timestamp) AS last_update
FROM forex_ticks
GROUP BY symbol
HAVING MAX(timestamp) < NOW() - INTERVAL '5 minutes';
```

---

## **Next Steps**
1. **Practice Dataset**: Download free forex tick data from [Dukascopy Bank](https://www.dukascopy.com/)
2. **Sandbox Setup**: Install PostgreSQL + TimescaleDB for time-series optimizations
3. **Projects**:
   - Build a volatility dashboard
   - Analyze London vs. NY session spreads
   - Track correlation breakdowns during crises
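For the sandbox setup, most queries in this guide filter by one symbol and a time range, so a composite index on those two columns is a reasonable starting point. The sketch below assumes the `forex_ticks` table from this guide; the index name is hypothetical. The last two statements apply only if the TimescaleDB extension is installed, and they use its `create_hypertable` function with the classic positional arguments, so check the documentation for your installed version before relying on them.

```sql
-- Composite index for "one symbol, one time range" lookups
CREATE INDEX IF NOT EXISTS idx_forex_ticks_symbol_ts
    ON forex_ticks (symbol, timestamp DESC);

-- Inspect the query plan; note that EXPLAIN ANALYZE actually executes the query
EXPLAIN (ANALYZE, BUFFERS)
SELECT timestamp, bid, ask
FROM forex_ticks
WHERE symbol = 'EUR/USD'
  AND timestamp >= TIMESTAMPTZ '2023-12-01 13:00:00+00'
  AND timestamp <  TIMESTAMPTZ '2023-12-01 14:00:00+00';

-- Optional, TimescaleDB only: partition the table by time as a hypertable
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('forex_ticks', 'timestamp', migrate_data => true);
```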

---

**Pro Tip**: Bookmark this guide and revisit each phase as your skills progress. Start with Phase 1 queries, then gradually incorporate more complex techniques.

---

# **The Ultimate SQL Getting Started Guide**

This guide will take you from absolute beginner to SQL proficiency, with a focus on practical data analysis and EDA applications.