From 02b1de949e5082bfd203d541c5c5859f41f4a349 Mon Sep 17 00:00:00 2001
From: medusa
Date: Wed, 18 Jun 2025 04:41:48 +0000
Subject: [PATCH] Update tech_docs/database/sql_getting_started.md

---
 tech_docs/database/sql_getting_started.md | 200 ++++++++++++++++++++++
 1 file changed, 200 insertions(+)

diff --git a/tech_docs/database/sql_getting_started.md b/tech_docs/database/sql_getting_started.md
index 6443a33..e351d90 100644
--- a/tech_docs/database/sql_getting_started.md
+++ b/tech_docs/database/sql_getting_started.md
@@ -1,3 +1,203 @@
+Here's a structured **Technical Guide & Roadmap for Forex Tick Data Analysis with SQL**, designed as a progressive learning path with clear milestones and reference examples:
+
+---
+
+# **Forex Tick Data Analysis: SQL Learning Roadmap**
+*A step-by-step guide from beginner to advanced techniques*
+
+## **Phase 1: Foundations**
+### **1.1 Understanding Your Data**
+- **Structure**: Forex ticks typically contain:
+  ```sql
+  symbol (e.g., 'EUR/USD'),
+  timestamp (precision to milliseconds),
+  bid price,
+  ask price,
+  volume
+  ```
+- **Key Metrics**:
+  - **Spread**: `ask - bid` (liquidity measure)
+  - **Mid-price**: `(bid + ask) / 2` (reference price)
+
+### **1.2 Basic SQL Operations**
+```sql
+-- Sample data inspection
+SELECT * FROM forex_ticks
+WHERE symbol = 'EUR/USD'
+LIMIT 100;
+
+-- Count ticks per pair
+SELECT symbol, COUNT(*)
+FROM forex_ticks
+GROUP BY symbol;
+
+-- Time range covered by the data
+SELECT MIN(timestamp), MAX(timestamp)
+FROM forex_ticks;
+```
+
+---
+
+## **Phase 2: Core Analysis**
+### **2.1 Spread Analysis**
+```sql
+-- Basic spread stats
+SELECT
+  symbol,
+  AVG(ask - bid) AS avg_spread,
+  MAX(ask - bid) AS max_spread
+FROM forex_ticks
+GROUP BY symbol;
+```
+
+### **2.2 Time Bucketing**
+```sql
+-- 5-minute bars (DATE_BIN needs PostgreSQL 14+; DATE_TRUNC accepts only single units such as 'minute')
+SELECT
+  symbol,
+  DATE_BIN('5 minutes', timestamp, TIMESTAMP '2000-01-01') AS time_bucket,
+  MIN(bid) AS low,
+  MAX(ask) AS high,
+  AVG((bid+ask)/2) AS avg_mid
+FROM forex_ticks
+GROUP BY symbol, time_bucket;
+```
+
+### **2.3 Session Analysis**
+```sql
+-- Average tick volume by hour of day (GMT)
+SELECT
+  EXTRACT(HOUR FROM timestamp) AS hour,
+  AVG(volume) AS avg_volume
+FROM forex_ticks
+WHERE symbol = 'GBP/USD'
+GROUP BY hour
+ORDER BY hour;
+```
+
+---
+
+## **Phase 3: Intermediate Techniques**
+### **3.1 Rolling Calculations**
+```sql
+-- 30-minute moving average of the mid-price (RANGE with an interval offset needs PostgreSQL 11+)
+SELECT
+  timestamp,
+  symbol,
+  AVG((bid+ask)/2) OVER (
+    PARTITION BY symbol
+    ORDER BY timestamp
+    RANGE BETWEEN INTERVAL '30 minutes' PRECEDING AND CURRENT ROW
+  ) AS ma_30min
+FROM forex_ticks;
+```
+
+### **3.2 Pair Correlation**
+```sql
+WITH hourly_prices AS (
+  SELECT
+    DATE_TRUNC('hour', timestamp) AS hour,
+    symbol,
+    AVG((bid+ask)/2) AS mid_price
+  FROM forex_ticks
+  GROUP BY hour, symbol
+)
+SELECT
+  a.symbol AS pair1,
+  b.symbol AS pair2,
+  CORR(a.mid_price, b.mid_price) AS correlation
+FROM hourly_prices a
+JOIN hourly_prices b ON a.hour = b.hour
+WHERE a.symbol < b.symbol
+GROUP BY pair1, pair2;
+```
+
+---
+
+## **Phase 4: Advanced Topics**
+### **4.1 Volatility Measurement**
+```sql
+WITH returns AS (
+  SELECT
+    symbol,
+    timestamp,
+    (ask - LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp)) /
+      LAG(ask) OVER (PARTITION BY symbol ORDER BY timestamp) AS tick_return
+  FROM forex_ticks
+)
+SELECT
+  symbol,
+  STDDEV(tick_return) AS tick_volatility
+FROM returns
+GROUP BY symbol;
+```
+
+### **4.2 Event Impact Analysis**
+```sql
+-- Compare 15-min pre/post NFP release (half-open windows so 13:30 is counted only once)
+SELECT
+  AVG(CASE WHEN timestamp >= '2023-12-01 13:30' AND timestamp < '2023-12-01 13:45'
+           THEN (bid+ask)/2 END) AS post_NFP,
+  AVG(CASE WHEN timestamp >= '2023-12-01 13:15' AND timestamp < '2023-12-01 13:30'
+           THEN (bid+ask)/2 END) AS pre_NFP
+FROM forex_ticks
+WHERE symbol = 'EUR/USD';
+```
+
+---
+
+## **Study Roadmap**
+### **Weekly Learning Plan**
+| Week | Focus Area | Key Skills |
+|------|-----------|------------|
+| 1 | SQL Basics | `SELECT`, `WHERE`, `GROUP BY` |
+| 2 | Time Handling | `DATE_TRUNC`, `EXTRACT`, timezones |
+| 3 | Aggregations | `AVG`, `STDDEV`, `CORR` |
+| 4 | Window Functions | `OVER`, `PARTITION BY`, rolling calcs |
+| 5 | Optimization | Indexes, query planning |
+| 6 | Advanced Patterns | Volatility modeling, microstructure |
+
+---
+
+## **Cheat Sheet**
+### **Essential Functions**
+| Function | Purpose | Example |
+|----------|---------|---------|
+| `DATE_TRUNC` | Bucket timestamps | `DATE_TRUNC('hour', timestamp)` |
+| `EXTRACT` | Get time parts | `EXTRACT(HOUR FROM timestamp)` |
+| `CORR` | Correlation | `CORR(price1, price2)` |
+| `AVG() OVER` | Moving average (last 30 rows) | `AVG(price) OVER (ORDER BY time ROWS 29 PRECEDING)` |
+
+### **Common Patterns**
+```sql
+-- Get latest price per pair (DISTINCT ON is PostgreSQL-specific)
+SELECT DISTINCT ON (symbol) symbol, bid, ask
+FROM forex_ticks
+ORDER BY symbol, timestamp DESC;
+
+-- Detect stale data
+SELECT symbol, MAX(timestamp) AS last_update
+FROM forex_ticks
+GROUP BY symbol
+HAVING MAX(timestamp) < NOW() - INTERVAL '5 minutes';
+```
+
+---
+
+## **Next Steps**
+1. **Practice Dataset**: Download free forex tick data from [Dukascopy Bank](https://www.dukascopy.com/)
+2. **Sandbox Setup**: Install PostgreSQL + TimescaleDB for time-series optimizations
+3. **Projects**:
+   - Build a volatility dashboard
+   - Analyze London vs. NY session spreads
+   - Track correlation breakdowns during crises
+
+---
+
+**Pro Tip**: Bookmark this guide and revisit each phase as your skills progress. Start with Phase 1 queries, then gradually incorporate more complex techniques.
+
+---
+
 # **The Ultimate SQL Getting Started Guide**
 
 This guide will take you from absolute beginner to SQL proficiency, with a focus on practical data analysis and EDA applications.