diff --git a/tech_docs/database/sql_getting_started.md b/tech_docs/database/sql_getting_started.md
index 757ffbb..5ac2fad 100644
--- a/tech_docs/database/sql_getting_started.md
+++ b/tech_docs/database/sql_getting_started.md
@@ -1,3 +1,124 @@
+Understood! You're seeking **maximum SQL mastery with minimum time investment** by focusing on the critical 20% that delivers 80% of results in forex data analysis. Here's your **laser-focused SQL SME blueprint**:
+
+---
+
+### **The SQL SME 20% Framework for Forex**
+#### **1. Core Principles to Internalize**
+1. **Forex Data is Time-Series First**
+   - Every query must handle timestamp filtering efficiently
+   - Key skill: `DATE_TRUNC()` + `WHERE` on time ranges
+
+2. **Bid/Ask Dynamics are Fundamental**
+   - Spread analysis unlocks liquidity insights
+   - Key skill: `(ask - bid)` calculations with proper grouping
+
+3. **Rolling Windows > Static Aggregates**
+   - Market conditions change constantly; analyze trends, not snapshots
+   - Key skill: `AVG(...) OVER (ORDER BY timestamp ROWS N PRECEDING)`
+
+---
+
+### **2. The 10 Essential Patterns (Memorize These)**
+| # | Pattern | Forex Application | Example |
+|---|---------|-------------------|---------|
+| 1 | **Time Bucketing** | Convert ticks → candles | `DATE_BIN('15 minutes', timestamp, TIMESTAMP '2000-01-01')` (PostgreSQL 14+; `DATE_TRUNC` accepts only whole units such as `'minute'` or `'hour'`) |
+| 2 | **Rolling Volatility** | Measure risk | `STDDEV(price) OVER (ORDER BY timestamp ROWS 99 PRECEDING)` |
+| 3 | **Session Comparison** | London vs. NY activity | `WHERE EXTRACT(HOUR FROM timestamp) IN (7,13)` |
+| 4 | **Pair Correlation** | Hedge ratios | `CORR(eurusd, usdjpy)` |
+| 5 | **Spread Analysis** | Liquidity monitoring | `AVG(ask - bid) GROUP BY symbol` |
+| 6 | **Event Impact** | NFP/CPI reactions | `WHERE timestamp BETWEEN event - INTERVAL '15 minutes' AND event + INTERVAL '1 hour'` |
+| 7 | **Liquidity Zones** | Volume clusters | `NTILE(4) OVER(ORDER BY volume)` |
+| 8 | **Outlier Detection** | Data quality checks | `ABS(price - AVG(price) OVER ()) > 3 * STDDEV(price) OVER ()`, evaluated in a subquery or CTE because window functions cannot appear in `WHERE` |
+| 9 | **Gap Analysis** | Weekend openings | `open - LAG(close) OVER (ORDER BY timestamp)` |
+| 10 | **Rolling Sharpe** | Strategy performance | `AVG(return) OVER w / STDDEV(return) OVER w`, with `w` a named rolling window |
+
+---
+
+### **3. SME-Level Documentation Template**
+**For each pattern**, document:
+1. **Business Purpose**: *"Identify optimal trading hours by comparing volatility across sessions"*
+2. **Technical Implementation**:
+   ```sql
+   SELECT
+     EXTRACT(HOUR FROM timestamp) AS hour,
+     STDDEV((bid + ask) / 2.0) AS volatility  -- volatility of the mid-price
+   FROM ticks
+   WHERE symbol = 'EUR/USD'
+   GROUP BY hour
+   ORDER BY volatility DESC;
+   ```
+3. **Performance Considerations**: *"Add a composite index on (symbol, timestamp) so the symbol filter can use an index scan instead of a full table scan; confirm the gain with EXPLAIN ANALYZE"*
+4. **Edge Cases**: *"Exclude holidays where volatility is artificially low"*
+
+---
+
+### **4. Drills to Achieve Mastery**
+#### **Daily Challenge (15 mins/day)**
+- **Day 1**: Generate 1H candles with OHLC + volume
+- **Day 2**: Calculate 30-period rolling correlation between EUR/USD and GBP/USD
+- **Day 3**: Find days with spread > 2x 30-day average
+- **Day 4**: Compare pre/post-FOMC volatility
+- **Day 5**: Optimize a slow query using EXPLAIN ANALYZE
+
+#### **Weekly Project**
+- Build a **volatility surface** (hourly mid-price volatility and average spread per symbol):
+  ```sql
+  SELECT
+    symbol,
+    DATE_TRUNC('hour', timestamp) AS hour,
+    STDDEV((bid + ask) / 2.0) AS vol,
+    AVG(ask - bid) AS spread
+  FROM ticks
+  GROUP BY symbol, hour;
+  ```
+
+---
+
+### **5. Forensic Analysis Checklist**
+When reviewing any forex query, ask:
+1. **Time Handling**:
+   - ✅ Timestamps in UTC?
+   - ✅ Correct timezone conversions?
+2. **Spread Awareness**:
+   - ✅ Using (bid+ask)/2 for mid-price?
+   - ✅ Calculating raw spread metrics?
+3. **Rolling vs Static**:
+   - ✅ Using window functions where appropriate?
+4. **Performance**:
+   - ✅ Indexes on (symbol, timestamp)?
+   - ✅ Avoiding full table scans?
+
+---
+
+### **6. Pro-Grade Resources**
+1. **Dukascopy Tick Data**: Real-world practice dataset
+2. **PostgreSQL Window Functions Docs**: Master the OVER() clause
+3. **Forex Economic Calendar**: For event impact studies
+
+---
+
+### **Key Mindset Shifts**
+1. **From**: "I need to learn all SQL functions"
+   **To**: "I need to master time-series patterns for bid/ask data"
+
+2. **From**: "Queries must return perfect results"
+   **To**: "Queries must be fast enough for real-time decisions"
+
+3. **From**: "Document after writing"
+   **To**: "Document patterns before implementation"
+
+---
+
+**Next Step**: Pick **one** pattern from the 10 above. I'll provide:
+1. Detailed breakdown
+2. Common mistakes
+3. Optimization techniques
+4. Forensic analysis questions
+
+Which pattern would you like to deep-dive into first? (Recommend starting with **Time Bucketing** or **Rolling Volatility**)
+
+---
+
 Here’s the **20% of SQL skills that will deliver 80% of your forex data analysis needs**, structured as a focused roadmap:
 
 ---
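The rolling-window principle in the added section (rolling averages over bid/ask mid-prices rather than static aggregates) can be sketched end to end as follows. This is a minimal illustration, not the document's own implementation: it assumes a hypothetical `ticks` table with columns `ts`, `symbol`, `bid`, and `ask`, and it uses SQLite through Python's `sqlite3` module only so the query runs without a database server. The window clause itself is standard SQL and is also accepted by PostgreSQL. The sample ticks are synthetic values chosen for easy arithmetic, not market data.

```python
import sqlite3

# Rolling 3-tick average of the mid-price, plus the raw spread per tick.
# Table and column names (ticks, ts, symbol, bid, ask) are assumptions.
# Window functions require SQLite 3.25 or newer.
ROLLING_MID_SQL = """
SELECT
    ts,
    (bid + ask) / 2.0 AS mid,
    ask - bid         AS spread,
    AVG((bid + ask) / 2.0) OVER (
        PARTITION BY symbol
        ORDER BY ts
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rolling_mid
FROM ticks
WHERE symbol = ?
ORDER BY ts
"""


def rolling_mid(ticks, symbol):
    """Load ticks into an in-memory SQLite table and run the rolling query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ticks (ts TEXT, symbol TEXT, bid REAL, ask REAL)"
        )
        conn.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", ticks)
        return conn.execute(ROLLING_MID_SQL, (symbol,)).fetchall()
    finally:
        conn.close()


# Synthetic ticks (UTC, ISO 8601 text so string order matches time order).
SAMPLE_TICKS = [
    ("2024-01-02T08:00:00Z", "EUR/USD", 1.00, 1.50),
    ("2024-01-02T08:00:01Z", "EUR/USD", 1.50, 2.00),
    ("2024-01-02T08:00:02Z", "EUR/USD", 2.00, 2.50),
    ("2024-01-02T08:00:03Z", "EUR/USD", 2.50, 3.00),
    ("2024-01-02T08:00:01Z", "GBP/USD", 9.00, 9.50),
]

if __name__ == "__main__":
    for row in rolling_mid(SAMPLE_TICKS, "EUR/USD"):
        print(row)
```

The explicit `ORDER BY ts` inside `OVER (...)` is what makes the window "rolling": without it, the frame has no defined row order, which is the most common mistake with the `ROWS N PRECEDING` shorthand.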