diff --git a/tech_docs/database/sql_getting_started.md b/tech_docs/database/sql_getting_started.md
index 5ac2fad..9013f24 100644
--- a/tech_docs/database/sql_getting_started.md
+++ b/tech_docs/database/sql_getting_started.md
@@ -1,3 +1,115 @@
+# **SQL for Forex Data Analysis: The 20% That Delivers 80% Results**
+
+## **Focused Learning Roadmap**
+*Master these core skills to handle most forex data analysis tasks*
+
+### **Phase 1: Core Skills (Weeks 1-2)**
+**What to Learn** | **Why It Matters** | **Key Syntax Examples**
+-----------------|-------------------|----------------------
+**Filtering Data** | Isolate specific currency pairs/timeframes | `SELECT * FROM ticks WHERE symbol='EUR/USD' AND timestamp > '2023-01-01'`
+**Time Bucketing** | Convert raw ticks into candlesticks (1min/5min/1H) | `DATE_TRUNC('hour', timestamp) AS hour`
+**Basic Aggregates** | Calculate spreads, highs/lows, averages | `AVG(ask-bid) AS avg_spread`, `MAX(ask) AS high`
+**Grouping** | Summarize data by pair/time period | `GROUP BY symbol, DATE_TRUNC('day', timestamp)`
+
+---
+
+### **Phase 2: Essential Techniques (Weeks 3-4)**
+**Skill** | **Forex Application** | **Example**
+---------|---------------------|-----------
+**Joins** | Combine tick data with economic calendars | `JOIN economic_events AS events ON ticks.date = events.date`
+**Rolling Windows** | Calculate moving averages & volatility | `AVG(price) OVER (ORDER BY timestamp ROWS 30 PRECEDING)`
+**Correlations** | Compare currency pairs (e.g., EUR/USD vs. USD/JPY) | `CORR(eurusd_mid, usdjpy_mid)`
+**Session Analysis** | Compare volatility across trading sessions | `WHERE EXTRACT(HOUR FROM timestamp) IN (7,13,21)` *(approximate London/NY/Asia session opens, assuming UTC timestamps)*
+
+---
+
+### **Phase 3: Optimization (Week 5)**
+**Skill** | **Impact** | **Implementation**
+---------|----------|-----------------
+**Indexing** | Speed up time/symbol queries | `CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp)`
+**CTEs** | Break complex queries into steps | `WITH filtered AS (...) SELECT * FROM filtered`
+**Partitioning** | Faster queries on large datasets | `PARTITION BY RANGE (timestamp)` *(PostgreSQL syntax)*
+
+---
+
+## **Essential Forex Queries You'll Use Daily**
+1. **Current Spread Analysis**
+   ```sql
+   SELECT symbol, AVG(ask-bid) AS spread
+   FROM ticks
+   WHERE timestamp > NOW() - INTERVAL '1 hour'
+   GROUP BY symbol;
+   ```
+
+2. **5-Minute Candlesticks**
+   ```sql
+   SELECT
+     time_bucket(INTERVAL '5 minutes', timestamp) AS bucket_start, -- DuckDB; DATE_TRUNC only accepts whole units such as 'minute'
+     MIN(bid) AS low,
+     MAX(ask) AS high
+   FROM ticks
+   WHERE symbol = 'GBP/USD'
+   GROUP BY bucket_start;
+   ```
+
+3. **Rolling Volatility**
+   ```sql
+   SELECT
+     timestamp,
+     STDDEV(ask) OVER (ORDER BY timestamp ROWS 100 PRECEDING) AS vol
+   FROM ticks
+   WHERE symbol = 'EUR/USD';
+   ```
+
+4. **Session Volume Comparison**
+   ```sql
+   SELECT
+     CASE
+       WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 15 THEN 'London'
+       ELSE 'Other'
+     END AS session,
+     SUM(volume) AS total_volume
+   FROM ticks
+   GROUP BY session;
+   ```
+
+---
+
+## **Study Plan**
+- **Weeks 1-2**: Master `SELECT`, `WHERE`, `GROUP BY`, `DATE_TRUNC`
+  → *Goal: Generate hourly OHLC data for one currency pair*
+- **Weeks 3-4**: Learn `JOIN`, `AVG() OVER()`, `CORR()`
+  → *Goal: Compare two pairs' correlation over different timeframes*
+- **Week 5**: Optimize with indexes + CTEs
+  → *Goal: Run a 1M-row query in <1 second*
+
+---
+
+## **Tools to Start With**
+- **Data**: Free forex ticks from [Dukascopy](https://www.dukascopy.com/)
+- **Database**: DuckDB (lightweight, no setup required)
+- **Visualization**: Metabase (free) or Python with Matplotlib
+
+---
+
+## **What to Avoid (For Now)**
+- Stored procedures
+- Advanced indexing strategies
+- Machine learning in SQL
+- Recursive queries
+
+---
+
+### **Bare Minimum Survival Kit**
+1. `WHERE` + `DATE_TRUNC` *(filter and bucket time data)*
+2. `GROUP BY` *(summarize data efficiently)*
+3. `AVG() OVER()` *(rolling calculations)*
+4. `CORR()` *(measure pair relationships)*
+
+**Next Step**: Want a 5-question quiz to test your core knowledge? Or shall we dive deeper into one technique?
+
+---
+
 Understood! You're seeking **maximum SQL mastery with minimum time investment** by focusing on the critical 20% that delivers 80% of results in forex data analysis. Here's your **laser-focused SQL SME blueprint**:
 
 ---