Update tech_docs/database/sql_getting_started.md

2025-06-18 04:54:26 +00:00
parent dd9acc1a54
commit f6d0035f53

@@ -1,3 +1,115 @@
# **SQL for Forex Data Analysis: The 20% That Delivers 80% Results**
## **Focused Learning Roadmap**
*Master these core skills to handle most forex data analysis tasks*
### **Phase 1: Core Skills (Week 1-2)**
**What to Learn** | **Why It Matters** | **Key Syntax Examples**
-----------------|-------------------|----------------------
**Filtering Data** | Isolate specific currency pairs/timeframes | `SELECT * FROM ticks WHERE symbol='EUR/USD' AND timestamp > '2023-01-01'`
**Time Bucketing** | Convert raw ticks into candlesticks (1min/5min/1H) | `DATE_TRUNC('hour', timestamp) AS hour`
**Basic Aggregates** | Calculate spreads, highs/lows, averages | `AVG(ask-bid) AS avg_spread`, `MAX(ask) AS high`
**Grouping** | Summarize data by pair/time period | `GROUP BY symbol, DATE_TRUNC('day', timestamp)`
---
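The four Phase 1 skills above can be combined into one query. The following is a minimal sketch, assuming a `ticks(symbol, timestamp, bid, ask)` table like the one the examples imply and DuckDB as the engine. `ARG_MIN` and `ARG_MAX` are DuckDB aggregates, so PostgreSQL would need a different approach for the open and close columns. The `*_bid` output names are illustrative.

```sql
-- Hourly OHLC candles for one pair, built from bid quotes only.
SELECT
    DATE_TRUNC('hour', timestamp) AS candle_time,  -- time bucketing
    ARG_MIN(bid, timestamp)       AS open_bid,     -- bid at the earliest tick in the hour
    MAX(bid)                      AS high_bid,
    MIN(bid)                      AS low_bid,
    ARG_MAX(bid, timestamp)       AS close_bid,    -- bid at the latest tick in the hour
    AVG(ask - bid)                AS avg_spread    -- basic aggregate
FROM ticks
WHERE symbol = 'EUR/USD'                           -- filtering
  AND timestamp >= TIMESTAMP '2023-01-01'
GROUP BY candle_time                               -- grouping
ORDER BY candle_time;
```

Using a single price series (bid here) for all four candle values keeps the spread out of the high-low range.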
### **Phase 2: Essential Techniques (Week 3-4)**
**Skill** | **Forex Application** | **Example**
---------|---------------------|-----------
**Joins** | Combine tick data with economic calendars | `JOIN economic_events AS events ON CAST(ticks.timestamp AS DATE) = events.date`
**Rolling Windows** | Calculate moving averages & volatility | `AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)` *(30-row window per pair)*
**Correlations** | Compare currency pairs (e.g., EUR/USD vs. USD/JPY) | `CORR(eurusd_mid, usdjpy_mid)`
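**Session Analysis** | Compare volatility across trading sessions | `WHERE EXTRACT(HOUR FROM timestamp) IN (7,13,21)` *(matches only the 07:00, 13:00, and 21:00 hours, used here as approximate London/NY/Asia session opens; assumes UTC timestamps)*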
---
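The `CORR(eurusd_mid, usdjpy_mid)` example above presupposes one row per time bucket with a column per pair, whereas a tick table usually stores one row per quote. The following is a sketch of one way to get there, assuming the same `ticks(symbol, timestamp, bid, ask)` table. The CTE names and the `bar_time` column are illustrative.

```sql
-- Step 1: one average mid price per pair per minute (long format).
WITH minute_mid AS (
    SELECT
        DATE_TRUNC('minute', timestamp) AS bar_time,
        symbol,
        AVG((bid + ask) / 2)            AS mid
    FROM ticks
    WHERE symbol IN ('EUR/USD', 'USD/JPY')
    GROUP BY DATE_TRUNC('minute', timestamp), symbol
),
-- Step 2: pivot to one row per minute with a column per pair (wide format).
wide AS (
    SELECT
        bar_time,
        MAX(CASE WHEN symbol = 'EUR/USD' THEN mid END) AS eurusd_mid,
        MAX(CASE WHEN symbol = 'USD/JPY' THEN mid END) AS usdjpy_mid
    FROM minute_mid
    GROUP BY bar_time
)
-- Step 3: minutes where either pair has no quotes yield a NULL and are skipped by CORR.
SELECT CORR(eurusd_mid, usdjpy_mid) AS price_correlation
FROM wide;
```

This correlates price levels. Correlating per-minute returns instead is a common alternative and would need one more step with `LAG`.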
### **Phase 3: Optimization (Week 5)**
**Skill** | **Impact** | **Implementation**
---------|----------|-----------------
**Indexing** | Speed up time/symbol queries | `CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp)`
**CTEs** | Break complex queries into steps | `WITH filtered AS (...) SELECT * FROM filtered`
**Partitioning** | Faster queries on large datasets | `CREATE TABLE ticks (...) PARTITION BY RANGE (timestamp)` *(PostgreSQL syntax)*
---
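The index and CTE rows above work well together: the index serves the filter, and the CTE keeps the filter separate from the aggregation. The following is a minimal sketch, assuming the same `ticks(symbol, timestamp, bid, ask)` table. The date range and the `trade_day` and `tick_count` names are illustrative.

```sql
-- Composite index: symbol first (equality filter), then timestamp (range filter).
CREATE INDEX IF NOT EXISTS idx_symbol_time ON ticks (symbol, timestamp);

-- CTE: name the filtered slice once, then aggregate it in a second step.
WITH filtered AS (
    SELECT timestamp, bid, ask
    FROM ticks
    WHERE symbol = 'EUR/USD'
      AND timestamp >= TIMESTAMP '2023-01-01'
      AND timestamp <  TIMESTAMP '2023-02-01'
)
SELECT
    DATE_TRUNC('day', timestamp) AS trade_day,
    AVG(ask - bid)               AS avg_spread,
    COUNT(*)                     AS tick_count
FROM filtered
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY trade_day;
```

Prefixing the final `SELECT` with `EXPLAIN` shows the query plan, which is one way to check how the engine handles the filter.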
## **Essential Forex Queries You'll Use Daily**
1. **Current Spread Analysis**
```sql
SELECT symbol, AVG(ask-bid) AS spread
FROM ticks
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY symbol;
```
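2. **5-Minute Candlesticks**
```sql
-- DATE_TRUNC only accepts whole units ('minute', 'hour', ...), so use
-- time_bucket (DuckDB) for 5-minute buckets. PostgreSQL 14+ offers
-- date_bin('5 minutes', timestamp, TIMESTAMP '2000-01-01') instead.
SELECT
    time_bucket(INTERVAL '5 minutes', timestamp) AS candle_time,
    MIN(bid) AS low,   -- low and high both use bid so the range excludes the spread
    MAX(bid) AS high
FROM ticks
WHERE symbol = 'GBP/USD'
GROUP BY candle_time
ORDER BY candle_time;
```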
3. **Rolling Volatility**
```sql
SELECT
timestamp,
STDDEV(ask) OVER (ORDER BY timestamp ROWS 100 PRECEDING) AS vol
FROM ticks
WHERE symbol = 'EUR/USD';
```
4. **Session Volume Comparison**
```sql
SELECT
CASE
WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 15 THEN 'London'
ELSE 'Other'
END AS session,
SUM(volume) AS total_volume
FROM ticks
GROUP BY session;
```
---
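Query 3 above applies `STDDEV` to the raw ask price, which measures how far the price level moved within the window. A common variant measures the dispersion of tick-to-tick log returns instead. The following is a sketch of that variant, assuming the same `ticks` table. The `returns` CTE and the `log_return` and `return_vol` names are illustrative.

```sql
-- Step 1: log return of each tick relative to the previous tick.
WITH returns AS (
    SELECT
        timestamp,
        LN(ask / LAG(ask) OVER (ORDER BY timestamp)) AS log_return  -- NULL on the first row
    FROM ticks
    WHERE symbol = 'EUR/USD'
)
-- Step 2: rolling standard deviation over the current row and the 99 before it.
SELECT
    timestamp,
    STDDEV(log_return) OVER (
        ORDER BY timestamp
        ROWS BETWEEN 99 PRECEDING AND CURRENT ROW
    ) AS return_vol
FROM returns
ORDER BY timestamp;
```

`ROWS BETWEEN 99 PRECEDING AND CURRENT ROW` spans exactly 100 rows, whereas the shorthand `ROWS 100 PRECEDING` spans 101 because it also includes the current row.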
## **Study Plan**
- **Weeks 1-2**: Master `SELECT`, `WHERE`, `GROUP BY`, `DATE_TRUNC`
→ *Goal: Generate hourly OHLC data for one currency pair*
- **Weeks 3-4**: Learn `JOIN`, `AVG() OVER()`, `CORR()`
→ *Goal: Compare two pairs' correlation over different timeframes*
- **Week 5**: Optimize with indexes + CTEs
→ *Goal: Run a 1M-row query in <1 second*
---
## **Tools to Start With**
- **Data**: Free forex ticks from [Dukascopy](https://www.dukascopy.com/)
- **Database**: DuckDB (lightweight, embedded, no server to set up)
- **Visualization**: Metabase (free) or Python with Matplotlib
---
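With the tools above, a first session might load a downloaded tick file into DuckDB and check it. The following is a sketch only: the file name `eurusd_ticks.csv` and the CSV column names `time_utc`, `bid`, and `ask` are hypothetical, so adjust them to match the actual export.

```sql
-- Build the ticks table used throughout this guide from a CSV export.
-- File name and source column names are assumptions, not a documented format.
CREATE TABLE ticks AS
SELECT
    'EUR/USD'                   AS symbol,     -- a single-pair file carries no symbol column
    CAST(time_utc AS TIMESTAMP) AS timestamp,
    CAST(bid AS DOUBLE)         AS bid,
    CAST(ask AS DOUBLE)         AS ask
FROM read_csv_auto('eurusd_ticks.csv');

-- Sanity check: row count and covered time range per pair.
SELECT
    symbol,
    COUNT(*)       AS n_ticks,
    MIN(timestamp) AS first_tick,
    MAX(timestamp) AS last_tick
FROM ticks
GROUP BY symbol;
```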
## **What to Avoid (For Now)**
- Stored procedures
- Advanced indexing strategies
- Machine learning in SQL
- Recursive queries
---
### **Bare Minimum Survival Kit**
1. `WHERE` + `DATE_TRUNC` *(filter and bucket time data)*
2. `GROUP BY` *(summarize data efficiently)*
3. `AVG() OVER()` *(rolling calculations)*
4. `CORR()` *(measure pair relationships)*
---
The following blueprint aims for **maximum SQL mastery with minimum time investment** by focusing on the critical 20% that delivers 80% of results in forex data analysis:
---