Update tech_docs/database/sql_getting_started.md

2025-06-18 04:45:30 +00:00
parent 02b1de949e
commit 379c758728

Here's the **20% of SQL skills that will deliver 80% of your forex data analysis needs**, structured as a focused roadmap:
---
### **SQL for Forex Data: The 20% Priority Roadmap**
#### **1. Core Skills (Weeks 1-2)**
| Skill | Why It Matters | Key Syntax |
|-------|---------------|------------|
| **Filtering Data** | Isolate specific currency pairs/timeframes | `SELECT * FROM ticks WHERE symbol='EUR/USD' AND timestamp > '2023-01-01'` |
| **Time Bucketing** | Convert ticks to candles (1min/5min/1H) | `DATE_TRUNC('hour', timestamp) AS hour` |
| **Basic Aggregates** | Calculate spreads, averages, highs/lows | `AVG(ask-bid) AS avg_spread`, `MAX(ask) AS high` |
| **Grouping** | Summarize by pair/time period | `GROUP BY symbol, DATE_TRUNC('day', timestamp)` |
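The four core skills usually show up together in a single query. Below is a minimal sketch that assumes DuckDB and a `ticks(symbol, timestamp, bid, ask)` table. The rows are made-up toy values, not real quotes, and the view name `daily_spreads` is hypothetical. It filters by date, buckets ticks into days, aggregates spreads, and groups the results per pair and day:

```sql
-- Toy tick table: made-up rows for illustration, not real market data
CREATE OR REPLACE TABLE ticks (
    symbol    VARCHAR,
    timestamp TIMESTAMP,
    bid       DOUBLE,
    ask       DOUBLE
);
INSERT INTO ticks VALUES
    ('EUR/USD', '2023-01-02 09:00:05', 1.0650, 1.0652),
    ('EUR/USD', '2023-01-02 16:30:10', 1.0660, 1.0661),
    ('EUR/USD', '2023-01-03 10:15:00', 1.0640, 1.0643),
    ('USD/JPY', '2023-01-02 09:10:00', 130.10, 130.12),
    ('USD/JPY', '2022-12-30 09:10:00', 131.00, 131.05);  -- removed by the date filter

-- Filter by date, bucket into days, aggregate spreads, group per pair and day
CREATE OR REPLACE VIEW daily_spreads AS
SELECT
    symbol,
    DATE_TRUNC('day', timestamp) AS day,
    COUNT(*)       AS n_ticks,
    AVG(ask - bid) AS avg_spread,
    MAX(ask - bid) AS max_spread
FROM ticks
WHERE timestamp > '2023-01-01'
GROUP BY symbol, DATE_TRUNC('day', timestamp)
ORDER BY symbol, day;

SELECT * FROM daily_spreads;
```

Grouping by the same `DATE_TRUNC` expression used in the `SELECT` list, rather than by the alias, means the query also runs on engines that do not resolve output aliases in `GROUP BY`.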
#### **2. Essential Techniques (Weeks 3-4)**
| Skill | Forex Application | Example |
|-------|-------------------|---------|
| **Joins** | Combine tick data with economic calendars | `JOIN economic_events AS events ON CAST(ticks.timestamp AS DATE) = events.date` |
| **Rolling Windows** | Calculate moving averages/volatility | `AVG(price) OVER (PARTITION BY symbol ORDER BY timestamp ROWS BETWEEN 29 PRECEDING AND CURRENT ROW)` |
| **Correlations** | Compare pairs (EUR/USD vs. USD/JPY) | `CORR(eurusd_mid, usdjpy_mid)` |
| **Session Analysis** | Compare London/NY/Asia volatility around session opens (UTC hours) | `WHERE EXTRACT(HOUR FROM timestamp) IN (7, 13, 21)` |
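`CORR()` takes two columns from the same row, so both pairs first have to be aligned on a shared timestamp. Below is a sketch in DuckDB that assumes a hypothetical `mids(symbol, timestamp, mid)` table of per-minute mid prices, already bucketed with something like `DATE_TRUNC('minute', timestamp)`. The values are made up so that the two series move in exactly opposite steps:

```sql
-- Toy per-minute mid prices for two pairs: made-up values, not real quotes
CREATE OR REPLACE TABLE mids (
    symbol    VARCHAR,
    timestamp TIMESTAMP,
    mid       DOUBLE
);
INSERT INTO mids VALUES
    ('EUR/USD', '2023-01-02 09:00:00', 1.0650),
    ('EUR/USD', '2023-01-02 09:01:00', 1.0655),
    ('EUR/USD', '2023-01-02 09:02:00', 1.0660),
    ('USD/JPY', '2023-01-02 09:00:00', 130.20),
    ('USD/JPY', '2023-01-02 09:01:00', 130.10),
    ('USD/JPY', '2023-01-02 09:02:00', 130.00);

-- Self-join on the shared minute so each row holds both pairs' mids side by side
CREATE OR REPLACE VIEW pair_corr AS
SELECT CORR(e.mid, j.mid) AS eurusd_usdjpy_corr
FROM mids AS e
JOIN mids AS j
  ON e.timestamp = j.timestamp
WHERE e.symbol = 'EUR/USD'
  AND j.symbol = 'USD/JPY';

SELECT * FROM pair_corr;  -- close to -1 for these toy rows
```

The inner join drops any minute where only one pair has a price, so the correlation is computed over shared timestamps only. With real data, expect a value somewhere between -1 and 1.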
#### **3. Optimization (Week 5)**
| Skill | Impact | Implementation |
|-------|--------|----------------|
| **Indexing** | Speed up time/symbol queries | `CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp)` |
| **CTEs** | Break complex queries into steps | `WITH filtered AS (...) SELECT * FROM filtered` |
| **Partitioning** | Faster queries on large datasets | `PARTITION BY RANGE (timestamp)` |
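CTEs and rolling windows fit together naturally. Each `WITH` step does one job, and the window function runs on the final step. Below is a sketch that assumes DuckDB, made-up toy rows, and a hypothetical view name `eurusd_ma`. The 3-bar window only keeps the example small:

```sql
-- Toy EUR/USD ticks, one per minute: made-up values, not real quotes
CREATE OR REPLACE TABLE ticks (
    symbol    VARCHAR,
    timestamp TIMESTAMP,
    bid       DOUBLE,
    ask       DOUBLE
);
INSERT INTO ticks VALUES
    ('EUR/USD', '2023-01-02 09:00:10', 1.0600, 1.0602),
    ('EUR/USD', '2023-01-02 09:01:10', 1.0610, 1.0612),
    ('EUR/USD', '2023-01-02 09:02:10', 1.0620, 1.0622),
    ('EUR/USD', '2023-01-02 09:03:10', 1.0630, 1.0632);

CREATE OR REPLACE VIEW eurusd_ma AS
WITH filtered AS (          -- step 1: keep one pair and compute the mid price
    SELECT timestamp, (bid + ask) / 2 AS mid
    FROM ticks
    WHERE symbol = 'EUR/USD'
),
minute_bars AS (            -- step 2: one average mid per minute
    SELECT DATE_TRUNC('minute', timestamp) AS minute, AVG(mid) AS mid
    FROM filtered
    GROUP BY DATE_TRUNC('minute', timestamp)
)
SELECT                      -- step 3: 3-bar moving average over the minute bars
    minute,
    mid,
    AVG(mid) OVER (ORDER BY minute ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma_3
FROM minute_bars
ORDER BY minute;

SELECT * FROM eurusd_ma;
```

The first two bars average over fewer than three rows because the window has not filled yet. Filter them out if you only want full windows.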
---
### **Prioritized Cheat Sheet**
#### **Core Queries You'll Use Daily**
1. **Average Spread (Last Hour)**:
```sql
SELECT symbol, AVG(ask-bid) AS spread
FROM ticks
WHERE timestamp > NOW() - INTERVAL '1 hour'
GROUP BY symbol;
```
2. **5-Min Candles**:
```sql
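-- DATE_TRUNC only accepts units such as 'minute' or 'hour'; use time_bucket for
-- 5-minute bars (DuckDB/TimescaleDB) or date_bin in PostgreSQL 14+
SELECT
    time_bucket(INTERVAL '5 minutes', timestamp) AS bucket,
    MIN(bid) AS low,
    MAX(ask) AS high
FROM ticks
WHERE symbol = 'GBP/USD'
GROUP BY bucket
ORDER BY bucket;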
```
3. **Rolling Volatility**:
```sql
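-- Volatility is measured on tick-to-tick mid-price changes, not on price levels
WITH changes AS (
    SELECT timestamp,
           (bid + ask) / 2 - LAG((bid + ask) / 2) OVER (ORDER BY timestamp) AS chg
    FROM ticks
    WHERE symbol = 'EUR/USD'
)
SELECT timestamp,
       STDDEV(chg) OVER (ORDER BY timestamp ROWS BETWEEN 99 PRECEDING AND CURRENT ROW) AS vol
FROM changes;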
```
4. **Session Volume**:
```sql
-- Hour ranges assume timestamps are stored in UTC
SELECT
    CASE
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 15 THEN 'London'
        ELSE 'Other'
    END AS session,
    SUM(volume) AS total_volume
FROM ticks
GROUP BY session;
```
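Query 4 splits the day into London and everything else. Comparing all three sessions named in the roadmap table only takes more `CASE` branches. Below is a sketch that assumes DuckDB and a `ticks` table with a `volume` column. The hour ranges are simplified, non-overlapping UTC assumptions (real trading sessions overlap). The rows and the view name `session_stats` are made up for illustration:

```sql
-- Toy EUR/USD ticks with a volume column: made-up values, not real data
CREATE OR REPLACE TABLE ticks (
    symbol    VARCHAR,
    timestamp TIMESTAMP,
    bid       DOUBLE,
    ask       DOUBLE,
    volume    DOUBLE
);
INSERT INTO ticks VALUES
    ('EUR/USD', '2023-01-02 03:00:00', 1.0650, 1.0653, 1.5),
    ('EUR/USD', '2023-01-02 09:00:00', 1.0660, 1.0661, 2.0),
    ('EUR/USD', '2023-01-02 14:00:00', 1.0670, 1.0671, 3.0),
    ('EUR/USD', '2023-01-02 15:00:00', 1.0680, 1.0681, 1.0);

-- Assumed UTC hour ranges, simplified so the sessions do not overlap
CREATE OR REPLACE VIEW session_stats AS
SELECT
    CASE
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 0 AND 6 THEN 'Asia'
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 12 THEN 'London'
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 13 AND 21 THEN 'New York'
        ELSE 'Other'
    END AS session,
    COUNT(*)       AS n_ticks,
    SUM(volume)    AS total_volume,
    AVG(ask - bid) AS avg_spread
FROM ticks
GROUP BY session;

SELECT * FROM session_stats ORDER BY session;
```

Comparing `avg_spread` or a per-session `STDDEV` of price changes across these rows gives the session volatility comparison from the roadmap table.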
---
### **Study Plan**
1. **Week 1**: Master `SELECT`, `WHERE`, `GROUP BY`, `DATE_TRUNC`
→ *Goal: Generate hourly high/low/close for 1 pair*
2. **Week 2**: Learn `JOIN`, `AVG() OVER()`, `CORR()`
→ *Goal: Compare the correlation between 2 pairs last week vs. last month*
3. **Week 3**: Optimize with indexes + CTEs
→ *Goal: Make a 1M-row query run in <1 sec*
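The Week 1 goal (hourly high/low/close for one pair) fits in a single `GROUP BY`. Open and close need the first and last tick in each hour. DuckDB's `arg_min(arg, val)` and `arg_max(arg, val)` return `arg` from the row with the smallest or largest `val`. Below is a sketch that assumes DuckDB and builds bid-side bars. The rows and the view name `hourly_bars` are made up for illustration:

```sql
-- Toy GBP/USD ticks: made-up values, not real quotes
CREATE OR REPLACE TABLE ticks (
    symbol    VARCHAR,
    timestamp TIMESTAMP,
    bid       DOUBLE,
    ask       DOUBLE
);
INSERT INTO ticks VALUES
    ('GBP/USD', '2023-01-02 09:00:30', 1.2050, 1.2052),
    ('GBP/USD', '2023-01-02 09:20:00', 1.2070, 1.2072),
    ('GBP/USD', '2023-01-02 09:59:59', 1.2040, 1.2042),
    ('GBP/USD', '2023-01-02 10:00:01', 1.2045, 1.2047);

-- Bid-side hourly bars; arg_min/arg_max pick the bid of the first/last tick per hour
CREATE OR REPLACE VIEW hourly_bars AS
SELECT
    DATE_TRUNC('hour', timestamp) AS hour,
    arg_min(bid, timestamp) AS open,
    MAX(bid)                AS high,
    MIN(bid)                AS low,
    arg_max(bid, timestamp) AS close,
    COUNT(*)                AS n_ticks
FROM ticks
WHERE symbol = 'GBP/USD'
GROUP BY DATE_TRUNC('hour', timestamp)
ORDER BY hour;

SELECT * FROM hourly_bars;
```

PostgreSQL has no built-in `arg_min`/`arg_max`, so there the open and close usually come from a window function such as `FIRST_VALUE` instead.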
---
### **Tools to Use Now**
- **Data**: Free forex ticks from [Dukascopy](https://www.dukascopy.com/)
- **Database**: DuckDB (simple, fast, no setup)
- **Visualization**: Metabase (free) or Python's Matplotlib
---
### **Avoid Until Later**
- Stored procedures
- Advanced indexing strategies
- Machine learning in SQL
- Recursive queries
---
**Key Principle**: Focus on **time-based analysis** (the core of forex data) first. Everything else builds on this.
Want the absolute bare minimum? Learn these 4 things:
1. `WHERE` + `DATE_TRUNC` (filter and bucket time)
2. `GROUP BY` (summarize data)
3. `AVG() OVER()` (rolling calculations)
4. `CORR()` (pair relationships)
---
Here's a structured **Technical Guide & Roadmap for Forex Tick Data Analysis with SQL**, designed as a progressive learning path with clear milestones and reference examples:
---