
SQL for Forex Trading - Ultimate Cheat Sheet

1. Essential Command Structure

[ACTION + COLUMNS] [TARGET] [CONDITIONS] [MODIFIERS]

Example:

SELECT bid, ask           -- ACTION: what to get
FROM ticks                -- TARGET: where from
WHERE symbol = 'EUR/USD'  -- CONDITIONS: filters
ORDER BY timestamp DESC   -- MODIFIERS: sorting
LIMIT 100;                -- MODIFIERS: quantity

2. Core Commands

Table Operations

Command      | Example                                            | Purpose
CREATE TABLE | CREATE TABLE ticks(timestamp TIMESTAMP, bid FLOAT) | Create new table
ALTER TABLE  | ALTER TABLE ticks ADD COLUMN volume FLOAT          | Modify structure
DROP TABLE   | DROP TABLE ticks                                   | Delete table

Data Import/Export

-- Import CSV (the path is read on the database server; psql's \copy reads a client-side file)
COPY ticks FROM '/data/ticks.csv' (FORMAT CSV);

-- Export results
COPY (SELECT * FROM ticks) TO '/output.csv' (FORMAT CSV);

Basic Queries

-- All columns, limited rows
SELECT * FROM ticks LIMIT 10;

-- Specific columns
SELECT timestamp, bid, ask FROM ticks;

-- Filtered data
SELECT * FROM ticks WHERE bid > 1.1000 AND volume > 1000;
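When these queries run from a script or cron job, thresholds should be passed as bound parameters rather than pasted into the SQL string. A minimal sketch using Python's built-in sqlite3 module and an in-memory table. The columns mirror the examples above, and the rows are invented for illustration. PostgreSQL drivers such as psycopg use %s placeholders instead of ?.

```python
import sqlite3

# In-memory database with a small ticks table (illustrative data only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL, ask REAL, volume REAL)"
)
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?, ?)",
    [
        ("EUR/USD", "2024-01-01 09:00:00", 1.1005, 1.1007, 1500),
        ("EUR/USD", "2024-01-01 09:00:01", 1.0995, 1.0997, 2000),  # bid too low
        ("EUR/USD", "2024-01-01 09:00:02", 1.1010, 1.1012, 500),   # volume too low
    ],
)

def filtered_ticks(conn, min_bid, min_volume):
    """Return (timestamp, bid) for ticks above both thresholds.

    The ? placeholders are bound by sqlite3, so values are never spliced into SQL text.
    """
    sql = "SELECT timestamp, bid FROM ticks WHERE bid > ? AND volume > ? ORDER BY timestamp"
    return conn.execute(sql, (min_bid, min_volume)).fetchall()

rows = filtered_ticks(conn, 1.1000, 1000)
print(rows)  # [('2024-01-01 09:00:00', 1.1005)]
```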

3. Time-Series Patterns

Candlestick Generation

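-- 1-minute bid candles for one pair; first()/last() with a time argument are TimescaleDB functions
SELECT 
  DATE_TRUNC('minute', timestamp) AS minute,
  FIRST(bid, timestamp) AS open,
  MAX(bid) AS high,
  MIN(bid) AS low,
  LAST(bid, timestamp) AS close,
  SUM(volume) AS volume
FROM ticks
WHERE symbol = 'EUR/USD'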
GROUP BY minute;
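Engines without first()/last() aggregates can pick the opening and closing tick with ROW_NUMBER(). A sketch of that portable variant, run through Python's sqlite3 (window functions need SQLite 3.25 or newer). Here strftime stands in for DATE_TRUNC, and the four ticks are made up.

```python
import sqlite3

# Portable OHLC: ROW_NUMBER() marks the first and last tick in each minute,
# for engines (like SQLite) that have no first()/last() aggregates.
OHLC_SQL = """
WITH bucketed AS (
    SELECT symbol, timestamp, bid, volume,
           strftime('%Y-%m-%d %H:%M:00', timestamp) AS minute
    FROM ticks
),
numbered AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY symbol, minute ORDER BY timestamp)      AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY symbol, minute ORDER BY timestamp DESC) AS rn_last
    FROM bucketed
)
SELECT symbol, minute,
       MAX(CASE WHEN rn_first = 1 THEN bid END) AS open,
       MAX(bid)                                 AS high,
       MIN(bid)                                 AS low,
       MAX(CASE WHEN rn_last = 1 THEN bid END)  AS close,
       SUM(volume)                              AS volume
FROM numbered
GROUP BY symbol, minute
ORDER BY symbol, minute
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL, volume REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [  # illustrative ticks: three in 09:00, one in 09:01
        ("EUR/USD", "2024-01-01 09:00:05", 1.1000, 100),
        ("EUR/USD", "2024-01-01 09:00:30", 1.1004, 200),
        ("EUR/USD", "2024-01-01 09:00:55", 1.1002, 150),
        ("EUR/USD", "2024-01-01 09:01:10", 1.1003, 50),
    ],
)
candles = conn.execute(OHLC_SQL).fetchall()
for row in candles:
    print(row)
```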

Rolling Calculations

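-- 30-tick moving average (ROWS counts rows, so this spans 30 ticks, not 30 minutes)
SELECT 
  symbol,
  timestamp,
  AVG(bid) OVER (PARTITION BY symbol ORDER BY timestamp ROWS 29 PRECEDING) AS ma_30
FROM ticks;

-- Rolling 5-minute average spread (interval offsets in RANGE need PostgreSQL 11+)
SELECT 
  symbol,
  timestamp,
  AVG(ask - bid) OVER (
    PARTITION BY symbol
    ORDER BY timestamp
    RANGE BETWEEN INTERVAL '5 minutes' PRECEDING AND CURRENT ROW
  ) AS avg_spread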
FROM ticks;
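On irregular tick data, a ROWS frame (a fixed number of ticks) and a RANGE frame (a fixed span of time) can give quite different averages. A minimal pure-Python sketch of both frames for spot-checking query output, assuming sorted, unique timestamps in epoch seconds and invented spread values.

```python
from collections import deque

def rolling_time_mean(times, values, window):
    """Mean of values whose time lies in [t - window, t], for each t.

    Mirrors RANGE BETWEEN <window> PRECEDING AND CURRENT ROW,
    assuming times are sorted and unique (e.g. epoch seconds).
    """
    buf = deque()  # (time, value) pairs currently inside the window
    total = 0.0
    out = []
    for t, v in zip(times, values):
        buf.append((t, v))
        total += v
        # Evict ticks older than the window's lower bound
        while buf[0][0] < t - window:
            _, old = buf.popleft()
            total -= old
        out.append(total / len(buf))
    return out

def rolling_row_mean(values, n):
    """Mean of the current value and up to n-1 previous ones (ROWS n-1 PRECEDING)."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - n + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out

times = [0, 60, 120, 400]   # seconds; note the gap before the last tick
spreads = [2, 4, 6, 8]      # illustrative spreads in pips
print(rolling_time_mean(times, spreads, 300))  # [2.0, 3.0, 4.0, 7.0]
print(rolling_row_mean(spreads, 3))            # [2.0, 3.0, 4.0, 6.0]
```

Only the last value differs: after the gap, the 5-minute window holds two ticks, while the 3-row frame still reaches back across the gap.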

4. Key Metrics

Spread Analysis

-- Basic spread
SELECT timestamp, ask - bid AS spread FROM ticks;

-- Session averages
SELECT 
  EXTRACT(HOUR FROM timestamp) AS hour,
  AVG(ask - bid) AS avg_spread
FROM ticks
GROUP BY hour;
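Raw ask - bid values are not comparable across pairs. JPY-quoted pairs are priced to two decimals rather than four, so one pip is 0.01 instead of 0.0001. A small sketch that converts spreads to pips. The rule "JPY quote currency means 0.01" is the common convention and an assumption here, so check your data provider's pip sizes.

```python
def pip_size(symbol):
    """Conventional pip size: 0.01 for JPY-quoted pairs, else 0.0001 (assumed convention)."""
    quote = symbol.split("/")[1]
    return 0.01 if quote == "JPY" else 0.0001

def spread_in_pips(symbol, bid, ask):
    """Spread expressed in pips, rounded to 0.1 pip to hide floating-point noise."""
    return round((ask - bid) / pip_size(symbol), 1)

print(spread_in_pips("EUR/USD", 1.10000, 1.10012))  # 1.2
print(spread_in_pips("USD/JPY", 150.000, 150.015))  # 1.5
```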

Order Book Imbalance

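SELECT 
  timestamp,
  -- ::numeric avoids integer division; NULLIF avoids dividing by zero on an empty book
  (bid_size - ask_size)::numeric / NULLIF(bid_size + ask_size, 0) AS imbalance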
FROM ticks;
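The imbalance formula and the related microprice (the size-weighted mid used in the Week 5-6 roadmap) are easy to sanity-check outside the database. A minimal Python sketch: returning None plays the role of NULLIF(..., 0) in SQL, and the quotes are invented.

```python
def imbalance(bid_size, ask_size):
    """Top-of-book size imbalance in [-1, 1]; None when both sides are empty."""
    total = bid_size + ask_size
    if total == 0:
        return None  # same role as NULLIF(..., 0) in the SQL version
    return (bid_size - ask_size) / total

def microprice(bid, ask, bid_size, ask_size):
    """Mid weighted by the opposite side's size.

    A larger bid queue pulls the result toward the ask, and vice versa.
    """
    total = bid_size + ask_size
    if total == 0:
        return None
    return (bid * ask_size + ask * bid_size) / total

print(imbalance(3, 1))                                # 0.5
print(round(microprice(1.1000, 1.1002, 3, 1), 5))     # 1.10015 (mid is 1.1001)
```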

5. Advanced Patterns

Correlation Analysis

WITH hourly AS (
  SELECT 
    DATE_TRUNC('hour', timestamp) AS hour,
    AVG(CASE WHEN symbol='EUR/USD' THEN bid END) AS eurusd,
    AVG(CASE WHEN symbol='USD/JPY' THEN bid END) AS usdjpy
  FROM ticks
  GROUP BY hour
)
SELECT CORR(eurusd, usdjpy) FROM hourly;
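Correlating hourly price levels, as above, tends to report a relationship between any two series that trend in the same direction. A common refinement is to correlate hourly changes instead. A pure-Python sketch of Pearson correlation (the statistic CORR() computes) on two invented price series. Both drift upward, but they move in opposite directions hour to hour.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def changes(prices):
    """Hour-to-hour price changes (what LAG() would give in SQL)."""
    return [b - a for a, b in zip(prices, prices[1:])]

# Hypothetical hourly closes for two pairs
pair_a = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4]
pair_b = [1.0, 1.05, 1.25, 1.3, 1.5, 1.55]

print(round(pearson(pair_a, pair_b), 2))                    # 0.72 (levels)
print(round(pearson(changes(pair_a), changes(pair_b)), 2))  # -1.0 (changes)
```

In SQL, the same refinement means computing LAG-based changes per symbol inside the CTE before calling CORR().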

Event Detection

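-- Large price jumps (window functions can't appear in WHERE, so compute LAG in a subquery)
SELECT *
FROM (
  SELECT *, bid - LAG(bid) OVER (PARTITION BY symbol ORDER BY timestamp) AS jump
  FROM ticks
) t
WHERE ABS(jump) > 0.0010;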

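Window functions cannot be used in WHERE, so the jump has to be computed in a subquery first and filtered afterwards. A sketch that runs this form against a throwaway SQLite table from Python (LAG needs SQLite 3.25 or newer). The ticks are invented, and the 0.0010 threshold matches the example.

```python
import sqlite3

JUMP_SQL = """
SELECT symbol, timestamp, jump
FROM (
    SELECT symbol, timestamp,
           bid - LAG(bid) OVER (PARTITION BY symbol ORDER BY timestamp) AS jump
    FROM ticks
) AS t
WHERE ABS(jump) > ?   -- first tick per symbol has jump = NULL and is dropped
ORDER BY symbol, timestamp
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", [
    ("EUR/USD", "2024-01-01 09:00:00", 1.1000),
    ("EUR/USD", "2024-01-01 09:00:01", 1.1003),  # +3 pips: below threshold
    ("EUR/USD", "2024-01-01 09:00:02", 1.1018),  # +15 pips: flagged
    ("EUR/USD", "2024-01-01 09:00:03", 1.1016),  # -2 pips
])
jumps = conn.execute(JUMP_SQL, (0.0010,)).fetchall()
print(jumps)
```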
6. Optimization

Indexing

-- Basic index
CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp);

-- For time-series
SELECT create_hypertable('ticks', 'timestamp');  -- TimescaleDB

Query Performance

-- Explain plan
EXPLAIN ANALYZE SELECT * FROM ticks WHERE symbol = 'EUR/USD';

-- Common optimizations:
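-- 1. Filter rows in WHERE (before grouping) rather than HAVING when no aggregate is involved
-- 2. Select only the columns you need instead of SELECT *
-- 3. Prefer NOT EXISTS over NOT IN: NOT IN returns no rows if the subquery yields a NULL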

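SQLite's counterpart to EXPLAIN is EXPLAIN QUERY PLAN, which offers a quick way to confirm that a composite index like idx_symbol_time is picked up. A sketch with Python's sqlite3. Plan wording differs between SQLite versions, so the check only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL, ask REAL)")
conn.execute("CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp)")

def plan(sql, params=()):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

details = plan(
    "SELECT bid, ask FROM ticks WHERE symbol = ? AND timestamp >= ?",
    ("EUR/USD", "2024-01-01"),
)
print(details)
uses_index = any("idx_symbol_time" in d for d in details)
print(uses_index)  # True
```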
7. Quick Reference

Data Types

Type      | Example             | Used For
TIMESTAMP | 2024-01-01 12:00:00 | Time data
FLOAT     | 1.12345             | Prices/values
VARCHAR   | 'EUR/USD'           | Text/symbols
BOOLEAN   | TRUE                | Flags

Aggregate Functions

Function | Example     | Purpose
AVG()    | AVG(bid)    | Average
MAX()    | MAX(ask)    | Highest value
MIN()    | MIN(bid)    | Lowest value
SUM()    | SUM(volume) | Total volume
COUNT()  | COUNT(*)    | Row count

8. Common Errors & Fixes

Error                          | Solution
Missing comma in column list   | Check commas between columns in SELECT/CREATE
Column must appear in GROUP BY | Add every non-aggregated SELECT column to GROUP BY
NULL comparison issues         | Use IS NULL, not = NULL
Slow queries                   | Add indexes on filtered/sorted columns

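The NULL row is easiest to believe after running it. A short sketch with Python's sqlite3 (PostgreSQL behaves the same for these expressions), including the related NOT IN trap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# '= NULL' is never true: the comparison itself yields NULL (None in Python)
eq_row = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(eq_row)  # (None, 1)

# NOT IN against a list containing NULL matches nothing
not_in_null = conn.execute("SELECT 1 WHERE 2 NOT IN (1, NULL)").fetchall()
not_in_ok = conn.execute("SELECT 1 WHERE 2 NOT IN (1, 3)").fetchall()
print(not_in_null, not_in_ok)  # [] [(1,)]
```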
How to Use This Cheat Sheet

  1. Daily Practice: Pick 2-3 patterns to implement each day
  2. Wall Reference: Print and keep near your workstation
  3. Troubleshooting: Use the errors section when queries fail


Here's a streamlined 8-week roadmap focused purely on practical SQL skills for forex bid/ask analysis, structured for immediate application in cron jobs:

Week 1-2: Core Foundations for Tick Data

Goal: Process raw ticks into usable formats
Key Skills:

  1. Basic filtering

    -- Isolate specific currency pairs/time windows; the half-open range
    -- covers the whole day, including ticks after 23:59:00
    SELECT * FROM ticks 
    WHERE symbol = 'EUR/USD' 
      AND timestamp >= '2024-01-01' AND timestamp < '2024-01-02'
    
  2. Candlestick generation

    -- 1-minute OHLC candles (first()/last() with a time argument are TimescaleDB functions)
    SELECT 
      symbol,
      DATE_TRUNC('minute', timestamp) AS minute,
      FIRST(bid, timestamp) AS open,
      MAX(bid) AS high,
      MIN(bid) AS low,
      LAST(bid, timestamp) AS close
    FROM ticks
    GROUP BY symbol, minute
    
  3. Spread metrics

    -- Average spread by hour
    SELECT 
      symbol,
      EXTRACT(HOUR FROM timestamp) AS hour,
      AVG(ask - bid) AS avg_spread
    FROM ticks
    GROUP BY symbol, hour
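
Whether a time filter catches the last seconds of the day is easy to test on a scratch table. A sketch with Python's sqlite3 and three invented ticks, comparing BETWEEN ... '23:59' with a half-open range. SQLite compares these ISO-8601 strings lexicographically, which here orders them the same way PostgreSQL orders real timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL)")
conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", [
    ("EUR/USD", "2024-01-01 08:00:00", 1.1000),
    ("EUR/USD", "2024-01-01 23:59:30", 1.1010),  # last half-minute of the day
    ("EUR/USD", "2024-01-02 00:00:00", 1.1020),  # belongs to the next day
])

# Closed interval ending at 23:59 silently drops 23:59:30
between = conn.execute(
    "SELECT COUNT(*) FROM ticks WHERE timestamp BETWEEN ? AND ?",
    ("2024-01-01 00:00", "2024-01-01 23:59"),
).fetchone()[0]

# Half-open interval: start inclusive, next midnight exclusive
half_open = conn.execute(
    "SELECT COUNT(*) FROM ticks WHERE timestamp >= ? AND timestamp < ?",
    ("2024-01-01", "2024-01-02"),
).fetchone()[0]

print(between, half_open)  # 1 2
```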
    

Week 3-4: Session Analysis & Basic Signals

Goal: Identify trading opportunities
Key Skills:

  1. Session volatility

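    -- London vs. NY session comparison (assumes UTC timestamps). CASE stops at the first
    -- match, so the 13:00-15:59 overlap counts as London; volatility = stddev of mid changes
    WITH mids AS (
      SELECT 
        symbol,
        timestamp,
        (bid+ask)/2 - LAG((bid+ask)/2) OVER (PARTITION BY symbol ORDER BY timestamp) AS mid_change
      FROM ticks
    )
    SELECT 
      symbol,
      CASE 
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 15 THEN 'London'
        WHEN EXTRACT(HOUR FROM timestamp) BETWEEN 13 AND 21 THEN 'NY'
        ELSE 'Other' 
      END AS session,
      STDDEV(mid_change) AS volatility
    FROM mids
    GROUP BY symbol, session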
    
  2. Rolling spreads

    -- 30-tick moving spread (ROWS counts ticks, not minutes)
    SELECT 
      timestamp,
      AVG(ask - bid) OVER (
        ORDER BY timestamp 
        ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
      ) AS rolling_spread
    FROM ticks
    WHERE symbol = 'GBP/USD'
    
  3. Basic alerts

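    -- Spread widening alert: spread above 3x the same pair's 24-hour average
    -- (correlated by symbol, since spread levels differ widely between pairs)
    SELECT t.symbol, t.timestamp, (t.ask - t.bid) AS spread
    FROM ticks t
    WHERE (t.ask - t.bid) > 3 * (
      SELECT AVG(d.ask - d.bid) 
      FROM ticks d
      WHERE d.symbol = t.symbol
        AND d.timestamp > NOW() - INTERVAL '1 day'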
    )
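
JPY-quoted pairs are priced with two decimals instead of four, so their spreads are far larger in price units, and a single average taken across all symbols lets them set the alert threshold. A pure-Python sketch of a per-pair baseline versus a pooled one, with invented ticks.

```python
from collections import defaultdict

def spread_alerts(ticks, multiple=3.0):
    """Return (symbol, spread) for ticks wider than `multiple` x their own pair's average.

    `ticks` is a list of (symbol, bid, ask) tuples. The baseline is the average
    over the ticks passed in; the SQL version uses the last 24 hours instead.
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for symbol, bid, ask in ticks:
        totals[symbol] += ask - bid
        counts[symbol] += 1
    baseline = {s: totals[s] / counts[s] for s in totals}
    return [(s, ask - bid) for s, bid, ask in ticks if ask - bid > multiple * baseline[s]]

# Illustrative data: nine normal EUR/USD ticks, one 10-pip spike, five USD/JPY ticks
ticks = (
    [("EUR/USD", 1.1000, 1.1001)] * 9
    + [("EUR/USD", 1.1000, 1.1010)]
    + [("USD/JPY", 150.00, 150.01)] * 5
)

flagged = spread_alerts(ticks)
print([s for s, _ in flagged])  # ['EUR/USD']

# A single pooled average lets the JPY pair set the bar and hides the EUR/USD spike
pooled = sum(ask - bid for _, bid, ask in ticks) / len(ticks)
pooled_hits = [s for s, bid, ask in ticks if ask - bid > 3 * pooled]
print(pooled_hits)  # []
```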
    

Week 5-6: Advanced Pattern Detection

Goal: Build automated signal detectors
Key Skills:

  1. Microprice calculation

    -- Microprice: mid weighted by the opposite side's size (NULLIF guards an empty book)
    SELECT 
      timestamp,
      (bid*ask_size + ask*bid_size) / NULLIF(bid_size + ask_size, 0) AS microprice
    FROM ticks
    
  2. Order flow imbalance

    -- Top-of-book size imbalance in [-1, 1]; keep only strongly one-sided books
    SELECT *
    FROM (
      SELECT 
        timestamp,
        (bid_size - ask_size)::numeric / NULLIF(bid_size + ask_size, 0) AS imbalance
      FROM ticks
    ) t
    WHERE ABS(imbalance) > 0.7
    
  3. Consecutive moves

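    -- 5+ consecutive bid increases (gaps-and-islands: each non-up tick starts a new run_id)
    WITH changes AS (
      SELECT symbol, timestamp, bid,
        CASE WHEN bid > LAG(bid) OVER (PARTITION BY symbol ORDER BY timestamp) THEN 1 ELSE 0 END AS is_up
      FROM ticks
    ),
    runs AS (
      SELECT *,
        SUM(1 - is_up) OVER (PARTITION BY symbol ORDER BY timestamp ROWS UNBOUNDED PRECEDING) AS run_id
      FROM changes
    )
    SELECT symbol, MIN(timestamp) AS run_start, MAX(timestamp) AS run_end, COUNT(*) AS up_ticks
    FROM runs
    WHERE is_up = 1
    GROUP BY symbol, run_id
    HAVING COUNT(*) >= 5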
    
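Detecting a run of consecutive up-ticks takes more than filtering is_up = 1. Each non-up tick has to start a new run, and only then are the runs counted (the gaps-and-islands technique). A sketch of that query executed through Python's sqlite3 (window functions need SQLite 3.25 or newer) on ten invented EUR/USD bids.

```python
import sqlite3

RUNS_SQL = """
WITH changes AS (
    SELECT symbol, timestamp, bid,
           CASE WHEN bid > LAG(bid) OVER (PARTITION BY symbol ORDER BY timestamp)
                THEN 1 ELSE 0 END AS is_up
    FROM ticks
),
runs AS (
    SELECT *,
           -- running count of non-up ticks: constant within a streak of up-ticks
           SUM(1 - is_up) OVER (PARTITION BY symbol ORDER BY timestamp
                                ROWS UNBOUNDED PRECEDING) AS run_id
    FROM changes
)
SELECT symbol, MIN(timestamp) AS run_start, MAX(timestamp) AS run_end, COUNT(*) AS up_ticks
FROM runs
WHERE is_up = 1
GROUP BY symbol, run_id
HAVING COUNT(*) >= ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL)")
bids = [1.1000, 1.1001, 1.1002, 1.1001, 1.1002, 1.1003, 1.1004, 1.1005, 1.1006, 1.1005]
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?)",
    [("EUR/USD", f"2024-01-01 09:00:{i:02d}", b) for i, b in enumerate(bids)],
)
runs = conn.execute(RUNS_SQL, (5,)).fetchall()
print(runs)  # [('EUR/USD', '2024-01-01 09:00:04', '2024-01-01 09:00:08', 5)]
```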

Week 7-8: Optimization & Productionization

Goal: Make scripts robust and efficient
Key Skills:

  1. Indexing for time-series

    CREATE INDEX idx_symbol_time ON ticks(symbol, timestamp);
    
  2. CTEs for complex logic

    WITH 
    london_ticks AS (
      SELECT * FROM ticks 
      WHERE EXTRACT(HOUR FROM timestamp) BETWEEN 7 AND 15
    ),
    spreads AS (
      SELECT symbol, AVG(ask - bid) AS avg_spread
      FROM london_ticks
      GROUP BY symbol
    )
    SELECT * FROM spreads WHERE avg_spread > 0.0005;
    
  3. Partitioning large tables

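    -- Same columns as ticks; rows are routed to partitions by timestamp
    CREATE TABLE ticks_partitioned (LIKE ticks) PARTITION BY RANGE (timestamp);
    -- Each range needs its own partition (FROM is inclusive, TO is exclusive)
    CREATE TABLE ticks_2024_01 PARTITION OF ticks_partitioned
      FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');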
    
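Range partitions must exist before rows for their period arrive, so a monthly cron job often creates the next partition ahead of time. A sketch that only builds the PostgreSQL DDL string (execute it with your own driver). The <parent>_YYYY_MM naming is a hypothetical convention.

```python
from datetime import date

def next_month(d):
    """First day of the month after d."""
    year = d.year + 1 if d.month == 12 else d.year
    return date(year, d.month % 12 + 1, 1)

def monthly_partition_ddl(parent, month_start):
    """CREATE TABLE ... PARTITION OF statement covering one calendar month.

    Bounds are half-open, matching PostgreSQL range partitions
    (FROM inclusive, TO exclusive). The naming scheme is hypothetical.
    """
    start = month_start.replace(day=1)
    end = next_month(start)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )

print(monthly_partition_ddl("ticks_partitioned", date(2024, 12, 15)))
# CREATE TABLE IF NOT EXISTS ticks_partitioned_2024_12 PARTITION OF ticks_partitioned FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
```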

Daily Practice Structure

  1. Morning (5 min): Run basic monitoring query

    -- Current spread status
    SELECT symbol, AVG(ask - bid) AS spread 
    FROM ticks 
    WHERE timestamp > NOW() - INTERVAL '15 minutes'
    GROUP BY symbol;
    
  2. Evening (15 min): Build one new analysis query

    • Monday: Session comparisons
    • Tuesday: Rolling metrics
    • Wednesday: Alert conditions
    • Thursday: Optimization tweaks
    • Friday: Backtest old queries

Key Mindset Shifts

  1. Think in ticks, not hours: queries must stay fast on millisecond-resolution data, so filter on indexed symbol/time columns before aggregating
  2. Pre-compute everything: Generate candlesticks/aggregates in SQL, not Python
  3. Log everything: Every cron job should write results to a logging table
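
One way to follow the "log everything" rule is to wrap the morning spread query in a function that appends its results to a log table. A sketch using Python's sqlite3: job_log, its columns, and run_spread_monitor are hypothetical names, and the since argument replaces NOW() - INTERVAL '15 minutes' so the example is repeatable.

```python
import sqlite3
from datetime import datetime, timezone

def run_spread_monitor(conn, since, job_name="spread_monitor"):
    """Run the spread-status query and append one job_log row per symbol.

    job_log and its columns are hypothetical; `since` is an ISO timestamp
    string standing in for NOW() - INTERVAL '15 minutes'.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_log "
        "(job_name TEXT, run_at TEXT, symbol TEXT, avg_spread REAL)"
    )
    run_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    rows = conn.execute(
        "SELECT symbol, AVG(ask - bid) FROM ticks WHERE timestamp > ? "
        "GROUP BY symbol ORDER BY symbol",
        (since,),
    ).fetchall()
    with conn:  # commits the log rows together (rolls back on error)
        conn.executemany(
            "INSERT INTO job_log VALUES (?, ?, ?, ?)",
            [(job_name, run_at, symbol, spread) for symbol, spread in rows],
        )
    return len(rows)

# Illustrative usage against an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (symbol TEXT, timestamp TEXT, bid REAL, ask REAL)")
conn.executemany(
    "INSERT INTO ticks VALUES (?, ?, ?, ?)",
    [
        ("EUR/USD", "2024-01-01 09:00:00", 1.1000, 1.1002),
        ("USD/JPY", "2024-01-01 09:00:00", 150.00, 150.02),
        ("EUR/USD", "2024-01-01 08:00:00", 1.0990, 1.0999),  # too old, filtered out
    ],
)
n = run_spread_monitor(conn, since="2024-01-01 08:45:00")
logged = conn.execute("SELECT job_name, symbol FROM job_log ORDER BY symbol").fetchall()
print(n, logged)
```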
