Files
the_information_nexus/docs/financial_docs/Database_Schema.md

4.2 KiB

Objective

Create a unified database schema to store and analyze forex market data from Oanda, focusing on multiple currency pairs with the flexibility to support a wide range of analytical and machine learning workloads.

Schema Design

The schema is designed to store time-series data for various forex instruments, capturing price movements and trading volumes over time, along with allowing for the storage of additional, flexible data points.

Proposed Schema for SQLite3

CREATE TABLE forex_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER,
    additional_info TEXT
);

Adaptation for TimescaleDB (PostgreSQL)

CREATE TABLE forex_data (
    id SERIAL PRIMARY KEY,
    instrument VARCHAR(10) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    open NUMERIC NOT NULL,
    high NUMERIC NOT NULL,
    low NUMERIC NOT NULL,
    close NUMERIC NOT NULL,
    volume NUMERIC,
    additional_info JSONB,
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);

Key Components Explained

  • id: A unique identifier for each row. Simplifies data retrieval and management, especially for ML applications where each data point might need to be uniquely identified.

  • instrument: Specifies the forex pair (e.g., 'EUR_USD', 'GBP_JPY'), allowing data from multiple instruments to be stored in the same table.

  • timestamp: Records the datetime for each data point. It's crucial for time series analysis. TIMESTAMPTZ in TimescaleDB ensures time zone awareness.

  • open, high, low, close: Represent the opening, highest, lowest, and closing prices for the instrument within the specified time interval.

  • volume: Represents the trading volume. It's optional, recognizing that volume data might not always be available or relevant.

  • additional_info: A flexible JSONB (or TEXT in SQLite) column for storing any additional structured data related to the data point, such as bid/ask prices, computed indicators, or metadata.

  • unique_instrument_timestamp: Ensures data integrity by preventing duplicate entries for the same instrument and timestamp.

Transitioning from SQLite3 to TimescaleDB

This schema is designed with compatibility in mind. The transition from SQLite3 to TimescaleDB involves type adjustments and taking advantage of TimescaleDB's features for time-series data. Upon migration, you would:

  1. Convert data types where necessary (e.g., TEXT to VARCHAR, DATETIME to TIMESTAMPTZ, TEXT containing JSON to JSONB).
  2. Apply TimescaleDB's time-series optimizations, such as creating a hypertable for efficient data storage and querying.

Documentation and Usage Notes

  • Granularity: Decide on the granularity (e.g., tick, minute, hourly, daily) based on your analytical needs. This affects the timestamp and potentially the volume and price precision.
  • Time Zone Handling: Be mindful of time zones, especially if analyzing global markets. TIMESTAMPTZ in TimescaleDB helps manage time zone complexities.
  • Data Integrity: The unique constraint on instrument and timestamp prevents data duplication, ensuring the database's reliability for analysis.
  • Extensibility: The additional_info JSONB column allows for the addition of new data points without schema modifications, offering extensibility for future analysis needs.
  • Machine Learning and Analysis: This schema supports direct use with Python's data analysis libraries (e.g., Pandas for data manipulation, Scikit-learn for ML modeling) by facilitating the extraction of features directly from stored data.

Conclusion

This guide provides a blueprint for a database schema capable of supporting comprehensive forex data analysis and machine learning workloads, from initial development with SQLite3 to a scalable, production-ready setup with TimescaleDB. By focusing on flexibility, scalability, and compatibility, this schema ensures that your database can grow and evolve alongside your analytical capabilities, providing a solid foundation for extracting insights from forex market data.