Files
the_information_nexus/docs/tech_docs/Database_Schema.md
2024-03-06 12:54:05 -07:00

7.4 KiB

Objective

Create a unified database schema to store and analyze forex market data from Oanda, focusing on multiple currency pairs with the flexibility to support a wide range of analytical and machine learning workloads.

Schema Design

The schema is designed to store time-series data for various forex instruments, capturing price movements and trading volumes over time, along with allowing for the storage of additional, flexible data points.

Proposed Schema for SQLite3

CREATE TABLE forex_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER,
    additional_info TEXT
);

Adaptation for TimescaleDB (PostgreSQL)

CREATE TABLE forex_data (
    id SERIAL PRIMARY KEY,
    instrument VARCHAR(10) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    open NUMERIC NOT NULL,
    high NUMERIC NOT NULL,
    low NUMERIC NOT NULL,
    close NUMERIC NOT NULL,
    volume NUMERIC,
    additional_info JSONB,
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);

Key Components Explained

  • id: A unique identifier for each row. Simplifies data retrieval and management, especially for ML applications where each data point might need to be uniquely identified.

  • instrument: Specifies the forex pair (e.g., 'EUR_USD', 'GBP_JPY'), allowing data from multiple instruments to be stored in the same table.

  • timestamp: Records the datetime for each data point. It's crucial for time series analysis. TIMESTAMPTZ in TimescaleDB ensures time zone awareness.

  • open, high, low, close: Represent the opening, highest, lowest, and closing prices for the instrument within the specified time interval.

  • volume: Represents the trading volume. It's optional, recognizing that volume data might not always be available or relevant.

  • additional_info: A flexible JSONB (or TEXT in SQLite) column for storing any additional structured data related to the data point, such as bid/ask prices, computed indicators, or metadata.

  • unique_instrument_timestamp: Ensures data integrity by preventing duplicate entries for the same instrument and timestamp.

Transitioning from SQLite3 to TimescaleDB

This schema is designed with compatibility in mind. The transition from SQLite3 to TimescaleDB involves type adjustments and taking advantage of TimescaleDB's features for time-series data. Upon migration, you would:

  1. Convert data types where necessary (e.g., TEXT to VARCHAR, DATETIME to TIMESTAMPTZ, TEXT containing JSON to JSONB).
  2. Apply TimescaleDB's time-series optimizations, such as creating a hypertable for efficient data storage and querying.

Documentation and Usage Notes

  • Granularity: Decide on the granularity (e.g., tick, minute, hourly, daily) based on your analytical needs. This affects the timestamp and potentially the volume and price precision.
  • Time Zone Handling: Be mindful of time zones, especially if analyzing global markets. TIMESTAMPTZ in TimescaleDB helps manage time zone complexities.
  • Data Integrity: The unique constraint on instrument and timestamp prevents data duplication, ensuring the database's reliability for analysis.
  • Extensibility: The additional_info JSONB column allows for the addition of new data points without schema modifications, offering extensibility for future analysis needs.
  • Machine Learning and Analysis: This schema supports direct use with Python's data analysis libraries (e.g., Pandas for data manipulation, Scikit-learn for ML modeling) by facilitating the extraction of features directly from stored data.

Conclusion

This guide provides a blueprint for a database schema capable of supporting comprehensive forex data analysis and machine learning workloads, from initial development with SQLite3 to a scalable, production-ready setup with TimescaleDB. By focusing on flexibility, scalability, and compatibility, this schema ensures that your database can grow and evolve alongside your analytical capabilities, providing a solid foundation for extracting insights from forex market data.


Setting up databases on Linux and macOS involves using the command line interface (CLI) for both SQLite3 and PostgreSQL. Here's a direct guide to get you started with creating, attaching to, verifying, and exiting databases in both environments.

SQLite3 Setup

SQLite3 is often pre-installed on macOS and Linux. If it's not, you can install it via the package manager.

Installation (if needed)

  • macOS: Use Homebrew to install SQLite3.
    brew install sqlite
    
  • Linux (Debian-based systems):
    sudo apt-get update
    sudo apt-get install sqlite3
    

Basic Commands

  • Create or Open Database:

    sqlite3 forex_data.db
    

    This command creates the forex_data.db file if it doesn't exist or opens it if it does.

  • Attach to Another Database (if you're already in an SQLite session and want to work with another database simultaneously):

    ATTACH DATABASE 'path/to/other_database.db' AS other_db;
    
  • Verification:

    • To verify the tables in your database:
      .tables
      
    • To check the schema of a specific table:
      .schema forex_data
      
  • Exit:

    .quit
    

PostgreSQL (Includes TimescaleDB) Setup

PostgreSQL needs to be installed, and TimescaleDB is an extension that you add to PostgreSQL. TimescaleDB harnesses the power of PostgreSQL for time-series data.

Installation

  • PostgreSQL:

    • macOS: Using Homebrew:
      brew install postgresql
      
    • Linux (Debian-based systems):
      sudo apt-get update
      sudo apt-get install postgresql postgresql-contrib
      
  • TimescaleDB: After installing PostgreSQL, install TimescaleDB. Check TimescaleDB's documentation for the most current instructions, as the installation process may vary depending on your PostgreSQL version.

Basic Commands

  • Start PostgreSQL Service:

    • macOS:
      brew services start postgresql
      
    • Linux:
      sudo service postgresql start
      
  • Create Database:

    createdb forex_data
    
  • Connect to Database:

    psql forex_data
    

    This command connects you to the forex_data database using the psql command-line interface.

  • Attach to Another Database: In PostgreSQL, you connect to databases one at a time. To switch databases:

    \c other_database_name
    
  • Verification:

    • List all tables:
      \dt
      
    • Check the schema of a specific table:
      \d+ forex_data
      
  • Exit:

    \q
    

TimescaleDB Setup

After installing TimescaleDB, you can create a hypertable from your existing table to leverage TimescaleDB's features:

SELECT create_hypertable('forex_data', 'timestamp');

Run this command in the psql interface after connecting to your database.

This guide provides a streamlined path to setting up SQLite3 and PostgreSQL (with TimescaleDB) databases on Linux and macOS, along with basic commands for database management and schema verification. These steps will help you create a robust environment for forex data analysis and development.