From f9fca73d9d4e1403cc76c72c07e3263c460a5f85 Mon Sep 17 00:00:00 2001
From: medusa
Date: Tue, 20 Feb 2024 20:41:45 +0000
Subject: [PATCH] Add docs/financial_docs/Database_Schema.md

---
 docs/financial_docs/Database_Schema.md | 75 ++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 docs/financial_docs/Database_Schema.md

diff --git a/docs/financial_docs/Database_Schema.md b/docs/financial_docs/Database_Schema.md
new file mode 100644
index 0000000..b732e27
--- /dev/null
+++ b/docs/financial_docs/Database_Schema.md
@@ -0,0 +1,75 @@
+### Objective
+
+Create a unified database schema to store and analyze forex market data from Oanda, covering multiple currency pairs and flexible enough to support a wide range of analytical and machine learning workloads.
+
+### Schema Design
+
+The schema stores time-series data for various forex instruments, capturing price movements and trading volumes over time, and leaves room for additional, flexible data points.
+
+#### Proposed Schema for SQLite3
+
+```sql
+CREATE TABLE forex_data (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    instrument TEXT NOT NULL,
+    timestamp DATETIME NOT NULL,
+    open REAL NOT NULL,
+    high REAL NOT NULL,
+    low REAL NOT NULL,
+    close REAL NOT NULL,
+    volume INTEGER,
+    additional_info TEXT
+);
+```
+
+#### Adaptation for TimescaleDB (PostgreSQL)
+
+```sql
+CREATE TABLE forex_data (
+    id SERIAL, -- not a standalone PRIMARY KEY: unique indexes on a hypertable must include "timestamp"
+    instrument VARCHAR(10) NOT NULL,
+    timestamp TIMESTAMPTZ NOT NULL,
+    open NUMERIC NOT NULL,
+    high NUMERIC NOT NULL,
+    low NUMERIC NOT NULL,
+    close NUMERIC NOT NULL,
+    volume NUMERIC,
+    additional_info JSONB,
+    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
+);
+```
+
+### Key Components Explained
+
+- **id**: A surrogate identifier for each row. It simplifies data retrieval and management, especially for ML applications where each data point might need to be referenced individually. In the TimescaleDB variant it is not declared as a primary key on its own, because every unique index on a hypertable must include the partitioning column (`timestamp`).
+
+- **instrument**: Specifies the forex pair (e.g., 'EUR_USD', 'GBP_JPY'), allowing data from multiple instruments to be stored in the same table.
+
+- **timestamp**: Records the datetime for each data point and is the key column for time-series analysis. `TIMESTAMPTZ` in TimescaleDB stores a time-zone-aware instant. SQLite has no dedicated datetime storage class, so store values in one consistent format, such as ISO 8601 strings in UTC.
+
+- **open, high, low, close**: The opening, highest, lowest, and closing prices for the instrument within the time interval that the row covers.
+
+- **volume**: The trading volume for the interval. It is nullable because volume data might not always be available or relevant.
+
+- **additional_info**: A flexible JSONB column (TEXT holding serialized JSON in SQLite) for any additional structured data related to the data point, such as bid/ask prices, computed indicators, or metadata.
+
+- **unique_instrument_timestamp**: Ensures data integrity by preventing duplicate entries for the same instrument and timestamp. Only the TimescaleDB variant declares it; the SQLite3 schema needs an equivalent `UNIQUE (instrument, timestamp)` constraint to get the same guarantee.
+
+### Transitioning from SQLite3 to TimescaleDB
+
+This schema is designed with compatibility in mind. The transition from SQLite3 to TimescaleDB involves type adjustments and taking advantage of TimescaleDB's features for time-series data. Upon migration, you would:
+
+1. Convert data types where necessary (e.g., `TEXT` to `VARCHAR`, `DATETIME` to `TIMESTAMPTZ`, `REAL` to `NUMERIC`, `TEXT` containing JSON to `JSONB`).
+2. Apply TimescaleDB's time-series optimizations, such as converting the table into a hypertable partitioned on `timestamp` with `create_hypertable`. TimescaleDB requires every unique index or primary key on a hypertable to include the partitioning column, so a primary key on `id` alone cannot be kept.
+
+### Documentation and Usage Notes
+
+- **Granularity**: Decide on the granularity (e.g., tick, minute, hourly, daily) based on your analytical needs. It determines what interval each `timestamp` and `volume` value covers, and it may affect the price precision you need.
+- **Time Zone Handling**: Be mindful of time zones, especially if analyzing global markets. `TIMESTAMPTZ` stores each value as an absolute instant (normalized to UTC) and converts it to the session time zone on output.
+- **Data Integrity**: The unique constraint on `instrument` and `timestamp` prevents duplicate rows, so repeated imports of the same data cannot skew the analysis.
+- **Extensibility**: The `additional_info` JSONB column allows for the addition of new data points without schema modifications, offering extensibility for future analysis needs.
+- **Machine Learning and Analysis**: This schema supports direct use with Python's data analysis libraries (e.g., Pandas for data manipulation, Scikit-learn for ML modeling) by facilitating the extraction of features directly from stored data.
+
+### Conclusion
+
+This guide provides a blueprint for a database schema capable of supporting comprehensive forex data analysis and machine learning workloads, from initial development with SQLite3 to a scalable, production-ready setup with TimescaleDB. By focusing on flexibility, scalability, and compatibility, this schema ensures that your database can grow and evolve alongside your analytical capabilities, providing a solid foundation for extracting insights from forex market data.
\ No newline at end of file
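
Note on the patch: the SQLite3 variant and the uniqueness rule described in the document can be tried out with Python's built-in `sqlite3` module. The following is only a minimal sketch under stated assumptions, not part of the patch. It assumes the SQLite3 table is given the same `unique_instrument_timestamp` constraint as the TimescaleDB variant (the patch does not declare it for SQLite3), that timestamps are stored as ISO 8601 UTC strings, and that `additional_info` holds serialized JSON. The helper name `insert_candle` and all sample values are hypothetical.

```python
import json
import sqlite3

# SQLite3 schema from the patch, plus one assumed addition (the UNIQUE constraint).
SCHEMA = """
CREATE TABLE forex_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER,
    additional_info TEXT,
    -- Assumption: mirrors the TimescaleDB constraint; absent from the patch's SQLite3 schema.
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);
"""


def insert_candle(conn, candle):
    """Insert one row; return False if (instrument, timestamp) already exists."""
    extra = candle.get("additional_info")
    params = (
        candle["instrument"],
        candle["timestamp"],
        candle["open"],
        candle["high"],
        candle["low"],
        candle["close"],
        candle.get("volume"),
        json.dumps(extra) if extra is not None else None,
    )
    try:
        # "with conn" commits on success and rolls back if the insert fails.
        with conn:
            conn.execute(
                "INSERT INTO forex_data "
                "(instrument, timestamp, open, high, low, close, volume, additional_info) "
                "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                params,
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the unique constraint rejects a duplicate row.
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample values, for illustration only.
candle = {
    "instrument": "EUR_USD",
    "timestamp": "2024-02-20T20:00:00Z",
    "open": 1.0840,
    "high": 1.0850,
    "low": 1.0835,
    "close": 1.0842,
    "volume": 1234,
    "additional_info": {"bid": 1.0841, "ask": 1.0843},
}

first = insert_candle(conn, candle)   # new row
second = insert_candle(conn, candle)  # same instrument and timestamp again

rows = conn.execute(
    "SELECT instrument, close, additional_info FROM forex_data"
).fetchall()
print(first, second, len(rows))
```

Returning a boolean instead of raising keeps repeated imports of overlapping data idempotent. On PostgreSQL the equivalent would typically be an `INSERT ... ON CONFLICT DO NOTHING` against the same constraint.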