Add docs/financial_docs/Database_Schema.md
### Objective

Create a unified database schema for storing and analyzing forex market data from Oanda. The schema covers multiple currency pairs and is flexible enough to support a wide range of analytical and machine learning workloads.

### Schema Design

The schema stores time-series data for forex instruments. It captures price movements and trading volumes over time and leaves room for additional, loosely structured data points.

#### Proposed Schema for SQLite3

```sql
CREATE TABLE forex_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER,
    additional_info TEXT,
    -- Mirrors the TimescaleDB version: one row per instrument and timestamp.
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);
```
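As a minimal sketch of how the SQLite3 table might be used from Python's standard `sqlite3` module, the snippet below creates the table in an in-memory database and inserts one candle twice. The `SCHEMA` string carries a `UNIQUE (instrument, timestamp)` constraint, as in the TimescaleDB version. The helper name `insert_candle` and the sample values are hypothetical, chosen only for illustration.

```python
import sqlite3

# Table definition with a UNIQUE constraint on (instrument, timestamp),
# matching the constraint used in the TimescaleDB version.
SCHEMA = """
CREATE TABLE IF NOT EXISTS forex_data (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    instrument TEXT NOT NULL,
    timestamp DATETIME NOT NULL,
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER,
    additional_info TEXT,
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);
"""

INSERT = """
INSERT INTO forex_data
    (instrument, timestamp, open, high, low, close, volume, additional_info)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
"""


def insert_candle(conn, candle):
    """Insert one candle; return False if that instrument/timestamp exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT, candle)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample values, for illustration only.
candle = ("EUR_USD", "2024-01-02T00:00:00+00:00",
          1.1040, 1.1052, 1.1031, 1.1045, 1520, None)
first = insert_candle(conn, candle)
second = insert_candle(conn, candle)  # same instrument and timestamp
row_count = conn.execute("SELECT COUNT(*) FROM forex_data").fetchone()[0]
print(first, second, row_count)
```

Catching `sqlite3.IntegrityError` lets a loader re-run over overlapping date ranges without creating duplicate rows.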

#### Adaptation for TimescaleDB (PostgreSQL)

```sql
CREATE TABLE forex_data (
    -- Not declared PRIMARY KEY on its own: TimescaleDB requires every unique
    -- index on a hypertable to include the partitioning column (timestamp).
    id SERIAL,
    instrument VARCHAR(10) NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    open NUMERIC NOT NULL,
    high NUMERIC NOT NULL,
    low NUMERIC NOT NULL,
    close NUMERIC NOT NULL,
    volume NUMERIC,
    additional_info JSONB,
    CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);
```

### Key Components Explained

- **id**: A unique identifier for each row. It simplifies data retrieval and management, especially in ML applications where each data point may need to be referenced individually.

- **instrument**: The forex pair (e.g., 'EUR_USD', 'GBP_JPY'), which lets data from multiple instruments share one table.

- **timestamp**: The datetime of each data point, which is essential for time-series analysis. `TIMESTAMPTZ` in TimescaleDB stores an absolute point in time, normalized to UTC. SQLite has no dedicated datetime storage class, so store values in one consistent format, such as ISO 8601 strings in UTC.

- **open, high, low, close**: The opening, highest, lowest, and closing prices for the instrument within the time interval.

- **volume**: The trading volume for the interval. The column is nullable because volume data may be unavailable or irrelevant.

- **additional_info**: A flexible JSONB column (TEXT in SQLite) for any additional structured data related to the data point, such as bid/ask prices, computed indicators, or metadata.

- **unique_instrument_timestamp**: A unique constraint that protects data integrity by preventing duplicate rows for the same instrument and timestamp.
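
To illustrate the `additional_info` column, here is a small sketch that stores a JSON document in the SQLite `TEXT` column and decodes it again in Python. The table is reduced to the columns the example touches, and the key names (`bid_close`, `ask_close`, `source`) are hypothetical; the schema does not prescribe any particular keys.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table for illustration: only the columns this example touches.
conn.execute(
    "CREATE TABLE forex_data ("
    " instrument TEXT NOT NULL,"
    " timestamp DATETIME NOT NULL,"
    " close REAL NOT NULL,"
    " additional_info TEXT)"
)

# Hypothetical extra fields with made-up values.
extra = {"bid_close": 1.1044, "ask_close": 1.1046, "source": "oanda"}
conn.execute(
    "INSERT INTO forex_data VALUES (?, ?, ?, ?)",
    ("EUR_USD", "2024-01-02T00:00:00+00:00", 1.1045, json.dumps(extra)),
)

# Read the TEXT value back and decode it into a dict.
(text,) = conn.execute("SELECT additional_info FROM forex_data").fetchone()
decoded = json.loads(text)
spread = round(decoded["ask_close"] - decoded["bid_close"], 5)
print(decoded["source"], spread)
```

Decoding in Python keeps the example independent of SQLite's optional JSON functions. In PostgreSQL, the same document stored as `JSONB` can be queried inside the database.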

### Transitioning from SQLite3 to TimescaleDB

The two schemas share column names and meanings, so moving from SQLite3 to TimescaleDB is mostly a matter of adjusting types and enabling TimescaleDB's time-series features. During migration, you would:

1. Convert data types where necessary (e.g., `TEXT` to `VARCHAR`, `DATETIME` to `TIMESTAMPTZ`, `TEXT` containing JSON to `JSONB`).
2. Apply TimescaleDB's time-series optimizations, such as converting the table to a hypertable (with `create_hypertable`) for efficient data storage and querying.
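
The type conversions in step 1 can be sketched in plain Python. The function below is a hypothetical helper, not part of any library. It assumes the SQLite timestamps were stored as ISO 8601 text and that values without an offset are UTC. The hypertable statement for step 2 is included only as a string, and nothing here connects to a PostgreSQL server.

```python
import json
from datetime import datetime, timezone


def to_postgres_row(sqlite_row):
    """Convert one SQLite row tuple into values suited to the TimescaleDB table.

    Assumptions of this sketch: timestamps are ISO 8601 text, and naive
    values (no UTC offset) are interpreted as UTC.
    """
    instrument, ts_text, open_, high, low, close, volume, info_text = sqlite_row
    ts = datetime.fromisoformat(ts_text)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    # How the dict is passed as JSONB depends on the database driver in use.
    info = json.loads(info_text) if info_text else None
    return (instrument, ts.astimezone(timezone.utc),
            open_, high, low, close, volume, info)


# TimescaleDB's create_hypertable converts the table into a hypertable.
# It requires every unique index on the table to include the timestamp column.
CREATE_HYPERTABLE_SQL = "SELECT create_hypertable('forex_data', 'timestamp');"

# Made-up sample rows: one naive timestamp, one with an explicit offset.
naive = ("EUR_USD", "2024-01-02 09:00:00",
         1.1040, 1.1052, 1.1031, 1.1045, 1520, '{"source": "oanda"}')
offset = ("EUR_USD", "2024-01-02T10:00:00+01:00",
          1.1040, 1.1052, 1.1031, 1.1045, None, None)

converted_naive = to_postgres_row(naive)
converted_offset = to_postgres_row(offset)
print(converted_naive[1].isoformat(), converted_naive[7])
print(converted_offset[1].isoformat(), converted_offset[7])
```

Normalizing everything to UTC before loading avoids depending on the PostgreSQL session time zone when the values are written into the `TIMESTAMPTZ` column.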

### Documentation and Usage Notes

- **Granularity**: Choose the granularity (e.g., tick, minute, hourly, daily) based on your analytical needs. It determines the spacing of `timestamp` values and can affect how `volume` is aggregated and how much price precision you need.
- **Time Zone Handling**: Be mindful of time zones, especially when analyzing global markets. `TIMESTAMPTZ` in TimescaleDB stores values normalized to UTC, which removes ambiguity between sessions and data sources.
- **Data Integrity**: The unique constraint on `instrument` and `timestamp` prevents duplicate rows, so repeated imports do not distort the analysis.
- **Extensibility**: The `additional_info` JSONB column (TEXT in SQLite) accepts new data points without schema changes, which leaves room for future analysis needs.
- **Machine Learning and Analysis**: Each row is a flat OHLCV record, so query results load directly into Python's data analysis libraries (e.g., Pandas for data manipulation, Scikit-learn for ML modeling), and features can be derived straight from the stored data.
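
As one hedged illustration of feature extraction, the sketch below reads closing prices for a single instrument from SQLite and computes simple period-over-period returns using only the standard library. The function name `close_returns`, the reduced table, and the prices are assumptions made for the example. Ordering by `timestamp` relies on all values sharing one ISO 8601 format and offset. With Pandas installed, the same query could instead be loaded with `pandas.read_sql_query`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced table for illustration: only the columns this example touches.
conn.execute(
    "CREATE TABLE forex_data ("
    " instrument TEXT NOT NULL,"
    " timestamp DATETIME NOT NULL,"
    " close REAL NOT NULL,"
    " UNIQUE (instrument, timestamp))"
)

# Made-up closing prices, deliberately inserted out of order.
rows = [
    ("EUR_USD", "2024-01-03T00:00:00+00:00", 1.21),
    ("EUR_USD", "2024-01-02T00:00:00+00:00", 1.10),
    ("EUR_USD", "2024-01-04T00:00:00+00:00", 1.21),
    ("GBP_JPY", "2024-01-02T00:00:00+00:00", 180.0),
]
conn.executemany("INSERT INTO forex_data VALUES (?, ?, ?)", rows)


def close_returns(conn, instrument):
    """Return simple returns of `close` for one instrument, oldest first."""
    closes = [
        close
        for (close,) in conn.execute(
            "SELECT close FROM forex_data"
            " WHERE instrument = ? ORDER BY timestamp",
            (instrument,),
        )
    ]
    # Pair each close with its successor: (current - previous) / previous.
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


returns = close_returns(conn, "EUR_USD")
print(returns)
```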

### Conclusion

This guide provides a blueprint for a database schema that supports forex data analysis and machine learning workloads, from initial development with SQLite3 to a scalable, production-ready setup with TimescaleDB. Because the two versions share the same columns and differ only in types and TimescaleDB-specific features, the database can grow alongside your analytical needs without a redesign.