7.4 KiB
Objective
Create a unified database schema to store and analyze forex market data from Oanda, focusing on multiple currency pairs with the flexibility to support a wide range of analytical and machine learning workloads.
Schema Design
The schema is designed to store time-series data for various forex instruments, capturing price movements and trading volumes over time, along with allowing for the storage of additional, flexible data points.
Proposed Schema for SQLite3
CREATE TABLE forex_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
instrument TEXT NOT NULL,
timestamp DATETIME NOT NULL,
open REAL NOT NULL,
high REAL NOT NULL,
low REAL NOT NULL,
close REAL NOT NULL,
volume INTEGER,
additional_info TEXT
);
Adaptation for TimescaleDB (PostgreSQL)
CREATE TABLE forex_data (
id SERIAL PRIMARY KEY,
instrument VARCHAR(10) NOT NULL,
timestamp TIMESTAMPTZ NOT NULL,
open NUMERIC NOT NULL,
high NUMERIC NOT NULL,
low NUMERIC NOT NULL,
close NUMERIC NOT NULL,
volume NUMERIC,
additional_info JSONB,
CONSTRAINT unique_instrument_timestamp UNIQUE (instrument, timestamp)
);
Key Components Explained
-
id: A unique identifier for each row. Simplifies data retrieval and management, especially for ML applications where each data point might need to be uniquely identified.
-
instrument: Specifies the forex pair (e.g., 'EUR_USD', 'GBP_JPY'), allowing data from multiple instruments to be stored in the same table.
-
timestamp: Records the datetime for each data point. It's crucial for time series analysis.
TIMESTAMPTZin TimescaleDB ensures time zone awareness. -
open, high, low, close: Represent the opening, highest, lowest, and closing prices for the instrument within the specified time interval.
-
volume: Represents the trading volume. It's optional, recognizing that volume data might not always be available or relevant.
-
additional_info: A flexible JSONB (or TEXT in SQLite) column for storing any additional structured data related to the data point, such as bid/ask prices, computed indicators, or metadata.
-
unique_instrument_timestamp: Ensures data integrity by preventing duplicate entries for the same instrument and timestamp.
Transitioning from SQLite3 to TimescaleDB
This schema is designed with compatibility in mind. The transition from SQLite3 to TimescaleDB involves type adjustments and taking advantage of TimescaleDB's features for time-series data. Upon migration, you would:
- Convert data types where necessary (e.g.,
TEXTtoVARCHAR,DATETIMEtoTIMESTAMPTZ,TEXTcontaining JSON toJSONB). - Apply TimescaleDB's time-series optimizations, such as creating a hypertable for efficient data storage and querying.
Documentation and Usage Notes
- Granularity: Decide on the granularity (e.g., tick, minute, hourly, daily) based on your analytical needs. This affects the
timestampand potentially thevolumeand price precision. - Time Zone Handling: Be mindful of time zones, especially if analyzing global markets.
TIMESTAMPTZin TimescaleDB helps manage time zone complexities. - Data Integrity: The unique constraint on
instrumentandtimestampprevents data duplication, ensuring the database's reliability for analysis. - Extensibility: The
additional_infoJSONB column allows for the addition of new data points without schema modifications, offering extensibility for future analysis needs. - Machine Learning and Analysis: This schema supports direct use with Python's data analysis libraries (e.g., Pandas for data manipulation, Scikit-learn for ML modeling) by facilitating the extraction of features directly from stored data.
Conclusion
This guide provides a blueprint for a database schema capable of supporting comprehensive forex data analysis and machine learning workloads, from initial development with SQLite3 to a scalable, production-ready setup with TimescaleDB. By focusing on flexibility, scalability, and compatibility, this schema ensures that your database can grow and evolve alongside your analytical capabilities, providing a solid foundation for extracting insights from forex market data.
Setting up databases on Linux and macOS involves using the command line interface (CLI) for both SQLite3 and PostgreSQL. Here's a direct guide to get you started with creating, attaching to, verifying, and exiting databases in both environments.
SQLite3 Setup
SQLite3 is often pre-installed on macOS and Linux. If it's not, you can install it via the package manager.
Installation (if needed)
- macOS: Use Homebrew to install SQLite3.
brew install sqlite - Linux (Debian-based systems):
sudo apt-get update sudo apt-get install sqlite3
Basic Commands
-
Create or Open Database:
sqlite3 forex_data.dbThis command creates the
forex_data.dbfile if it doesn't exist or opens it if it does. -
Attach to Another Database (if you're already in an SQLite session and want to work with another database simultaneously):
ATTACH DATABASE 'path/to/other_database.db' AS other_db; -
Verification:
- To verify the tables in your database:
.tables - To check the schema of a specific table:
.schema forex_data
- To verify the tables in your database:
-
Exit:
.quit
PostgreSQL (Includes TimescaleDB) Setup
PostgreSQL needs to be installed, and TimescaleDB is an extension that you add to PostgreSQL. TimescaleDB harnesses the power of PostgreSQL for time-series data.
Installation
-
PostgreSQL:
- macOS: Using Homebrew:
brew install postgresql - Linux (Debian-based systems):
sudo apt-get update sudo apt-get install postgresql postgresql-contrib
- macOS: Using Homebrew:
-
TimescaleDB: After installing PostgreSQL, install TimescaleDB. Check TimescaleDB's documentation for the most current instructions, as the installation process may vary depending on your PostgreSQL version.
Basic Commands
-
Start PostgreSQL Service:
- macOS:
brew services start postgresql - Linux:
sudo service postgresql start
- macOS:
-
Create Database:
createdb forex_data -
Connect to Database:
psql forex_dataThis command connects you to the
forex_datadatabase using thepsqlcommand-line interface. -
Attach to Another Database: In PostgreSQL, you connect to databases one at a time. To switch databases:
\c other_database_name -
Verification:
- List all tables:
\dt - Check the schema of a specific table:
\d+ forex_data
- List all tables:
-
Exit:
\q
TimescaleDB Setup
After installing TimescaleDB, you can create a hypertable from your existing table to leverage TimescaleDB's features:
SELECT create_hypertable('forex_data', 'timestamp');
Run this command in the psql interface after connecting to your database.
This guide provides a streamlined path to setting up SQLite3 and PostgreSQL (with TimescaleDB) databases on Linux and macOS, along with basic commands for database management and schema verification. These steps will help you create a robust environment for forex data analysis and development.