updates

2024-06-01 17:57:05 -06:00
parent d8aecfe61b
commit d8f0148c70
1 changed files with 127 additions and 0 deletions
--- a/financial_docs/ml_trading.md
+++ b/financial_docs/ml_trading.md
@@ -0,0 +1,127 @@
+### Technical Guide for Forex Time Series Analysis Using AI/ML Models
+
+#### Objective
+This guide provides a comprehensive overview of the methodologies and machine learning models used in analyzing forex time series data, focusing on EUR/USD and other major and minor pairs. The goal is to understand the underlying technical principles, implement feature engineering, perform correlation analysis, identify trends, train AI/ML models, and evaluate their performance using RMSE.
+
+### Key Components
+
+1. **Data Preparation**
+2. **Feature Engineering**
+3. **Correlation Analysis**
+4. **Trend Identification**
+5. **Model Training**
+6. **Model Evaluation**
+
+### 1. Data Preparation
+
+#### Context
+Forex data is high-frequency time series data that requires careful preprocessing to handle missing values and outliers and to keep timestamps and formats consistent. TimescaleDB is used for storage and retrieval because of its scalability and time-series optimizations.
+
+**Technical Details:**
+- **Data Sourcing**: Forex data is typically retrieved from APIs such as OANDA, which provide real-time and historical data.
+- **Preprocessing**: This includes filling missing values with forward fill or interpolation, detecting outliers with a z-score threshold (then clipping or replacing the flagged points), and converting timestamps to a uniform format.
+
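The preprocessing steps above can be sketched with pandas. This is a minimal illustration, not the project's actual pipeline: the function name `preprocess`, the trailing rolling-median baseline, the window of 5, and the threshold of 3.0 are all assumptions chosen for the example.

```python
import pandas as pd


def preprocess(prices: pd.Series, window: int = 5, z_thresh: float = 3.0) -> pd.Series:
    """Forward-fill gaps, then replace spikes flagged by a z-score threshold."""
    s = prices.copy()
    # Uniform timestamps: parse to timezone-aware UTC and sort chronologically.
    s.index = pd.to_datetime(s.index, utc=True)
    s = s.sort_index().ffill()

    # Deviation of each price from its trailing rolling median (assumed baseline).
    deviation = s - s.rolling(window, min_periods=1).median()
    z = (deviation - deviation.mean()) / deviation.std()

    # Mask flagged points and carry the last accepted price forward.
    return s.mask(z.abs() > z_thresh).ffill()
```

A leading missing value has nothing to fill from and stays missing, so a real pipeline would need a separate rule for that case.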
+### 2. Feature Engineering
+
+#### Context
+Feature engineering transforms raw data into meaningful features that enhance the model's predictive capabilities. This process is critical for time series analysis as it captures temporal dependencies and seasonality.
+
+**Technical Details:**
+- **Lag Features**: Introducing past values (lags) as predictors helps capture temporal dependencies.
+    - **Mathematical Formulation**: \( \text{Lag}(k) = X_{t-k} \)
+- **Rolling Statistics**: Calculating rolling mean, variance, and standard deviation captures local trends and volatility.
+    - **Mathematical Formulation**: \( \text{Rolling Mean}(w) = \frac{1}{w} \sum_{i=t-w+1}^{t} X_i \)
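+- **Scaling**: Normalization or standardization puts features on a similar scale, which matters for gradient-trained models such as LSTMs and Transformers. Fit the scaler on the training window only so that no future information leaks into the features.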
+
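A short pandas sketch of the lag, rolling, and scaling features described above. The names `make_features` and `standardize`, the column names, and the default lags and window are hypothetical choices for illustration.

```python
import pandas as pd


def make_features(close: pd.Series, lags=(1, 2, 3), window: int = 5) -> pd.DataFrame:
    """Build lag and rolling-statistic features from a closing-price series."""
    feats = pd.DataFrame({"close": close})
    for k in lags:
        feats[f"lag_{k}"] = close.shift(k)  # Lag(k) = X_{t-k}
    roll = close.rolling(window)  # trailing window ending at t
    feats["roll_mean"] = roll.mean()
    feats["roll_std"] = roll.std()
    # Drop the warm-up rows that lack a full lag or window history.
    return feats.dropna()


def standardize(train: pd.DataFrame, test: pd.DataFrame):
    """Scale both splits with statistics estimated on the training split only."""
    mu = train.mean()
    sigma = train.std().replace(0, 1.0)  # avoid division by zero for constant columns
    return (train - mu) / sigma, (test - mu) / sigma
```

Splitting chronologically before calling `standardize` keeps the test window out of the scaling statistics.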
+### 3. Correlation Analysis
+
+#### Context
+Correlation analysis identifies relationships between different forex pairs, which can inform trading strategies and portfolio management. 
+
+**Technical Details:**
+- **Pearson Correlation**: Measures linear correlation between pairs.
+    - **Formula**: \( \rho_{X,Y} = \frac{\text{Cov}(X,Y)}{\sigma_X \sigma_Y} \)
+    - **Properties**: Symmetric, bounded between -1 and 1.
+- **Visualization**: Heatmaps are used to visualize the correlation matrix, highlighting highly correlated pairs.
+
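The Pearson formula can be checked directly against NumPy's built-in estimator. The sketch below assumes the inputs are return series for two pairs (correlating returns instead of raw price levels is a common choice, not something the guide specifies); `pearson` and `correlation_matrix` are illustrative names.

```python
import numpy as np


def pearson(x, y) -> float:
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    # The 1/n factors in covariance and standard deviations cancel.
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def correlation_matrix(returns: np.ndarray) -> np.ndarray:
    """Correlation matrix for a (time, pairs) array, one column per forex pair."""
    return np.corrcoef(returns, rowvar=False)
```

The resulting matrix is symmetric with ones on the diagonal, which is the input a heatmap would display.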
+### 4. Trend Identification
+
+#### Context
+Identifying trends helps in understanding the market direction and making informed trading decisions. Techniques like moving averages smooth out short-term fluctuations and highlight longer-term trends.
+
+**Technical Details:**
+- **Moving Averages**: Simple and exponential moving averages (SMA, EMA) are used.
+    - **SMA Formula**: \( \text{SMA}(n) = \frac{1}{n} \sum_{i=0}^{n-1} X_{t-i} \)
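+    - **EMA Formula**: \( \text{EMA}(t) = \alpha \cdot X_t + (1-\alpha) \cdot \text{EMA}(t-1) \), where \( 0 < \alpha \le 1 \) is the smoothing factor, commonly set to \( \alpha = \frac{2}{n+1} \) for an \( n \)-period EMA.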
+- **Trend Lines**: Connecting significant highs or lows in price data to form resistance and support lines.
+
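Both moving averages follow directly from their formulas. The plain-Python sketch below makes two assumptions that the formulas leave open: the EMA uses the common convention \( \alpha = 2/(n+1) \), and its recursion is seeded with the first observation.

```python
def sma(values, n):
    """Simple moving average; None until n observations are available."""
    out = []
    for t in range(len(values)):
        if t + 1 < n:
            out.append(None)
        else:
            out.append(sum(values[t - n + 1 : t + 1]) / n)
    return out


def ema(values, n):
    """Exponential moving average with alpha = 2 / (n + 1) (assumed convention)."""
    if not values:
        return []
    alpha = 2 / (n + 1)
    out = [values[0]]  # seed with the first observation (assumption)
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out
```

Because the EMA weights recent prices more heavily, it reacts to a change in direction sooner than an SMA of the same length.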
+### 5. Model Training
+
+#### Context
+Forecasting models differ in their strengths. This project uses one classical statistical model (ARIMA) and two neural architectures (LSTM and Transformer).
+
+**Technical Details:**
+
+**ARIMA (AutoRegressive Integrated Moving Average):**
+- **Components**: AR (p) - AutoRegression, I (d) - Integration, MA (q) - Moving Average.
+    - **AR**: \( X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \epsilon_t \)
+    - **I**: \( Y_t = X_t - X_{t-1} \) (first difference, applied \( d \) times)
+    - **MA**: \( X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \)
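+- **Use Case**: Effective for univariate time series whose trend can be removed by differencing. Plain ARIMA does not model seasonality; the seasonal extension (SARIMA) is needed for that.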
+
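To make the AR and I components concrete, the NumPy sketch below differences a series and fits the AR coefficients by ordinary least squares. It is deliberately limited to ARIMA(p, 1, 0): the MA terms are omitted because estimating them requires iterative likelihood methods, which a library such as statsmodels would normally handle. The function names are hypothetical.

```python
import numpy as np


def difference(x, d=1):
    """Apply first differencing d times (the I component)."""
    return np.diff(np.asarray(x, dtype=float), n=d)


def fit_ar(y, p):
    """Least-squares estimate of phi_1..phi_p in y_t = sum_k phi_k * y_{t-k} + eps_t."""
    y = np.asarray(y, dtype=float)
    # Column k-1 holds y_{t-k} for t = p .. len(y)-1.
    X = np.column_stack([y[p - k : len(y) - k] for k in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return phi


def forecast_next(x, phi):
    """One-step-ahead forecast of the level, assuming d = 1."""
    y = difference(x, d=1)
    p = len(phi)
    # phi_1 * y_T + ... + phi_p * y_{T-p+1}
    next_diff = float(phi @ y[::-1][:p])
    return float(x[-1]) + next_diff
```

The forecast of the differenced series is added back to the last observed level, which undoes the single differencing step.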
+**LSTM (Long Short-Term Memory):**
+- **Architecture**: Special type of RNN capable of learning long-term dependencies.
+    - **Gates**: Input, forget, and output gates control the cell state.
+    - **Equations**:
+        - Forget Gate: \( f_t = \sigma(W_f \cdot [h_{t-1}, X_t] + b_f) \)
+        - Input Gate: \( i_t = \sigma(W_i \cdot [h_{t-1}, X_t] + b_i) \)
+        - Output Gate: \( o_t = \sigma(W_o \cdot [h_{t-1}, X_t] + b_o) \)
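+        - Candidate State: \( \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, X_t] + b_C) \)
+        - Cell State: \( C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \)
+        - Hidden State: \( h_t = o_t * \tanh(C_t) \)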
+- **Use Case**: Suitable for capturing long-term dependencies in time series data.
+
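A single LSTM time step can be written out in NumPy to show how the gates interact. This sketch uses the standard formulation, including the tanh candidate state and the hidden-state output that complete the gate equations. The dictionary layout for the weights `W` and biases `b` is an assumption made for readability, not a framework convention.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W[k] has shape (hidden, hidden + input) and b[k] has shape (hidden,)
    for k in {"f", "i", "o", "c"}.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, X_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i = sigmoid(W["i"] @ z + b["i"])         # input gate
    o = sigmoid(W["o"] @ z + b["o"])         # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c = f * c_prev + i * c_tilde             # new cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c
```

In practice a deep learning framework would provide this layer together with training, so the sketch is only meant to make the data flow explicit.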
+**Transformers:**
+- **Architecture**: Self-attention mechanism allows the model to weigh the importance of different parts of the input sequence.
+    - **Attention Mechanism**: \( \text{Attention}(Q, K, V) = \text{softmax}\left( \frac{QK^T}{\sqrt{d_k}} \right) V \)
+    - **Components**: Multi-head attention, feed-forward networks, and positional encodings.
+- **Use Case**: Powerful for sequence modeling tasks, especially when capturing global dependencies.
+
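The attention formula maps to a few lines of NumPy. The sketch below covers a single head with no masking or learned projections; the shapes (`Q`, `K` as `(seq_len, d_k)` and `V` as `(seq_len, d_v)`) are assumptions for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights
```

Each output row is a weighted average of the rows of `V`, so every time step can draw on every other time step in the window.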
+### 6. Model Evaluation
+
+#### Context
+Model evaluation is crucial to assess the accuracy and reliability of predictions. RMSE (Root Mean Squared Error) is a standard metric for this purpose.
+
+**Technical Details:**
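+- **RMSE**: The square root of the mean squared error. Squaring before averaging penalizes large errors more heavily than small ones.
+    - **Formula**: \( \text{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 } \)
+    - **Interpretation**: RMSE is expressed in the units of the target. A lower RMSE indicates a better fit, but values are comparable only across models evaluated on the same data and scale.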
+
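The RMSE formula needs only the standard library. The function name `rmse` and the input validation are illustrative choices.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - p) ** 2 for y, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Computing it on a held-out window that follows the training window in time gives a fairer estimate than scoring on the training data.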
+### Workflow Summary (Pseudocode)
+
+#### Data Preparation
+1. Ingest data from OANDA.
+2. Preprocess data: handle missing values, outliers.
+3. Store preprocessed data in TimescaleDB.
+
+#### Feature Engineering
+1. Create lag features and rolling statistics.
+2. Store engineered features in TimescaleDB.
+
+#### Correlation Analysis and Storage
+1. Calculate correlation matrix.
+2. Store correlation results in TimescaleDB.
+
+#### Trend Identification and Storage
+1. Calculate moving averages and trend indicators.
+2. Store trend data in TimescaleDB.
+
+#### Model Training (ARIMA, LSTM, Transformers)
+1. Retrieve feature-engineered data from TimescaleDB.
+2. Train ARIMA, LSTM, and Transformer models.
+3. Store trained models and scalers.
+
+#### Model Evaluation and Storage
+1. Evaluate models using RMSE.
+2. Store evaluation results in TimescaleDB.
+
+### Conclusion
+This guide gives a technical overview of the methods used in forex time series analysis, combining a classical statistical model (ARIMA) with deep learning models (LSTM and Transformers). Each step is designed for robustness, scalability, and forecasting accuracy, with high-frequency trading environments and financial analytics as the intended settings.