Update projects/getting_started_ml.md

2024-06-07 12:45:28 +00:00
parent ac951648cd
commit eeb646c74f

@@ -871,4 +871,159 @@ plt.show()
- **Model Evaluation:** Evaluate the model's performance using appropriate metrics.
- **Advanced Techniques:** Use feature selection, ensemble methods, and cross-validation to improve the model.
By following this structured approach, you can effectively select and train a baseline machine learning model to predict target variables using MQTT sensor data.
---
### Model Inference: Functions Overview
#### 1. Define the Use Case
- **Objective:** Clearly state the goal of the prediction during inference.
- **Real-time or Batch:** Determine if the inference will be performed in real-time or in batch mode.
- **Expected Output:** Define the expected output of the inference process.
#### 2. Data Collection for Inference
- **Real-time Data Collection:** Collect real-time data via MQTT for immediate inference.
- **Batch Data Collection:** Collect and aggregate data over a period for batch inference.
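
As a rough sketch of the real-time path, the handler below parses incoming payloads into a bounded in-memory buffer. It assumes a paho-mqtt style callback signature, `on_message(client, userdata, message)`, and JSON payloads with a `value` field. The payload schema and the buffer size are illustrative assumptions, not taken from this project.

```python
import json
from collections import deque

# Keep only the most recent readings; the size is an arbitrary illustrative choice.
BUFFER = deque(maxlen=500)

def on_message(client, userdata, message):
    """Callback in the style paho-mqtt uses: parse one payload and buffer it."""
    try:
        reading = json.loads(message.payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return  # skip malformed payloads instead of crashing the listener
    if not isinstance(reading, dict) or "value" not in reading:
        return  # assumed schema: {"value": <number>, ...}
    reading["topic"] = message.topic
    BUFFER.append(reading)
```

Because the buffer is bounded, a slow consumer drops the oldest readings rather than growing memory without limit; whether that trade-off is acceptable depends on the use case defined in step 1.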
#### 3. Data Preprocessing for Inference
- **Handle Missing Values:** Replace missing values using the same imputation strategy applied during training.
- **Feature Engineering:** Apply the same feature engineering steps used during training (e.g., lag features, rolling statistics).
- **Normalization/Scaling:** Ensure the features are scaled consistently with the training data.
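
One way to keep inference consistent with training is to save the training-set statistics and reuse them unchanged at inference time. The sketch below assumes median imputation and standard scaling; the feature names and numbers in `TRAIN_STATS` are placeholders, not values from this project.

```python
import math

# Statistics computed on the training set and saved alongside the model.
# The feature names and numbers here are placeholders.
TRAIN_STATS = {
    "temperature": {"median": 21.0, "mean": 20.5, "std": 2.0},
    "humidity": {"median": 45.0, "mean": 47.0, "std": 5.0},
}

def impute_and_scale(record, stats=TRAIN_STATS):
    """Fill missing features with training medians, then standardize."""
    scaled = {}
    for name, s in stats.items():
        value = record.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            value = s["median"]
        scaled[name] = (value - s["mean"]) / s["std"]
    return scaled
```

Recomputing the mean or standard deviation on inference data instead would shift the feature distribution the model sees, which is why the statistics are passed in rather than derived from `record`.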
#### 4. Load the Trained Model
- **Model Serialization:** Load the trained model from storage (e.g., joblib, pickle).
- **Environment Setup:** Ensure the inference environment matches the training environment (e.g., the same versions of the libraries used to train and serialize the model).
#### 5. Perform Inference
- **Predict:** Use the trained model to make predictions on the preprocessed data.
- **Post-process Results:** Convert the raw predictions into actionable insights.
#### 6. Monitoring and Logging
- **Log Predictions:** Store predictions for future analysis and auditing.
- **Monitor Performance:** Track the performance of the model over time to detect drift.
### Function Descriptions
#### Data Collection for Inference
1. **collect_real_time_data:**
- **Purpose:** Gather real-time data from MQTT sensors for immediate inference.
2. **collect_batch_data:**
- **Purpose:** Collect and aggregate sensor data over a specified period for batch inference.
#### Data Preprocessing for Inference
3. **preprocess_inference_data:**
- **Purpose:** Clean and preprocess the collected data, ensuring consistency with the training data preprocessing steps.
4. **feature_engineering_inference:**
- **Purpose:** Apply feature engineering steps to the inference data (e.g., creating lag features, rolling statistics).
5. **normalize_data:**
- **Purpose:** Normalize or scale the features to match the training data's distribution.
#### Load the Trained Model
6. **load_trained_model:**
- **Purpose:** Load the trained machine learning model from storage.
#### Perform Inference
7. **perform_inference:**
- **Purpose:** Use the trained model to make predictions on the preprocessed inference data.
8. **post_process_predictions:**
- **Purpose:** Convert raw model predictions into actionable insights or outputs.
#### Monitoring and Logging
9. **log_predictions:**
- **Purpose:** Log predictions for auditing and future analysis.
10. **monitor_model_performance:**
- **Purpose:** Monitor the performance of the model over time to detect any degradation or drift.
### Detailed Overview of Inference Functions
#### Data Collection for Inference
1. **Function: collect_real_time_data**
- **Purpose:** Gather real-time data from MQTT sensors for immediate inference.
- **Description:** Connects to the MQTT broker, subscribes to relevant topics, and collects incoming messages.
2. **Function: collect_batch_data**
- **Purpose:** Collect and aggregate sensor data over a specified period for batch inference.
- **Description:** Queries the TimescaleDB to retrieve aggregated data for batch processing.
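
For the batch path, aggregation usually means averaging readings over fixed time buckets. The sketch below shows a query using TimescaleDB's `time_bucket` function, followed by a plain-Python equivalent for rows already in memory. The table `sensor_data`, its columns, the 5-minute bucket, and the `%s` placeholders (psycopg2 style) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical schema: sensor_data(time TIMESTAMPTZ, sensor_id TEXT, value DOUBLE PRECISION).
BATCH_QUERY = """
SELECT time_bucket('5 minutes', time) AS bucket,
       sensor_id,
       avg(value) AS avg_value
FROM sensor_data
WHERE time >= %s AND time < %s
GROUP BY bucket, sensor_id
ORDER BY bucket, sensor_id;
"""

def aggregate_by_bucket(rows, bucket_seconds=300):
    """Average (epoch_seconds, sensor_id, value) rows per fixed-width time bucket."""
    groups = defaultdict(list)
    for ts, sensor_id, value in rows:
        bucket_start = int(ts // bucket_seconds) * bucket_seconds
        groups[(bucket_start, sensor_id)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}
```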
#### Data Preprocessing for Inference
3. **Function: preprocess_inference_data**
- **Purpose:** Clean and preprocess the collected data, ensuring consistency with the training data preprocessing steps.
- **Description:** Handles missing values and applies necessary transformations to prepare the data for inference.
4. **Function: feature_engineering_inference**
- **Purpose:** Apply feature engineering steps to the inference data (e.g., creating lag features, rolling statistics).
- **Description:** Applies the same feature engineering techniques used during training to ensure consistency.
5. **Function: normalize_data**
- **Purpose:** Normalize or scale the features to match the training data's distribution.
- **Description:** Uses the same normalization/scaling parameters used during training.
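
Lag and rolling features need recent history, which a streaming setting has to maintain itself. The following is a minimal sketch of one way to do that; the class name, the default of two lags, and the three-reading window are hypothetical and would need to match whatever was used during training.

```python
from collections import deque

class LagRollingFeatures:
    """Build lag and rolling-mean features from a stream of readings."""

    def __init__(self, n_lags=2, window=3):
        self.n_lags = n_lags
        self.window = window
        self.history = deque(maxlen=max(n_lags, window))

    def update(self, value):
        """Return features for `value`, or None until enough history exists."""
        features = None
        if len(self.history) == self.history.maxlen:
            past = list(self.history)
            features = {"value": value}
            for k in range(1, self.n_lags + 1):
                features[f"lag_{k}"] = past[-k]
            # Rolling mean over the previous readings only (current value excluded).
            recent = past[-self.window:]
            features["rolling_mean"] = sum(recent) / self.window
        self.history.append(value)
        return features
```

The rolling mean here excludes the current reading. If the training pipeline included it, this function would have to be changed to match, otherwise the feature would be computed differently at inference time.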
#### Load the Trained Model
6. **Function: load_trained_model**
- **Purpose:** Load the trained machine learning model from storage.
- **Description:** Deserializes the model using joblib or pickle.
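
A minimal sketch of loading with `pickle` is shown below; `joblib.load` follows the same pattern. Storing the Python version next to the model is an assumption added here to illustrate an environment check, and `save_model_bundle` is a hypothetical helper, not part of this project.

```python
import pickle
import sys

def save_model_bundle(model, path):
    """Store the model together with the Python version used for training."""
    bundle = {"model": model, "python_version": list(sys.version_info[:2])}
    with open(path, "wb") as f:
        pickle.dump(bundle, f)

def load_trained_model(path):
    """Load a bundle written by save_model_bundle and check the environment."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)  # only unpickle files from a trusted source
    if bundle["python_version"] != list(sys.version_info[:2]):
        raise RuntimeError("model was trained under a different Python version")
    return bundle["model"]
```

In practice the bundle could also record the versions of the libraries used for training, since a pickled model is not guaranteed to load correctly under different versions.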
#### Perform Inference
7. **Function: perform_inference**
- **Purpose:** Use the trained model to make predictions on the preprocessed inference data.
- **Description:** Applies the model to the preprocessed data to generate predictions.
8. **Function: post_process_predictions**
- **Purpose:** Convert raw model predictions into actionable insights or outputs.
- **Description:** Transforms the predictions into a user-friendly format or actionable output.
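
The two steps above can be sketched as follows. A linear model stored as a dictionary stands in for the trained model, and the alert threshold and unit are hypothetical; a real model object would typically expose its own prediction method instead.

```python
def perform_inference(model, rows):
    """Apply a linear model (weights + bias) to each feature row."""
    return [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]

def post_process_predictions(predictions, threshold=30.0, unit="°C"):
    """Turn raw numbers into rounded values with an alert flag."""
    return [
        {"prediction": round(p, 2), "unit": unit, "alert": p > threshold}
        for p in predictions
    ]
```

Keeping post-processing separate from prediction means the threshold can change without touching the model call.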
#### Monitoring and Logging
9. **Function: log_predictions**
- **Purpose:** Log predictions for auditing and future analysis.
- **Description:** Stores predictions in a database or log file for future reference.
10. **Function: monitor_model_performance**
- **Purpose:** Monitor the performance of the model over time to detect any degradation or drift.
- **Description:** Tracks model performance metrics and alerts if performance degrades.
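
As an illustration, predictions can be logged as JSON lines and degradation detected by comparing a rolling mean absolute error against a baseline. This sketch assumes that ground-truth values become available some time after each prediction; the `ErrorMonitor` class, the tolerance factor, and the window size are hypothetical choices.

```python
import json
import time
from collections import deque

def log_predictions(records, stream):
    """Append one JSON line per prediction to a file-like object."""
    for record in records:
        stream.write(json.dumps({"logged_at": time.time(), **record}) + "\n")

class ErrorMonitor:
    """Flag degradation when rolling mean absolute error exceeds a limit."""

    def __init__(self, baseline_mae, tolerance=1.5, window=50):
        self.limit = baseline_mae * tolerance
        self.errors = deque(maxlen=window)

    def update(self, predicted, actual):
        """Record one error; return True once the window is full and too high."""
        self.errors.append(abs(predicted - actual))
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.limit
```

Waiting for a full window before alerting avoids reacting to a single outlier. Where ground truth never arrives, monitoring would have to rely on input statistics instead, which this sketch does not cover.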
### Example Workflow for Inference
1. **Data Collection for Inference:**
- **Real-time:** `collect_real_time_data()`
- **Batch:** `collect_batch_data()`
2. **Data Preprocessing for Inference:**
- `preprocess_inference_data()`
- `feature_engineering_inference()`
- `normalize_data()`
3. **Load the Trained Model:**
- `load_trained_model()`
4. **Perform Inference:**
- `perform_inference()`
- `post_process_predictions()`
5. **Monitoring and Logging:**
- `log_predictions()`
- `monitor_model_performance()`
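
To show how steps 1 to 4 chain together in batch mode, here is a toy end-to-end sketch. Every function body is a stand-in: the data, the fill value, the scaling parameters, the model weights, and the `run_batch_inference` wrapper are all invented for illustration, and logging and monitoring are left out to keep it short.

```python
def collect_batch_data():
    # Stand-in for a database query; None marks a missing reading.
    return [20.0, None, 22.0, 23.0]

def preprocess_inference_data(values, fill_value=21.0):
    return [fill_value if v is None else v for v in values]

def feature_engineering_inference(values):
    # One lag feature: pair each reading with the previous one.
    return [[prev, cur] for prev, cur in zip(values, values[1:])]

def normalize_data(rows, mean=21.0, std=2.0):
    return [[(x - mean) / std for x in row] for row in rows]

def load_trained_model():
    # Stand-in for deserializing a model from storage.
    return {"weights": [0.5, 0.5], "bias": 0.0}

def perform_inference(model, rows):
    return [
        sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
        for row in rows
    ]

def post_process_predictions(preds, mean=21.0, std=2.0):
    # Map predictions back to the original scale.
    return [round(p * std + mean, 2) for p in preds]

def run_batch_inference():
    raw = collect_batch_data()
    clean = preprocess_inference_data(raw)
    rows = normalize_data(feature_engineering_inference(clean))
    model = load_trained_model()
    return post_process_predictions(perform_inference(model, rows))
```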
### Summary
By following this structured approach, you can perform inference on sensor data collected via MQTT. Collection and preprocessing keep the inference inputs consistent with the training data, model loading and prediction produce the outputs, and logging and monitoring track the model's performance over time so that degradation or drift can be detected.