Update projects/getting_started_ml.md
@@ -871,4 +871,159 @@ plt.show()
- **Model Evaluation:** Evaluate the model’s performance using appropriate metrics.
- **Advanced Techniques:** Use feature selection, ensemble methods, and cross-validation to improve the model.

By following this structured approach, you can effectively select and train a baseline machine learning model to predict target variables using MQTT sensor data.

---

### Model Inference: Functions Overview

#### 1. Define the Use Case

- **Objective:** Clearly state the goal of the prediction during inference.
- **Real-time or Batch:** Determine whether inference will run in real time or in batch mode.
- **Expected Output:** Define the expected output of the inference process.

#### 2. Data Collection for Inference

- **Real-time Data Collection:** Collect real-time data via MQTT for immediate inference.
- **Batch Data Collection:** Collect and aggregate data over a period for batch inference.

#### 3. Data Preprocessing for Inference

- **Handle Missing Values:** Fill missing values using the same strategy applied during training.
- **Feature Engineering:** Apply the same feature engineering steps used during training (e.g., lag features, rolling statistics).
- **Normalization/Scaling:** Ensure the features are scaled consistently with the training data.

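As a minimal sketch of the missing-value step, the snippet below carries the last valid reading forward and falls back to a default (for example, a training-set median) when no earlier value exists. The function name `fill_missing`, the use of `None`/NaN to mark gaps, and the fallback value are illustrative assumptions, not part of the original pipeline.

```python
import math


def fill_missing(readings, fallback):
    """Forward-fill None/NaN readings; use `fallback` before the first valid value."""
    filled = []
    last_valid = fallback
    for value in readings:
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if not is_missing:
            last_valid = value
        filled.append(last_valid)
    return filled


# Hypothetical temperature readings with two gaps.
cleaned = fill_missing([None, 21.5, float("nan"), 22.0], fallback=20.0)
print(cleaned)  # [20.0, 21.5, 21.5, 22.0]
```

Whatever strategy is chosen, it should be the one the model saw during training; forward fill is only one option.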
#### 4. Load the Trained Model

- **Model Serialization:** Load the serialized model from storage (e.g., with joblib or pickle).
- **Environment Setup:** Ensure the inference environment matches the training environment, including the Python and library versions.

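One way to check that the two environments match is to record version information at training time and compare it before loading the model. The sketch below is an assumption about how this could look; the helper names `capture_environment` and `environment_mismatches` are hypothetical, and the package list would be whatever the training code depends on.

```python
import platform
from importlib import metadata


def capture_environment(packages=()):
    """Record the Python version and the installed version of each named package."""
    env = {"python": platform.python_version()}
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = None  # package is not installed in this environment
    return env


def environment_mismatches(recorded, current):
    """Return {key: (recorded, current)} for every entry that differs."""
    return {
        key: (value, current.get(key))
        for key, value in recorded.items()
        if current.get(key) != value
    }


# At training time this dict would be saved next to the model file.
training_env = capture_environment()
# At inference time, compare it with the running environment.
print(environment_mismatches(training_env, capture_environment()))  # {}
```

A non-empty result is a signal to rebuild the environment or retrain, since serialized models are generally not guaranteed to load correctly across library versions.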
#### 5. Perform Inference

- **Predict:** Use the trained model to make predictions on the preprocessed data.
- **Post-process Results:** Convert the raw predictions into actionable insights.

#### 6. Monitoring and Logging

- **Log Predictions:** Store predictions for future analysis and auditing.
- **Monitor Performance:** Track the performance of the model over time to detect drift.

### Function Descriptions

#### Data Collection for Inference

1. **collect_real_time_data:**
   - **Purpose:** Gather real-time data from MQTT sensors for immediate inference.

2. **collect_batch_data:**
   - **Purpose:** Collect and aggregate sensor data over a specified period for batch inference.

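For the real-time path, the core of the collector is a message callback that parses each payload and appends it to a buffer. The sketch below assumes JSON payloads and a hypothetical topic name; the callback uses the `(client, userdata, message)` shape that paho-mqtt passes to `on_message`, but the message here is simulated so the example runs without a broker.

```python
import json
from collections import deque
from types import SimpleNamespace


def make_on_message(buffer):
    """Build a callback with the (client, userdata, message) shape used by paho-mqtt."""

    def on_message(client, userdata, message):
        try:
            reading = json.loads(message.payload)
        except ValueError:
            return  # skip payloads that are not valid JSON
        if not isinstance(reading, dict):
            return  # this sketch only accepts JSON objects
        buffer.append({"topic": message.topic, **reading})

    return on_message


buffer = deque(maxlen=1000)  # keep only the most recent readings
handler = make_on_message(buffer)

# Simulated messages; with a real client the handler would be attached via
# `client.on_message = handler` after subscribing to the sensor topic.
handler(None, None, SimpleNamespace(topic="sensors/temp", payload=b'{"value": 21.5}'))
handler(None, None, SimpleNamespace(topic="sensors/temp", payload=b"not json"))
print(list(buffer))  # [{'topic': 'sensors/temp', 'value': 21.5}]
```

The bounded `deque` keeps memory use fixed if inference falls behind the message rate; whether dropping old readings is acceptable depends on the use case.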
#### Data Preprocessing for Inference

3. **preprocess_inference_data:**
   - **Purpose:** Clean and preprocess the collected data, ensuring consistency with the training data preprocessing steps.

4. **feature_engineering_inference:**
   - **Purpose:** Apply feature engineering steps to the inference data (e.g., creating lag features, rolling statistics).

5. **normalize_data:**
   - **Purpose:** Normalize or scale the features to match the training data’s distribution.

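Lag features and rolling statistics can be computed incrementally as readings arrive. The following sketch keeps a fixed-size history and emits features once the window is full; the feature names, the window size, and the choice to compute statistics over the previous readings only (excluding the current one) are assumptions that would need to match the training pipeline.

```python
from collections import deque
from statistics import mean


def make_feature_builder(window=3):
    """Return a function that turns each new reading into lag and rolling features."""
    history = deque(maxlen=window)

    def build(value):
        features = None
        if len(history) == window:
            features = {
                "value": value,
                "lag_1": history[-1],            # previous reading
                "rolling_mean": mean(history),   # statistics over the last `window` readings
                "rolling_min": min(history),
                "rolling_max": max(history),
            }
        history.append(value)
        return features  # None until enough history has accumulated

    return build


build = make_feature_builder(window=3)
outputs = [build(v) for v in [1.0, 2.0, 3.0, 4.0]]
print(outputs[-1])
```

For batch data held in a DataFrame, the same features are usually expressed with pandas `shift` and `rolling` instead.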
#### Load the Trained Model

6. **load_trained_model:**
   - **Purpose:** Load the trained machine learning model from storage.

#### Perform Inference

7. **perform_inference:**
   - **Purpose:** Use the trained model to make predictions on the preprocessed inference data.

8. **post_process_predictions:**
   - **Purpose:** Convert raw model predictions into actionable insights or outputs.

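What counts as an "actionable" output depends on the use case. As one hedged illustration, the sketch below maps numeric predictions to status labels with two thresholds; the labels, the threshold values, and the output shape are assumptions chosen for the example.

```python
def post_process_predictions(predictions, warn_at, alert_at):
    """Map raw numeric predictions to status labels using two thresholds."""
    results = []
    for value in predictions:
        if value >= alert_at:
            status = "alert"
        elif value >= warn_at:
            status = "warning"
        else:
            status = "ok"
        results.append({"prediction": value, "status": status})
    return results


# Hypothetical predicted temperatures in degrees Celsius.
for item in post_process_predictions([20.0, 27.5, 31.0], warn_at=25.0, alert_at=30.0):
    print(item)
```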
#### Monitoring and Logging

9. **log_predictions:**
   - **Purpose:** Log predictions for auditing and future analysis.

10. **monitor_model_performance:**
    - **Purpose:** Monitor the performance of the model over time to detect any degradation or drift.

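A simple way to keep predictions for later auditing is an append-only JSON Lines file, one record per line. The sketch below is one possible shape for `log_predictions`; the record fields, the file name, and the choice of a file rather than a database table are assumptions.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def log_predictions(path, records):
    """Append one JSON object per prediction to a JSON Lines file."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            entry = {"logged_at": datetime.now(timezone.utc).isoformat(), **record}
            handle.write(json.dumps(entry) + "\n")


# Write to a temporary directory so the example leaves no files behind in the project.
log_path = os.path.join(tempfile.mkdtemp(), "predictions.jsonl")
log_predictions(log_path, [{"sensor": "temp-1", "prediction": 22.4}])

with open(log_path, encoding="utf-8") as handle:
    logged = [json.loads(line) for line in handle]
print(logged[0]["sensor"], logged[0]["prediction"])  # temp-1 22.4
```

Storing a UTC timestamp with each record makes it possible to join predictions with ground-truth values that arrive later.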
### Detailed Overview of Inference Functions

#### Data Collection for Inference

1. **Function: collect_real_time_data**
   - **Purpose:** Gather real-time data from MQTT sensors for immediate inference.
   - **Description:** Connects to the MQTT broker, subscribes to relevant topics, and collects incoming messages.

2. **Function: collect_batch_data**
   - **Purpose:** Collect and aggregate sensor data over a specified period for batch inference.
   - **Description:** Queries TimescaleDB to retrieve aggregated data for batch processing.

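The batch query can be sketched with TimescaleDB's `time_bucket` function, which groups rows into fixed-width time intervals. Everything schema-related below is an assumption: the table `sensor_data`, the columns `time`, `sensor_id`, and `value`, and the `%s` placeholder style used by psycopg-like drivers. The function only builds the query, so it runs without a database.

```python
from datetime import datetime, timedelta, timezone


def build_batch_query(start, end, bucket="5 minutes"):
    """Return (sql, params) that average readings per sensor per time bucket."""
    sql = (
        "SELECT time_bucket(%s::interval, time) AS bucket, "
        "sensor_id, AVG(value) AS avg_value "
        "FROM sensor_data "              # hypothetical table name
        "WHERE time >= %s AND time < %s "
        "GROUP BY bucket, sensor_id "
        "ORDER BY bucket"
    )
    return sql, (bucket, start, end)


# Hypothetical 24-hour batch window.
end = datetime(2024, 1, 2, tzinfo=timezone.utc)
start = end - timedelta(hours=24)
sql, params = build_batch_query(start, end)
print(sql)
print(params)
```

With a DB-API driver, the pair would typically be passed as `cursor.execute(sql, params)`, which keeps the time bounds out of the SQL string itself.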
#### Data Preprocessing for Inference

3. **Function: preprocess_inference_data**
   - **Purpose:** Clean and preprocess the collected data, ensuring consistency with the training data preprocessing steps.
   - **Description:** Handles missing values and applies necessary transformations to prepare the data for inference.

4. **Function: feature_engineering_inference**
   - **Purpose:** Apply feature engineering steps to the inference data (e.g., creating lag features, rolling statistics).
   - **Description:** Applies the same feature engineering techniques used during training to ensure consistency.

5. **Function: normalize_data**
   - **Purpose:** Normalize or scale the features to match the training data’s distribution.
   - **Description:** Reuses the normalization/scaling parameters computed during training instead of recomputing them on the inference data.

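To make the point about reusing training parameters concrete, the sketch below standardizes a feature row with means and standard deviations that were saved at training time. The JSON layout of the saved parameters and the feature names are assumptions for illustration.

```python
import json


def normalize_data(row, scaler_params):
    """Standardize one feature row with means and standard deviations saved from training."""
    scaled = {}
    for name, stats in scaler_params.items():
        std = stats["std"] or 1.0  # guard against constant features (std of 0)
        scaled[name] = (row[name] - stats["mean"]) / std
    return scaled


# Parameters as they might have been written to disk after training (hypothetical values).
saved = '{"temperature": {"mean": 20.0, "std": 2.0}, "humidity": {"mean": 50.0, "std": 10.0}}'
scaler_params = json.loads(saved)

scaled_row = normalize_data({"temperature": 24.0, "humidity": 45.0}, scaler_params)
print(scaled_row)  # {'temperature': 2.0, 'humidity': -0.5}
```

If the training code used a scikit-learn scaler, the equivalent is to persist the fitted scaler and call its `transform` method at inference time, not `fit_transform`.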
#### Load the Trained Model

6. **Function: load_trained_model**
   - **Purpose:** Load the trained machine learning model from storage.
   - **Description:** Deserializes the model using joblib or pickle.

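A minimal sketch of the loading step with the standard-library `pickle` module follows. To stay self-contained it saves and reloads a stand-in artifact (a dict of coefficients) rather than a real estimator; the file name and the artifact contents are assumptions.

```python
import os
import pickle
import tempfile


def load_trained_model(path):
    """Deserialize a model artifact with pickle. Only load files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in artifact written to a temporary directory (not a real trained model).
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump({"coef": [0.5, -0.25], "intercept": 1.0}, handle)

model = load_trained_model(model_path)
print(model["intercept"])  # 1.0
```

Unpickling can execute arbitrary code, so model files should come only from trusted storage. With joblib, the analogous call is `joblib.load(path)`.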
#### Perform Inference

7. **Function: perform_inference**
   - **Purpose:** Use the trained model to make predictions on the preprocessed inference data.
   - **Description:** Applies the model to the preprocessed data to generate predictions.

8. **Function: post_process_predictions**
   - **Purpose:** Convert raw model predictions into actionable insights or outputs.
   - **Description:** Transforms the predictions into a user-friendly format or actionable output.

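The prediction step itself is usually a thin wrapper around the model's `predict` method. The sketch below assumes a scikit-learn-style interface and uses a small hand-written stand-in class so that it runs without a trained model; `LinearStandIn` and its coefficients are purely illustrative.

```python
class LinearStandIn:
    """Tiny stand-in with a scikit-learn-style predict(); not a real trained model."""

    def __init__(self, coef, intercept):
        self.coef = coef
        self.intercept = intercept

    def predict(self, rows):
        return [
            self.intercept + sum(c * x for c, x in zip(self.coef, row))
            for row in rows
        ]


def perform_inference(model, rows):
    """Run the model on preprocessed feature rows; reject an empty batch early."""
    if len(rows) == 0:
        raise ValueError("no rows to score")
    return list(model.predict(rows))


stand_in = LinearStandIn(coef=[0.5, -0.25], intercept=1.0)
predictions = perform_inference(stand_in, [[2.0, 4.0], [4.0, 0.0]])
print(predictions)  # [1.0, 3.0]
```

The feature order in each row must match the order used at training time, which is why the preprocessing steps are shared between the two phases.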
#### Monitoring and Logging

9. **Function: log_predictions**
   - **Purpose:** Log predictions for auditing and future analysis.
   - **Description:** Stores predictions in a database or log file for future reference.

10. **Function: monitor_model_performance**
    - **Purpose:** Monitor the performance of the model over time to detect any degradation or drift.
    - **Description:** Tracks model performance metrics and alerts if performance degrades.

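One possible form of the monitoring step is a rolling error that is compared against a baseline from validation. The sketch below assumes a regression target whose true values become available after the prediction is made; the class name, the mean absolute error metric, and the tolerance factor are all assumptions.

```python
from collections import deque


class PerformanceMonitor:
    """Track a rolling mean absolute error and flag degradation against a baseline."""

    def __init__(self, baseline_mae, tolerance=1.5, window=100):
        self.threshold = baseline_mae * tolerance
        self.errors = deque(maxlen=window)

    def update(self, predicted, actual):
        """Record one prediction/actual pair; return True when the rolling MAE is too high."""
        self.errors.append(abs(predicted - actual))
        rolling_mae = sum(self.errors) / len(self.errors)
        return rolling_mae > self.threshold


monitor = PerformanceMonitor(baseline_mae=1.0, tolerance=1.5, window=3)
print(monitor.update(predicted=20.0, actual=20.5))  # False: rolling MAE is 0.5
print(monitor.update(predicted=20.0, actual=24.0))  # True: rolling MAE is 2.25
```

In practice a `True` result would trigger an alert or a retraining job rather than a print statement.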
### Example Workflow for Inference

1. **Data Collection for Inference:**
   - **Real-time:** `collect_real_time_data()`
   - **Batch:** `collect_batch_data()`

2. **Data Preprocessing for Inference:**
   - `preprocess_inference_data()`
   - `feature_engineering_inference()`
   - `normalize_data()`

3. **Load the Trained Model:**
   - `load_trained_model()`

4. **Perform Inference:**
   - `perform_inference()`
   - `post_process_predictions()`

5. **Monitoring and Logging:**
   - `log_predictions()`
   - `monitor_model_performance()`

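The workflow above can be wired together as a single function that passes data from stage to stage. The sketch below shows only the control flow: every stage is supplied as a callable, and the lambdas are trivial stand-ins rather than the real functions, whose signatures the document does not specify.

```python
def run_inference_pipeline(raw_rows, steps, model_predict, post_process, log):
    """Chain the workflow stages: preprocessing steps, prediction, post-processing, logging."""
    data = raw_rows
    for step in steps:
        data = step(data)
    predictions = model_predict(data)
    results = post_process(predictions)
    log(results)
    return results


logged = []
results = run_inference_pipeline(
    raw_rows=[1.0, None, 3.0],
    steps=[
        lambda rows: [r for r in rows if r is not None],  # stand-in for preprocessing
        lambda rows: [r / 2 for r in rows],               # stand-in for normalization
    ],
    model_predict=lambda rows: [r * 2 for r in rows],      # stand-in for the trained model
    post_process=lambda preds: [{"prediction": p} for p in preds],
    log=logged.extend,                                     # stand-in for log_predictions
)
print(results)  # [{'prediction': 1.0}, {'prediction': 3.0}]
```

Passing the stages in as arguments keeps the real-time and batch paths on the same code, with only the data source swapped.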
### Summary

By following this structured approach, you can perform inference on sensor data collected via MQTT. Each function covers one stage of the pipeline, from data collection through monitoring, so that predictions stay consistent with training and remain accurate, reliable, and actionable. Logging and monitoring keep the model’s performance visible over time and support decision-making.