`scikit-learn` provides simple and efficient tools for predictive data analysis and is built on NumPy, SciPy, and matplotlib. It includes a wide range of supervised and unsupervised learning algorithms. Below is a concise reference guide for common use cases.

# `scikit-learn` Reference Guide

## Installation

```
pip install scikit-learn
```

## Basic Concepts

### Importing scikit-learn

```python
# The package is installed as scikit-learn but imported as sklearn
import sklearn
```

## Preprocessing Data

```python
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Standardize features
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# One-hot encode categorical variables
encoder = OneHotEncoder().fit(X_categorical)
X_encoded = encoder.transform(X_categorical)
```
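
The snippet above assumes `X` and `X_categorical` already exist. A minimal runnable sketch follows, with small made-up arrays standing in for real data (the values and column meanings are illustrative assumptions, not part of the guide):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Hypothetical numeric features: 4 samples, 2 columns.
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])

# Hypothetical categorical feature: one column of color labels.
X_categorical = np.array([["red"], ["green"], ["blue"], ["green"]])

# After scaling, each column has mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# OneHotEncoder returns a SciPy sparse matrix by default;
# toarray() converts it to a dense NumPy array for inspection.
encoder = OneHotEncoder().fit(X_categorical)
X_encoded = encoder.transform(X_categorical).toarray()

print(X_scaled.mean(axis=0))  # approximately [0. 0.]
print(encoder.categories_)    # categories are stored in sorted order
print(X_encoded.shape)        # (4, 3): one column per category
```

In practice the scaler and encoder are usually fitted on the training data only and then applied to the test data with `transform`, so that no information from the test set leaks into preprocessing.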

## Splitting Data

```python
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
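
By default `train_test_split` shuffles the data randomly, so the split differs from run to run. The sketch below uses the `random_state` and `stratify` parameters to make the split reproducible and class-balanced; the toy dataset is a made-up assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples, 2 features, balanced binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# random_state fixes the shuffle so the split is reproducible;
# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
print(np.bincount(y_test))          # [1 1]: one test sample per class
```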

## Supervised Learning Algorithms

### Linear Regression

```python
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
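
To see what a fitted model contains, the following sketch fits a line to noise-free data generated from y = 2x + 1 (a made-up example) and reads back the learned slope and intercept through the `coef_` and `intercept_` attributes:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression().fit(X, y)

print(model.coef_)       # slope, approximately [2.]
print(model.intercept_)  # intercept, approximately 1.0

# Predict for an unseen input; the result is approximately [11.]
prediction = model.predict(np.array([[5.0]]))
print(prediction)
```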

### Classification (Logistic Regression)

```python
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
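
Besides hard class labels, a fitted `LogisticRegression` can also return class probabilities through `predict_proba`. A minimal sketch on a made-up one-feature dataset (the data is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature dataset: small values are class 0, large values class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

X_new = np.array([[0.0], [3.0]])
labels = model.predict(X_new)       # hard class labels
probs = model.predict_proba(X_new)  # one row per sample, one column per class

print(labels)
print(probs)           # each row sums to 1
print(model.classes_)  # column order used by predict_proba
```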

### Decision Trees

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

## Unsupervised Learning Algorithms

### K-Means Clustering

```python
from sklearn.cluster import KMeans

model = KMeans(n_clusters=3)
model.fit(X)
labels = model.predict(X)
```
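
As a runnable illustration, the sketch below clusters six made-up points that form two well-separated groups. The data, `n_clusters=2`, `n_init=10`, and `random_state=0` are assumptions chosen for the example; `fit_predict` combines the `fit` and `predict` steps shown above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three points near (0, 0) and three near (10, 10).
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [10.0, 10.0], [10.1, 10.2], [10.2, 10.1],
])

# random_state makes the centroid initialization reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(labels)                  # the first three points share one label, the last three the other
print(model.cluster_centers_)  # one centroid per cluster
```

Cluster label values are arbitrary: which group is called 0 and which is called 1 can vary, so downstream code should not rely on a particular numbering.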

### Principal Component Analysis (PCA)

```python
from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
```
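
After fitting, `explained_variance_ratio_` reports how much of the total variance each component captures, which helps when choosing `n_components`. A sketch on synthetic data (the data-generating setup is an assumption for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=100)

# Three features that are all noisy multiples of one underlying signal,
# so nearly all variance lies along a single direction.
X = np.column_stack([t, 2 * t, -t]) + rng.normal(scale=0.01, size=(100, 3))

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                    # (100, 2)
print(pca.explained_variance_ratio_)  # the first entry is close to 1
```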

## Model Evaluation

### Cross-Validation

```python
from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5)
```
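
`cross_val_score` returns one score per fold, so the result is usually summarized by its mean and spread. A runnable sketch using the bundled iris dataset; the dataset, the estimator, and `max_iter=1000` are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# With cv=5 the data is split into 5 folds; each fold is used once for testing.
scores = cross_val_score(model, X, y, cv=5)

print(scores)                       # array of 5 accuracy values
print(scores.mean(), scores.std())  # summary across folds
```

For classifiers the default score is accuracy; a different metric can be requested with the `scoring` parameter.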

### Classification Metrics

```python
from sklearn.metrics import accuracy_score, confusion_matrix

accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
```
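
A small worked example makes the output easier to read. The labels below are made up; in the confusion matrix, rows correspond to true classes and columns to predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions for five samples.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
conf_matrix = confusion_matrix(y_true, y_pred)

print(accuracy)     # 0.6 (3 of 5 predictions are correct)
print(conf_matrix)
# [[1 1]
#  [1 2]]
```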

### Regression Metrics

```python
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
```
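
The same two metrics on made-up values, small enough to check by hand: the squared errors are 1, 0, and 4, and the true values have a total sum of squares of 8 around their mean of 5.

```python
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true targets and predictions for three samples.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)  # (1 + 0 + 4) / 3
r2 = r2_score(y_true, y_pred)             # 1 - 5 / 8

print(mse)  # approximately 1.667
print(r2)   # 0.375
```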
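
## Tuning Hyperparameters

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# n_estimators and max_features are random forest parameters,
# so the estimator searched here is a RandomForestClassifier.
model = RandomForestClassifier()
# 'auto' is omitted: recent scikit-learn releases no longer accept it for max_features.
param_grid = {'n_estimators': [10, 50, 100], 'max_features': ['sqrt', 'log2']}
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
```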
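
For a version that runs end to end, the sketch below searches a small grid for a `RandomForestClassifier` on the bundled iris dataset. The dataset choice, the grid values, `cv=3`, and `random_state=0` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A deliberately small grid so the search finishes quickly.
param_grid = {"n_estimators": [10, 50], "max_features": ["sqrt", "log2"]}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=3,
)
grid_search.fit(X, y)

print(grid_search.best_params_)  # the best combination found
print(grid_search.best_score_)   # mean cross-validated accuracy of that combination

# By default the best combination is refit on all of X and y.
best_model = grid_search.best_estimator_
```

The grid keys must match parameter names of the estimator being tuned; a key the estimator does not recognize raises an error during `fit`.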

## Saving and Loading Models

```python
from joblib import dump, load

# Save a model
dump(model, 'model.joblib')

# Load a model
model = load('model.joblib')
```
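
A quick way to confirm that persistence worked is to compare predictions before and after a round trip. The sketch below writes to a temporary directory; the toy data and the choice of `LinearRegression` are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.linear_model import LinearRegression

# Hypothetical training data on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Write the model to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(model, path)
    restored = load(path)

# The restored model should give the same predictions as the original.
same = np.allclose(model.predict(X), restored.predict(X))
print(same)  # True
```

Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.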

`scikit-learn` simplifies the implementation of many machine learning algorithms for data analysis projects. This guide covers data preprocessing, data splitting, model training, evaluation, hyperparameter tuning, and model persistence, but the library's functionality extends well beyond these basics.