In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
- Data loading
In [2]:
df = pd.read_csv("../data/regression.csv")
df.head()
Out[2]:
| | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | target |
|---|---|---|---|---|---|---|---|
| 0 | -0.753965 | 0.281191 | -0.062593 | -0.280675 | 0.758929 | 0.104201 | 15.914852 |
| 1 | 1.031845 | -0.439731 | 0.196555 | -1.485560 | -0.186872 | 1.446978 | -24.363081 |
| 2 | -0.600639 | 0.110923 | 0.375698 | -0.291694 | -0.544383 | -1.150994 | -55.864380 |
| 3 | 0.998311 | -0.322320 | 1.521316 | -0.431620 | 1.615376 | 1.217159 | 308.187994 |
| 4 | 0.338496 | 0.770865 | 1.143754 | -0.415288 | 0.235615 | -1.478586 | 165.850761 |
Dataset
The dataset contains 500 rows, each with:
- six numerical input features (feature_0 to feature_5)
- a continuous target variable (target)
This is a regression problem, therefore:
- there are no classes
- no confusion matrix is used
- Split features and target
In [3]:
X = df.drop("target", axis=1)
y = df["target"]
In [4]:
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
X_train.shape, X_test.shape
Out[4]:
((400, 6), (100, 6))
Model Training
We train a Linear Regression model using the training data.
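As a rough illustration of what fitting does, the sketch below solves the same ordinary least squares problem with NumPy alone. The data, `true_coef`, and `true_intercept` are made up for the example and are not taken from the notebook's dataset. The targets are noiseless so that the recovered parameters can be checked exactly.

```python
import numpy as np

# Synthetic data, assumed for illustration only (not the notebook's CSV).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])   # hypothetical coefficients
true_intercept = 4.0                      # hypothetical intercept
y_demo = X_demo @ true_coef + true_intercept  # noiseless linear target

# Prepend a column of ones so the intercept is estimated with the coefficients.
A = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least-squares solution of A @ beta ~= y_demo.
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
fitted_intercept, fitted_coef = beta[0], beta[1:]

print("intercept:", fitted_intercept)
print("coefficients:", fitted_coef)
```

With real, noisy data the fitted values would only approximate the underlying relationship. Here they match the hypothetical parameters up to floating-point error.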
In [5]:
model = LinearRegression()
model.fit(X_train, y_train)
Out[5]:
LinearRegression()
Prediction
The trained model is used to make predictions on unseen test data.
In [6]:
y_pred = model.predict(X_test)
y_pred[:5]
Out[6]:
array([ 164.05060775, -168.51088955, -66.93683516, -35.28712732,
-261.425874 ])
Model Evaluation
For regression tasks, common evaluation metrics are:
- Mean Squared Error (MSE)
- R² Score
Classification metrics such as accuracy or F1-score are not applicable.
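To make the two metrics concrete, here is a minimal sketch that computes them by hand with NumPy. The arrays `y_true` and `y_hat` are toy values chosen for illustration, not values from this dataset. The square root of the MSE (RMSE) is included because it is expressed in the same units as the target.

```python
import numpy as np

# Toy values, assumed for illustration only.
y_true = np.array([3.0, -1.0, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_hat

# MSE: mean of squared residuals (units are target units squared).
mse_manual = np.mean(residuals ** 2)

# RMSE: square root of MSE, back in the units of the target.
rmse_manual = np.sqrt(mse_manual)

# R^2: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("MSE:", mse_manual)
print("RMSE:", rmse_manual)
print("R^2:", r2_manual)
```

An R² of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.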
In [7]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)
print("R² Score:", r2)
Mean Squared Error (MSE): 287.58605355048087
R² Score: 0.9888363641525172
Model Interpretation
Each coefficient is the change in the predicted target for a one-unit increase in that feature, with the other features held fixed. The intercept is stored separately in model.intercept_.
In [8]:
coefficients = pd.Series(model.coef_, index=X.columns)
coefficients
Out[8]:
feature_0    43.251465
feature_1    98.112448
feature_2    96.219087
feature_3    35.651902
feature_4    81.433923
feature_5    26.081725
dtype: float64
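The sketch below shows how these coefficients turn into a prediction and what a one-unit change in a single feature does. The coefficient values are copied from the output above. The notebook does not display the intercept, so `intercept = 0.0` is a placeholder assumption, and `predict_linear` is a hypothetical helper rather than part of scikit-learn.

```python
import numpy as np

# Coefficients copied from the notebook output (feature_0 .. feature_5).
coef = np.array([43.251465, 98.112448, 96.219087,
                 35.651902, 81.433923, 26.081725])

# Placeholder: the real value lives in model.intercept_ and is not shown.
intercept = 0.0

def predict_linear(x):
    """Linear prediction: intercept plus the dot product of features and coefficients."""
    return intercept + x @ coef

x_base = np.zeros(6)        # arbitrary reference point
x_shifted = x_base.copy()
x_shifted[1] += 1.0         # raise feature_1 by one unit, others unchanged

# The difference equals the coefficient of feature_1, whatever the intercept is.
delta = predict_linear(x_shifted) - predict_linear(x_base)
print("change in prediction:", delta)
```

Because the intercept cancels in the difference, the placeholder value does not affect the result.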