Deep Learning – Regression (PyTorch)¶

This notebook is part of the ML-Methods project.

It introduces Deep Learning for supervised regression using the PyTorch framework.

As with all other notebooks in this project, the initial sections focus on data preparation and are intentionally repeated.

This ensures:

  • consistency across models
  • fair comparison of results
  • a unified learning pipeline

Notebook Roadmap (standard ML-Methods)¶

  1. Project setup and common pipeline
  2. Dataset loading
  3. Train-test split
  4. Feature scaling (why we do it)
  5. What is this model? (Intuition)
  6. Model training
  7. Model behavior and key parameters
  8. Predictions
  9. Model evaluation
  10. When to use it and when not to
  11. Model persistence
  12. Mathematical formulation (deep dive)
  13. Final summary – Code only

How this notebook should be read¶

This notebook is designed to be read top to bottom.

Before every code cell, you will find a short explanation describing:

  • what we are about to do
  • why this step is necessary
  • how it fits into the overall process

Compared to scikit-learn, this notebook exposes more of the training mechanics.

The goal is to understand:

  • how deep learning regression works internally
  • how training is controlled explicitly
  • how PyTorch differs from high-level abstractions

What is Deep Learning (in this context)?¶

In this notebook, Deep Learning refers to neural networks trained manually using PyTorch.

Unlike scikit-learn:

  • the training loop is explicit
  • forward and backward passes are visible
  • optimization is controlled step by step

This provides deeper insight into how regression models actually learn.


What do we want to achieve?¶

Our objective is to train a neural network that:

  • takes numerical input features
  • processes them through multiple layers
  • outputs a single continuous value

The model learns a mapping:

input features → numerical target


Why use PyTorch for regression?¶

PyTorch is a low-level deep learning framework that provides fine-grained control over training.

Using PyTorch allows us to:

  • see the forward pass explicitly
  • control loss computation
  • manage gradients manually
  • understand optimization mechanics

This notebook represents the next conceptual step after scikit-learn: from abstraction → understanding.


What you should expect from the results¶

With Deep Learning regression in PyTorch, you should expect:

  • non-linear regression capability
  • flexible model architecture
  • full control over training dynamics
  • behavior comparable to scikit-learn's MLPRegressor

However:

  • code is more verbose
  • more responsibility is on the user
  • mistakes are easier to make

1. Project setup and common pipeline¶

In this section we set up the common pipeline used across regression models in this project.

Although the model is implemented in PyTorch, data preparation remains consistent with all other regression notebooks.

In [1]:
# ====================================
# Common imports used across regression models
# ====================================

import numpy as np
import pandas as pd

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    r2_score
)

from pathlib import Path
import joblib
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

PyTorch vs scikit-learn (at a glance)¶

Compared to scikit-learn:

  • models are defined as Python classes
  • training loops are written manually
  • gradients are handled explicitly

The surrounding pipeline, however, remains unchanged.

In the next section, we will load the regression dataset used throughout this notebook.


2. Dataset loading¶

In this section we load the dataset used for the deep learning regression task.

We use the same regression dataset adopted in the other regression notebooks to ensure fair comparison across models.

In [2]:
# ====================================
# Dataset loading
# ====================================

data = fetch_california_housing(as_frame=True)

X = data.data
y = data.target

Inputs and target¶

  • X contains the input features
  • y contains the continuous target variable

This is a supervised regression problem:

  • each input corresponds to a real-valued output
  • the goal is to predict a numerical quantity

At this stage:

  • data is still in pandas format
  • no preprocessing has been applied yet

In the next section, we will split the dataset into training and test sets.


3. Train-test split¶

In this section we split the dataset into training and test sets.

This allows us to evaluate how well the neural network generalizes to unseen data.

In [3]:
# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)

Why this step is essential¶

A regression model must be evaluated on data it has never seen during training.

By separating the dataset:

  • the training set is used for learning
  • the test set is used only for evaluation

This prevents data leakage and ensures realistic performance estimates.

In the next section, we will apply feature scaling, which is mandatory for deep learning models.


4. Feature scaling (why we do it)¶

In this section we apply feature scaling to the input features.

For deep learning regression models, feature scaling is mandatory.

In [4]:
# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Why we use standardization here¶

Neural networks are trained using gradient-based optimization.

Standardization:

  • centers features around zero
  • ensures comparable variance across features
  • improves numerical stability during training
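The effect of standardization can be verified on a tiny made-up column (values and the `X_demo` name are ours, purely for illustration): after transforming, each feature has mean 0 and standard deviation 1.

```python
import numpy as np

# StandardScaler is equivalent to (x - mean) / std, computed column-wise
# on the training data. Tiny made-up example:
X_demo = np.array([[1.0], [3.0], [5.0]])

mu = X_demo.mean(axis=0)        # per-column mean
sigma = X_demo.std(axis=0)      # per-column (population) standard deviation
X_std = (X_demo - mu) / sigma   # standardized values: mean 0, std 1
```

This is exactly what `scaler.fit_transform(X_train)` does, with the means and standard deviations stored inside the fitted scaler for later reuse on test data.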

Without proper scaling:

  • gradients may explode or vanish
  • optimization may fail
  • training becomes unstable

At this point:

  • data is in NumPy format
  • values are ready to be converted into tensors

In the next section, we will explain what the PyTorch model is and how neural networks perform regression at a lower level.


5. What is this model? (Deep Learning Regression – PyTorch)¶

Before writing any PyTorch code, it is important to understand what the model is conceptually doing.

In regression, the goal is to predict a continuous numerical value from a set of input features.

How regression works in a neural network¶

A neural network for regression:

  • receives a vector of input features
  • transforms it through multiple layers
  • outputs a single real number

The model learns a function:

input vector → numerical output

Unlike linear regression, this function is non-linear and learned progressively.

What PyTorch adds conceptually¶

With PyTorch:

  • we define the model explicitly
  • we control the forward pass
  • we decide how loss is computed
  • we update parameters manually

This makes PyTorch ideal for understanding how learning happens, not just that it happens.

High-level learning process¶

Training follows a loop:

  1. forward pass (prediction)
  2. loss computation (error)
  3. backward pass (gradients)
  4. parameter update

This loop is repeated until the model learns a good approximation of the regression function.
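The four steps above can be sketched on toy data before touching the real dataset. This is a minimal illustration (the `toy_*` names and the tiny linear model are ours, not part of the notebook's pipeline):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X_toy = torch.randn(16, 3)                  # 16 samples, 3 features
y_toy = X_toy.sum(dim=1, keepdim=True)      # toy target: sum of the features

toy_model = nn.Linear(3, 1)                 # simplest possible "network"
toy_loss_fn = nn.MSELoss()
toy_opt = torch.optim.SGD(toy_model.parameters(), lr=0.1)

init_loss = toy_loss_fn(toy_model(X_toy), y_toy).item()

for _ in range(50):
    pred = toy_model(X_toy)                 # 1. forward pass (prediction)
    loss = toy_loss_fn(pred, y_toy)         # 2. loss computation (error)
    toy_opt.zero_grad()
    loss.backward()                         # 3. backward pass (gradients)
    toy_opt.step()                          # 4. parameter update

final_loss = toy_loss_fn(toy_model(X_toy), y_toy).item()
```

The loss after 50 iterations should be well below the initial loss, which is the entire point of the loop: each pass nudges the parameters in the direction that reduces the error.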

Key takeaway¶

PyTorch regression models:

  • learn non-linear functions
  • expose training mechanics explicitly
  • behave similarly to scikit-learn models when architecture and data are the same

In the next section, we will define the neural network architecture using PyTorch.


6. Model training (PyTorch Regression)¶

In this section we define and train a neural network regressor using PyTorch.

Unlike scikit-learn, both the model and the training loop must be written explicitly.

In [5]:
# ====================================
# Model definition
# ====================================

class RegressionNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()

        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)

        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.out(x)
        return x
In [6]:
# ====================================
# Training setup
# ====================================

input_dim = X_train_scaled.shape[1]

model = RegressionNet(input_dim)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
In [7]:
# ====================================
# Training loop
# ====================================

X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)

epochs = 100
losses = []

for epoch in range(epochs):
    model.train()

    optimizer.zero_grad()

    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    loss.backward()
    optimizer.step()

    losses.append(loss.item())

What is happening during training¶

  • The model performs a forward pass
  • Predictions are compared to true values
  • Loss (MSE) measures prediction error
  • Gradients are computed automatically
  • Parameters are updated using Adam

This explicit loop is the core of PyTorch.

In the next section, we analyze model behavior and key parameters.


7. Model behavior and key parameters¶

In this section we analyze how the PyTorch regression model behaves and which parameters influence learning.

Model architecture¶

The network uses:

  • two hidden layers
  • ReLU activation
  • linear output layer

This allows:

  • non-linear feature interactions
  • continuous output prediction

Loss function¶

We use Mean Squared Error (MSE):

  • penalizes large errors
  • standard choice for regression
  • differentiable and stable

Optimizer behavior¶

Adam optimizer:

  • adapts learning rates
  • speeds up convergence
  • works well for most regression tasks

Training duration¶

More epochs:

  • improve learning initially
  • may cause overfitting if excessive

Monitoring loss over epochs helps diagnose training behavior.
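The `losses` list collected in the training cell can be plotted to inspect convergence. A minimal sketch (the synthetic `losses` values and the `loss_curve.png` filename are placeholders; in the notebook you would use the real list from training):

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripted runs
import matplotlib.pyplot as plt

# Placeholder curve; replace with the real `losses` list from the training cell.
losses = [1.0 / (1 + 0.1 * i) for i in range(100)]

plt.figure(figsize=(6, 3))
plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("training MSE")
plt.title("Training loss over epochs")
plt.tight_layout()

plot_path = Path("loss_curve.png")
plt.savefig(plot_path)
```

A smoothly decreasing curve suggests healthy training; a flat curve suggests the learning rate is too low or the model has converged, while an erratic curve suggests the learning rate is too high.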

Key takeaway¶

PyTorch gives full control over model behavior.

This flexibility allows:

  • custom architectures
  • precise debugging
  • deeper understanding

In the next section, we will generate predictions using the trained model.


8. Predictions¶

In this section we use the trained PyTorch model to generate predictions on unseen test data.

Predictions are continuous numerical values.

In [8]:
# ====================================
# Predictions
# ====================================

model.eval()

X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)

with torch.no_grad():
    y_pred_tensor = model(X_test_tensor)

y_pred = y_pred_tensor.numpy().flatten()

What happens during prediction¶

  • The model is set to evaluation mode
  • Gradients are disabled
  • The forward pass generates predictions

This ensures:

  • faster inference
  • no gradient accumulation

What we have now¶

At this point:

  • y_test contains true target values
  • y_pred contains predicted values

These will be compared using regression metrics in the next section.


9. Model evaluation¶

In this section we evaluate the performance of the PyTorch regression model on unseen test data.

For regression problems, evaluation focuses on prediction error and quality of fit.

In [9]:
# ====================================
# Regression evaluation metrics
# ====================================

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse, rmse, mae, r2
Out[9]:
(0.7457036463188563,
 np.float64(0.8635413402488942),
 0.626965656223535,
 0.4309382347792724)

How to read these results¶

  • RMSE
    Measures the typical prediction error in the same unit as the target variable.

  • MAE
    Measures the average absolute error and is more robust to outliers.

  • R² score
    Measures how much of the variance in the target is explained by the model.

These metrics together provide a complete view of regression performance.
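The three metrics can be computed by hand on a tiny made-up example (the values below are ours, chosen only to make the definitions concrete):

```python
import numpy as np

y_true = np.array([3.0, 2.0, 4.0])   # made-up true targets
y_hat  = np.array([2.5, 2.0, 5.0])   # made-up predictions

mse  = np.mean((y_true - y_hat) ** 2)       # mean squared error
rmse = np.sqrt(mse)                          # same unit as the target
mae  = np.mean(np.abs(y_true - y_hat))       # average absolute error

ss_res = np.sum((y_true - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                     # fraction of variance explained
# mae → 0.5, r2 → 0.375
```

These hand-rolled formulas match what `mean_squared_error`, `mean_absolute_error`, and `r2_score` compute internally.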

Key takeaway¶

Evaluation metrics must always be computed on unseen data.

A non-zero RMSE is expected; what matters is that the test error stays close to the training error, which indicates the model is generalizing rather than memorizing the data.


10. When to use it and when not to¶

Deep Learning regression with PyTorch is powerful but not always necessary.

Choosing this approach depends on problem complexity and practical constraints.

When to use PyTorch for regression¶

PyTorch regression is a good choice when:

  • relationships are highly non-linear
  • model architecture must be customized
  • training dynamics need full control
  • experimentation and research are required

It is especially useful when building models from scratch or exploring novel architectures.

When NOT to use PyTorch for regression¶

PyTorch may not be ideal when:

  • the dataset is small
  • the problem is simple
  • rapid prototyping is needed
  • interpretability is critical

In these cases, simpler models or scikit-learn are often more efficient.

Key takeaway¶

PyTorch offers maximum flexibility at the cost of increased complexity.

It should be chosen when control and understanding are more important than convenience.


11. Model persistence¶

In this section we save the trained PyTorch model and the preprocessing steps used during training.

In [ ]:
# ====================================
# Model persistence
# ====================================

model_dir = Path("models/supervised_learning/regression/deep_learning_pytorch")
model_dir.mkdir(parents=True, exist_ok=True)

# Save model state
torch.save(model.state_dict(), model_dir / "pytorch_regression_model.pt")

# Save scaler
joblib.dump(scaler, model_dir / "scaler.joblib")

What we have saved¶

We saved:

  • the trained PyTorch model parameters
  • the feature scaler

Together, these represent the complete regression pipeline.

Why saving the scaler matters¶

Neural networks are sensitive to feature scaling.

Using a different scaler would lead to inconsistent predictions.

Saving the scaler ensures reproducibility and correctness.
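Reloading works via `load_state_dict`, which requires re-instantiating the same architecture first. A self-contained round-trip sketch (it saves to a temporary directory rather than the notebook's `models/...` path, and the class definition must match the one used at training time):

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn

class RegressionNet(nn.Module):          # must match the training definition
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x)

tmp_dir = Path(tempfile.mkdtemp())
original = RegressionNet(input_dim=8)    # California housing has 8 features
torch.save(original.state_dict(), tmp_dir / "model.pt")

# Reload: rebuild the architecture, then restore the learned parameters.
restored = RegressionNet(input_dim=8)
restored.load_state_dict(torch.load(tmp_dir / "model.pt"))
restored.eval()

x = torch.randn(4, 8)
with torch.no_grad():
    same = torch.allclose(original(x), restored(x))
```

In the notebook's setting, the reloaded scaler would be applied to any new inputs (`scaler.transform(...)`) before they are passed to the restored model.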


12. Mathematical formulation (deep dive)¶

This section describes the mathematical principles behind deep learning regression implemented in PyTorch.

Regression objective¶

The dataset is represented as:

$$ \{(x_i, y_i)\}_{i=1}^n $$

where:

  • $x_i \in \mathbb{R}^d$
  • $y_i \in \mathbb{R}$

Model as a function¶

The neural network learns a function:

$$ \hat{y} = f(x; \theta) $$

where:

  • $\theta$ represents weights and biases
  • $\hat{y}$ is the predicted value

Layer transformations¶

Each hidden layer computes:

$$ h = \text{ReLU}(Wx + b) $$

The output layer is linear:

$$ \hat{y} = W_{\text{out}} h + b_{\text{out}} $$
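These two formulas can be evaluated by hand with tiny made-up weights (all numbers below are ours, chosen only to make the matrix algebra concrete):

```python
import numpy as np

x = np.array([1.0, -2.0])                 # input vector, d = 2
W = np.array([[0.5, -0.5],
              [1.0,  0.0]])               # hidden-layer weights (2 units)
b = np.array([0.0, 0.5])                  # hidden-layer biases

h = np.maximum(0, W @ x + b)              # ReLU(Wx + b) → [1.5, 1.5]

W_out = np.array([1.0, -1.0])             # output-layer weights
b_out = 0.1
y_hat = W_out @ h + b_out                 # linear output → 0.1
```

Stacking more hidden layers simply repeats the `ReLU(Wx + b)` step before the final linear projection.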

Loss function¶

Training minimizes Mean Squared Error:

$$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Optimization¶

Gradients are computed via backpropagation, and parameters are updated following the gradient-descent rule:

$$ \theta \leftarrow \theta - \eta \nabla_\theta MSE $$

Adam refines this basic rule with per-parameter adaptive learning rates, based on running estimates of the first and second moments of the gradients.
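One update step can be worked through by hand for a single weight (toy numbers are ours: a one-parameter model $\hat{y} = w \cdot x$ with a single data point):

```python
# One explicit gradient-descent step on a single weight.
w, x, y, eta = 1.0, 2.0, 3.0, 0.1

y_hat = w * x                 # forward pass: 1.0 * 2.0 = 2.0
grad = 2 * (y_hat - y) * x    # d(MSE)/dw for one sample: 2 * (2 - 3) * 2 = -4.0
w = w - eta * grad            # update: 1.0 - 0.1 * (-4.0) = 1.4
```

The prediction was too low, so the gradient is negative and the update increases the weight, moving $\hat{y}$ toward the target. Adam applies the same principle but rescales each step adaptively.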

Final takeaway¶

Deep learning regression can be viewed as non-linear function approximation optimized via gradient descent.

PyTorch exposes these mechanisms explicitly, making learning transparent and flexible.


13. Final summary – Code only¶

The following cell contains the complete PyTorch regression pipeline.

No explanations are provided here.

In [ ]:
# ====================================
# Imports
# ====================================

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

from pathlib import Path
import joblib


# ====================================
# Dataset loading
# ====================================

data = fetch_california_housing(as_frame=True)
X = data.data
y = data.target


# ====================================
# Train-test split
# ====================================

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


# ====================================
# Feature scaling
# ====================================

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


# ====================================
# Model definition
# ====================================

class RegressionNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.out = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return self.out(x)


input_dim = X_train_scaled.shape[1]
model = RegressionNet(input_dim)


# ====================================
# Training setup
# ====================================

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32).view(-1, 1)


# ====================================
# Training loop
# ====================================

epochs = 100

for _ in range(epochs):
    optimizer.zero_grad()
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    loss.backward()
    optimizer.step()


# ====================================
# Predictions
# ====================================

model.eval()
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)

with torch.no_grad():
    y_pred = model(X_test_tensor).numpy().flatten()


# ====================================
# Evaluation
# ====================================

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse, rmse, mae, r2


# ====================================
# Model persistence
# ====================================

model_dir = Path("models/supervised_learning/regression/deep_learning_pytorch")
model_dir.mkdir(parents=True, exist_ok=True)

torch.save(model.state_dict(), model_dir / "pytorch_regression_model.pt")
joblib.dump(scaler, model_dir / "scaler.joblib")