```
import os
import sys
import torch
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
```

Ready for a deep-dive into the exciting realm of data interpretation ? 📊 Today, Multi-Variate Regression is on the menu. Sounds complex ? Well, by the time we’re done, it’ll feel like a walk in the park 🌳🚶♂️

We’ll be looking at how to train a Multi-Variate Linear Regression model, demystifying those complex mathematical equations, and transforming them into simple, understandable concepts. And the best part ? We’ll be visualizing our results, turning abstract numbers into vibrant, understandable graphics 📈

Buckle-up because we’re about to roll up our sleeves and dive into implementing Multi-Variate Linear Regression from scratch using Pytorch 💪

If you haven’t read the previous article of this series, feel free to explore my article title **Linear Regression Made Easy: A Pytorch From Scratch Implementation**.

# The Dataset: CarDekho

In this study, we will be using the CarDekho dataset by trying to predict car prices. The dataset contains the following features 🔬:

- the name of the car
- the year it was released
- the selling price
- the present price
- the number of kilometers driven
- the type of fuel used (petrol or diesel)
- the transmission type (manual or automatic)
- the number of times the car has changed hands

Each of these factors significantly contributes to a car’s price, and we will use this data to build a predictive model 🤖

# Exploratory Data Analysis

Exploratory Data Analysis (EDA) is a critical preliminary step that involves summarizing the main characteristics of a dataset through visual methods or statistical summaries 📊

It is essential for Machine Learning as it allows us to understand the underlying structure of the data, identify outliers, discover patterns, and test assumptions using powerful visual or quantitative methods, thus providing valuable insights for building efficient and accurate predictive models 🧠💡

First let’s import the necessary packages:

Now let’s read the data into a Pandas dataframe. You can download the dataset with this Kaggle link.

```
= pd.read_csv(os.path.join('data', 'car_data.csv'))
df df.head()
```

Car_Name | Year | Selling_Price | Present_Price | Kms_Driven | Fuel_Type | Seller_Type | Transmission | Owner | |
---|---|---|---|---|---|---|---|---|---|

0 | ritz | 2014 | 3.35 | 5.59 | 27000 | Petrol | Dealer | Manual | 0 |

1 | sx4 | 2013 | 4.75 | 9.54 | 43000 | Diesel | Dealer | Manual | 0 |

2 | ciaz | 2017 | 7.25 | 9.85 | 6900 | Petrol | Dealer | Manual | 0 |

3 | wagon r | 2011 | 2.85 | 4.15 | 5200 | Petrol | Dealer | Manual | 0 |

4 | swift | 2014 | 4.60 | 6.87 | 42450 | Diesel | Dealer | Manual | 0 |

We need to convert the categorical columns as integer ids:

```
= df[['Year', 'Selling_Price', 'Present_Price', 'Kms_Driven', 'Owner']]
f_continuous = pd.get_dummies(df[['Fuel_Type', 'Seller_Type', 'Transmission']])
f_categorical = pd.concat([f_continuous, f_categorical], axis=1)
df
# Drop refundant features
'Transmission_Automatic', 'Seller_Type_Dealer', 'Fuel_Type_CNG'], axis=1, inplace=True)
df.drop([ df.head()
```

Year | Selling_Price | Present_Price | Kms_Driven | Owner | Fuel_Type_Diesel | Fuel_Type_Petrol | Seller_Type_Individual | Transmission_Manual | |
---|---|---|---|---|---|---|---|---|---|

0 | 2014 | 3.35 | 5.59 | 27000 | 0 | 0 | 1 | 0 | 1 |

1 | 2013 | 4.75 | 9.54 | 43000 | 0 | 1 | 0 | 0 | 1 |

2 | 2017 | 7.25 | 9.85 | 6900 | 0 | 0 | 1 | 0 | 1 |

3 | 2011 | 2.85 | 4.15 | 5200 | 0 | 0 | 1 | 0 | 1 |

4 | 2014 | 4.60 | 6.87 | 42450 | 0 | 1 | 0 | 0 | 1 |

## Individual Feature Distribution

Let’s visualize the individual feature distributions to understand the range, central tendencies, and spread of our data, which in turn helps us identify any skewness, outliers, or anomalies that could impact our model’s performance 📈🔎

```
=14, color='steelblue', edgecolor='black', linewidth=1.0, xlabelsize=8, ylabelsize=8, grid=False)
df.hist(bins=(0, 0, 1.2, 1.2)) plt.tight_layout(rect
```

For these bar plots, we can see that most cars on sales are:

- consuming petrol instead of diesel
- had only one owner
- are from the 2012-present time range
- are manual
- sold between 1 and 10, 100 000 Indian Rupees

## Heatmap correlation

Let’s introduce our most colorful friend: the Heatmap Correlation ! 🌈 It is a graphical representation that showcases the correlation or relationship between different variables or features in a dataset using color-coded cells.

When two variables are highly correlated, they exhibit a value close to 1 (or -1 for inverse correlation), while a value near 0 indicates a lack of correlation between the variables.

A Heatmap Correlation, in machine learning, help us understand the relationships and dependencies between different features, thus providing a way to identify patterns and make informed decisions during the model-building process 🔧

```
=(16, 8))
plt.figure(figsize= True, annot=True, fmt='.2f') sns.heatmap(df.corr(), square
```

`<matplotlib.axes._subplots.AxesSubplot at 0x1f5bb762b08>`

The correlation between the variables and the selling price is the most useful information for us as it is what we want to predict 🚗💵

Most variables have a strong correlation, except for kilometers driven and number of owners, which are less connected. We could have deleted them, but since they matter in car-buying decisions, we’ll keep them in our analysis 🔍

## Pairwise Plots

A pairwise plot, displays a grid of scatter plots, where each plot shows the relationship between the distribution of two variables.

Pairwise plots are useful because they allow us to quickly assess the patterns between variables.

```
= ['Kms_Driven', 'Year', 'Selling_Price', 'Present_Price']
cols_viz = sns.pairplot(df[cols_viz], height=1.8, aspect=1.8,
pp =dict(edgecolor="k", linewidth=0.5),
plot_kws="kde", diag_kws=dict(shade=True))
diag_kind
= pp.fig
fig =0.93, wspace=0.3)
fig.subplots_adjust(top= fig.suptitle('Wine Attributes Pairwise Plots', fontsize=14) t
```

Here, we can see that there’s one of two outliers. An outlier is a data point deviates or differs from the majority of other data points in a dataset.

The year feature has a polynomial correlation with the selling price so a polynomial regression will most likely outperform a standard linear regression.

Instead, the present price has a linear relationship with the present price.

# Model Training and Testing

## Create training data partition

The train-validation-test split is a process of dividing a dataset into three distinct subsets: the training set, the validation set, and the test set.

The training set is used to train the model, the validation set is used to tune model parameters and evaluate performance during development, and the test set is used to assess the final model’s performance on unseen data.

```
# Separate the target from the dataFrame
= df['Selling_Price']
Y = df.drop('Selling_Price', axis=1)
X
# Convert data to Pytorch tensor
= torch.from_numpy(X.to_numpy()).float()
X_t = torch.from_numpy(Y.to_numpy()).float().unsqueeze(1)
Y_t = train_test_split(X_t, Y_t, test_size=0.33, random_state=42) X_train, X_test, Y_train, Y_test
```

## Multi-Variate Linear Regression

Alright, it’s time to dive into some math now ! 📐

Training a linear model using least square regression is equivalent to minimize the mean squared error:

\[\begin{align} \text{Mse}(\boldsymbol{\hat{y}}, \boldsymbol{y}) &= \frac{1}{n}\sum_{i=1}^{n}{||\hat{y}_i - y_i ||_{2}^{2}} \\ &= \frac{1}{n}||\boldsymbol{X}\boldsymbol{w} - \boldsymbol{y} ||_2^2 \end{align}\]

where \(n\) is the number of samples, \(\hat{y}\) is the predicted value of the model and \(y\) is the true target. The prediction \(\hat{y}\) is obtained by matrix multiplication between the input \(\boldsymbol{X}\) and the weights of the model \(\boldsymbol{w}\).

Minimizing the \(\text{Mse}\) can be achieved by solving the gradient of this equation equals to zero in regards to the weights \(\boldsymbol{w}\):

\[\begin{align} \nabla_{\boldsymbol{w}}\text{Mse} &= 0 \\ (\boldsymbol{X}^\top \boldsymbol{X})^{-1}\boldsymbol{X}^\top \boldsymbol{y} &= \boldsymbol{w} \end{align}\]

For more information on how to find \(\boldsymbol{w}\) please visit the following link.

## Building the Model

Let’s build the model using Pytorch:

```
def add_ones_col(X):
"""Add a column a one to the input torch tensor"""
= torch.ones((X.shape[0],), dtype=torch.float32).unsqueeze(1)
x_0 = torch.cat([x_0, X], dim=1)
X return X
def multi_linear_reg(X, y):
"""Multivariate linear regression function
Args:
X: A torch tensor for the data.
y: A torch tensor for the labels.
"""
= add_ones_col(X) # Add a column of ones to X to agregate the bias to the input matrices
X = X.T.mm(X)
Xt_X = X.T.mm(y)
Xt_y
= Xt_X.inverse()
Xt_X_inv = Xt_X_inv.mm(Xt_y)
w return w
def prediction(X, w):
"""Predicts a selling price for each input
Args:
X: A torch tensor for the data.
w: A torch tensor for the weights of the linear regression mode.
"""
= add_ones_col(X)
X return X.mm(w)
```

Time to train the model 🎉

```
# Fit the training set into the model to get the weights
= multi_linear_reg(X_train, Y_train)
w
# Predict using matrix multiplication with the weights
= prediction(X_train, w)
Y_pred_train = prediction(X_test, w) Y_pred_test
```

## Computing the Prediction Errors

Now let’s crunch some numbers and calculate the metric to assess our model’s performance.

```
def mse(Y_true, Y_pred):
= Y_pred - Y_true
error return (error.T.mm(error) / Y_pred.shape[0]).item()
def mae(Y_true, Y_pred):
= Y_pred - Y_true
error return error.abs().mean().item()
```

```
= mse(Y_train, Y_pred_train)
mse_train = mae(Y_train, Y_pred_train)
mae_train print('MSE Train:\t', mse_train)
print('MAE Train:\t', mae_train, end='\n\n')
= mse(Y_test, Y_pred_test)
mse_test = mae(Y_test, Y_pred_test)
mae_test print('MSE Test:\t', mse_test)
print('MAE Test:\t', mae_test, end='\n\n')
```

```
MSE Train: 2.808985471725464
MAE Train: 1.1321566104888916
MSE Test: 3.7205495834350586
MAE Test: 1.2941011190414429
```

The model has an \(\text{Mse}\) of 3.72 on average on the test set. It means that, on average, the squared difference between the predicted values and the actual values is 3.72.

The selling price unit is in 100 000 Indian Rupee, which convert to 1 USD = 82,20 INR at the time of writing. It means that on average, the model has an approximate error of 408 USD on the test set.

# Principal Component Analysis (PCA) and 3D Visualization

In this section, we will use PCA to reduce the number of feature to two, in order to visualize the plane of the linear regressor.

Suppose a collection of \(m\) points \(\{\boldsymbol{x}^{(1)}, \dots, \boldsymbol{x}^{(m)}\} \in \mathbb{R}^n\). The principal components analysis aims to reduce the dimensionality of the points while losing the least precision as possible. For each point \(\boldsymbol{x}^{(i)} \in \mathbb{R}^n\) we will find a corresponding code vector \(\boldsymbol{c}^{(i)} \in \mathbb{R}^l\) where \(l\) is smaller than \(n\). Let \(f\) be the encoding function and \(g\) be the decoding function and \(\boldsymbol{D} \in \mathbb{R}^{n,l}\) is the decoding matrix whose columns are orthonormal: \[\begin{align} f(\boldsymbol{x}) &= \boldsymbol{D}^\top \boldsymbol{x} \\ g(f(\boldsymbol{x})) &= \boldsymbol{D}\boldsymbol{D}^\top \boldsymbol{x} \end{align}\]

```
def cov(X):
"""Computes the covariance of the input
The covariance matrix gives some sense of how much two values are
linearly related to each other, as well as the scale of these variables.
It is computed by (1 / (N - 1)) * (X - E[X]).T (X - E[X]).
Args:
X: A torch tensor as input.
"""
-= X.mean(dim=0, keepdim=True)
X = 1.0 / (X.shape[0] - 1)
fact = fact * X.T.mm(X)
cov return cov
def pca(X, target_dim=2):
"""Computes the n^th first principal components of the input
PCA can be implemented using the n^th principal components of the covariance matrix.
We could have been using an eigen decomposition because the covariance matrix is always squared
but singular value decomposition does also the trick if we take the right singular vectors
and perform a matrix multiplication to the right.
Args:
X: A torch tensor as the input.
target_dim: An integer for selecting the n^th first components.
"""
= cov(X)
cov_x
= torch.svd(cov_x)
U, S, V = V[:, :target_dim]
transform_mat = X.mm(transform_mat)
X_reduced return X_reduced, transform_mat
```

Let’s project our data in 2D with PCA:

```
= pca(X_test, target_dim=2)
X_test_pca, _ = pca(X_train, target_dim=2)
X_train_pca, _
= torch.cat([X_test_pca[:3], Y_pred_test[:3]], axis=1)
points = points[2, :] - points[0, :]
v1 = points[1, :] - points[0, :]
v2 = torch.cross(v1, v2)
cp = cp
a, b, c = cp.dot(points[2, :])
d
= min(X_test_pca[:, 0].min(), X_train_pca[:, 0].min())
min_mesh_x = max(X_test_pca[:, 0].max(), X_train_pca[:, 0].max())
max_mesh_x = min(X_test_pca[:, 1].min(), X_train_pca[:, 1].min())
min_mesh_y = max(X_test_pca[:, 1].max(), X_train_pca[:, 1].max())
max_mesh_y
= np.linspace(min_mesh_x, max_mesh_x, 25)
mesh_x = np.linspace(min_mesh_y, max_mesh_y, 25)
mesh_y = np.meshgrid(mesh_x, mesh_y)
mesh_xx, mesh_yy
= (d - a * mesh_xx - b * mesh_yy) / c mesh_zz
```

Then, we recreate the prediction plane using three random points of the prediction. More information here.

```
= plt.figure(figsize=(25,7))
fig = fig.add_subplot(131, projection='3d')
ax1 = fig.add_subplot(132, projection='3d')
ax2 = fig.add_subplot(133, projection='3d')
ax3 = [ax1, ax2, ax3]
axes
for ax in axes:
0], X_test_pca[:, 1], Y_test, color='red', edgecolor='black')
ax.scatter(X_test_pca[:, 0], X_train_pca[:, 1], Y_train, color='green', edgecolor='black')
ax.scatter(X_train_pca[:, =(0, 0, 0, 0), s=20, edgecolor='#70b3f0')
ax.scatter(mesh_xx.flatten(), mesh_yy.flatten(), mesh_zz.flatten(), facecolor
'1st component', fontsize=12)
ax.set_xlabel('2nd component', fontsize=12)
ax.set_ylabel('Selling Price', fontsize=12)
ax.set_zlabel(="x", style="sci", scilimits=(0,0))
ax.ticklabel_format(axis="y", style="sci", scilimits=(0,0))
ax.ticklabel_format(axis="z", style="sci", scilimits=(0,0))
ax.ticklabel_format(axis=60, azim=50)
ax1.view_init(elev=10, azim=0)
ax2.view_init(elev=-15, azim=140) ax3.view_init(elev
```

Good news ! The prediction plane is fitting pretty well the data 😎

# Conclusion

So, here’s the bottom line: EDA helped us understand some trends in our data and we managed to build a pretty decent linear regression model !

It turns out that linear regression can work like magic if your data has a lot of linear correlation.

Remember to add a linear model to your baseline for relatively regression tasks in order to compare it to more sophisticated models. Happy data exploring! 🚀📊

Still eager to learn ? You can jump right in the next article of this series: **Logistic Regression From Scratch With PyTorch**.