# How to Analyze Data through Visualization

## This tutorial will help you understand how Data Visualization can help to analyse patterns that may result in a Linear Regression model

What is a plot?

- A visual representation of the data

Which data? How is it usually structured?

- In a table. For example:

```
import seaborn as sns
df = sns.load_dataset('mpg', index_col='name')
df.head()
```

How can you Visualice this `DataFrame`

?

- We could make a point for every car based on
- weight
- mpg

```
sns.scatterplot(x='weight', y='mpg', data=df);
```

Which conclusions can you make out of this plot?

Well, you may observe that the location of the points are descending as we move to the right

This means that the

`weight`

of the car may produce a lower capacity to make kilometres`mpg`

How can you measure this relationship?

- Linear Regression

```
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X=df[['weight']], y=df.mpg)
model.__dict__
```

- Resulting in ↓

```
{'fit_intercept': True,
'normalize': False,
'copy_X': True,
'n_jobs': None,
'n_features_in_': 1,
'coef_': array([-0.00767661]),
'_residues': 7474.8140143821,
'rank_': 1,
'singular_': array([16873.20281508]),
'intercept_': 46.31736442026565}
```

Which is the mathematical formula for this relationship?

$$mpg = 46.31 - 0.00767 \cdot weight$$

- This equation means that the
`mpg`

gets 0.00767 units lower for**every unit**that`weight`

**increases**.

Could you visualise this equation in a plot?

- Absolutely, we could make the predictions from the original data and plot them.

## Predictions

```
y_pred = model.predict(X=df[['weight']])
dfsel = df[['weight', 'mpg']].copy()
dfsel['prediction'] = y_pred
dfsel.head()
```

weight | mpg | prediction | |
---|---|---|---|

name | |||

chevrolet chevelle malibu | 3504 | 18.0 | 19.418523 |

buick skylark 320 | 3693 | 15.0 | 17.967643 |

plymouth satellite | 3436 | 18.0 | 19.940532 |

amc rebel sst | 3433 | 16.0 | 19.963562 |

ford torino | 3449 | 17.0 | 19.840736 |

Out of this table, you could observe that predictions don't exactly match the reality, but it approximates.

For example, Ford Torino's

`mpg`

is 17.0, but our model predicts 19.84.

## Model Visualization

```
sns.scatterplot(x='weight', y='mpg', data=dfsel)
sns.scatterplot(x='weight', y='prediction', data=dfsel);
```

- The blue points represent the actual data.
- The orange points represent the predictions of the model.

I teach Python, R, Statistics & Data Science. I like to produce content that helps people to understand these topics better.

Feel free and welcomed to give me feedback as I would like to make my tutorials clearer and generate content that interests you 🤗

You can see my Tutor Profile here if you need Private Tutoring lessons.