How to Analyze Data through Visualization

This tutorial shows how data visualization can reveal patterns in your data that a Linear Regression model can then quantify.

What is a plot?

  • A visual representation of the data

Which data? How is it usually structured?

  • In a table. For example:
import seaborn as sns

df = sns.load_dataset('mpg', index_col='name')
df.head()

head.png
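
To make the table structure concrete, here is a minimal sketch that builds a small DataFrame by hand. It assumes only pandas is installed; the five rows are copied from the `weight` and `mpg` values shown further down in this tutorial, and the name `df_small` is made up for illustration.

```python
import pandas as pd

# Five cars taken from this tutorial's output; `df_small` is a hypothetical name.
df_small = pd.DataFrame(
    {
        'weight': [3504, 3693, 3436, 3433, 3449],
        'mpg': [18.0, 15.0, 18.0, 16.0, 17.0],
    },
    index=pd.Index(
        [
            'chevrolet chevelle malibu',
            'buick skylark 320',
            'plymouth satellite',
            'amc rebel sst',
            'ford torino',
        ],
        name='name',
    ),
)

# Each row is one car (an observation); each column is one variable.
n_rows, n_cols = df_small.shape
print(n_rows, n_cols)

# A single cell is addressed by row label and column name.
print(df_small.loc['ford torino', 'mpg'])
```

Each row describes one car and each column one measured variable, which is the layout that both seaborn and scikit-learn expect.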

How can you visualize this DataFrame?

  • We could draw one point for every car, positioned by
    1. weight (x-axis)
    2. mpg (y-axis)
sns.scatterplot(x='weight', y='mpg', data=df);

plot1.jpeg

Which conclusions can you make out of this plot?

  • Well, you may observe that the points descend as we move to the right

  • This suggests that heavier cars tend to travel fewer miles per gallon (mpg)

How can you measure this relationship?

  • Linear Regression
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X=df[['weight']], y=df.mpg)
model.__dict__
  • Resulting in ↓
{'fit_intercept': True,
 'normalize': False,
 'copy_X': True,
 'n_jobs': None,
 'n_features_in_': 1,
 'coef_': array([-0.00767661]),
 '_residues': 7474.8140143821,
 'rank_': 1,
 'singular_': array([16873.20281508]),
 'intercept_': 46.31736442026565}
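
Rather than reading `model.__dict__`, the fitted values can also be read from the public attributes `coef_` and `intercept_`. The sketch below assumes scikit-learn and NumPy are installed and fits on only five cars from this tutorial's data, so its numbers will differ from the full-dataset output above; `X_small`, `y_small`, and `small_model` are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Five (weight, mpg) pairs from this tutorial; names here are illustrative.
X_small = np.array([[3504], [3693], [3436], [3433], [3449]])  # shape (5, 1)
y_small = np.array([18.0, 15.0, 18.0, 16.0, 17.0])

small_model = LinearRegression()
small_model.fit(X_small, y_small)

slope = small_model.coef_[0]        # one coefficient per feature
intercept = small_model.intercept_  # value of the line at weight = 0

# predict() applies intercept + slope * weight to every row
manual = intercept + slope * X_small[:, 0]
print(np.allclose(manual, small_model.predict(X_small)))
```

The trailing underscore marks attributes that scikit-learn sets during `fit`, which is why they only exist after the model has been trained.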

What is the mathematical formula for this relationship?

$$mpg = 46.32 - 0.00768 \cdot weight$$

  • This equation means that the predicted mpg drops by about 0.00768 units for every unit that weight increases.
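
As a quick check on how to read the equation, the sketch below evaluates it by hand in plain Python. The coefficients are copied from the model output above; the function name `predict_mpg` is hypothetical.

```python
# Values copied from the fitted model's output in this tutorial.
INTERCEPT = 46.31736442026565
SLOPE = -0.00767661

def predict_mpg(weight):
    """Evaluate mpg = intercept + slope * weight (hypothetical helper)."""
    return INTERCEPT + SLOPE * weight

# The first car in the dataset weighs 3504 units.
print(round(predict_mpg(3504), 2))

# Adding 1000 units of weight changes the prediction by 1000 * slope.
print(round(predict_mpg(4504) - predict_mpg(3504), 2))
```

Because the relationship is a straight line, the change in predicted mpg depends only on the change in weight, not on the starting weight.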

Could you visualise this equation in a plot?

  • Absolutely, we could make the predictions from the original data and plot them.

Predictions

y_pred = model.predict(X=df[['weight']])

dfsel = df[['weight', 'mpg']].copy()
dfsel['prediction'] = y_pred

dfsel.head()
                           weight   mpg  prediction
name
chevrolet chevelle malibu    3504  18.0   19.418523
buick skylark 320            3693  15.0   17.967643
plymouth satellite           3436  18.0   19.940532
amc rebel sst                3433  16.0   19.963562
ford torino                  3449  17.0   19.840736
  • From this table, you can see that the predictions don't exactly match the actual values, but they approximate them.

  • For example, Ford Torino's mpg is 17.0, but our model predicts 19.84.
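
The gap between an actual value and its prediction is usually called a residual. As a rough sketch, the snippet below computes it for the five cars in the table above, with the numbers copied from that table; the dictionary name `rows` is illustrative.

```python
# (actual mpg, predicted mpg) copied from the table in this tutorial.
rows = {
    'chevrolet chevelle malibu': (18.0, 19.418523),
    'buick skylark 320': (15.0, 17.967643),
    'plymouth satellite': (18.0, 19.940532),
    'amc rebel sst': (16.0, 19.963562),
    'ford torino': (17.0, 19.840736),
}

# Residual = actual - predicted; negative means the model overestimates.
residuals = {name: actual - pred for name, (actual, pred) in rows.items()}

for name, res in residuals.items():
    print(f'{name}: {res:.2f}')
```

For these five cars every residual is negative, so the model overestimates their mpg; across the whole dataset the residuals fall on both sides of zero.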

Model Visualization

sns.scatterplot(x='weight', y='mpg', data=dfsel)
sns.scatterplot(x='weight', y='prediction', data=dfsel);

plot2.jpeg

  1. The blue points represent the actual data.
  2. The orange points represent the predictions of the model.

I teach Python, R, Statistics & Data Science. I like to produce content that helps people to understand these topics better.

Feel free to give me feedback; I would like to make my tutorials clearer and create content that interests you 🤗
