# Why do all Machine Learning models follow the same steps?

## Introduction

It's tough to find things that always work the same way in programming.

The steps of a Machine Learning (ML) model can be an exception.

Each time we want to compute a model (a mathematical equation) and make predictions with it, we follow the same three steps:

1. `model.fit()` → to compute the numbers of the mathematical equation.
2. `model.predict()` → to calculate predictions with the mathematical equation.
3. `model.score()` → to measure how good the model's predictions are.
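These three steps can be sketched with scikit-learn on a tiny made-up dataset (the values below are purely illustrative, not from the real data):

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up data: two numeric features per person, one 0/1 label.
X = [[0, 66], [1, 72], [1, 48], [1, 59], [0, 44]]
y = [0, 1, 1, 0, 1]

model = DecisionTreeClassifier()
model.fit(X, y)                # 1. compute the numbers of the equation
pred = model.predict(X)        # 2. calculate predictions
accuracy = model.score(X, y)   # 3. measure how good the predictions are
```

Swap `DecisionTreeClassifier` for any other scikit-learn classifier and these three lines stay exactly the same.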

And I am going to show you this with 3 different ML models:

• `DecisionTreeClassifier()`
• `SVC()`
• `KNeighborsClassifier()`

But first, let's load a dataset from the CIS by executing the lines of code below:

• The goal of this dataset is to predict the `internet_usage` of people (rows)
• Based on their socio-demographic characteristics (columns)
```python
import pandas as pd

df.head()  # first rows of the CIS dataset
```

```
   internet_usage     sex  age   education
0               0  Female   66  Elementary
1               1    Male   72  Elementary
2               1    Male   48  University
3               0    Male   59         PhD
4               1  Female   44         PhD
```

## Data Preprocessing

We need to transform the categorical variables into dummy variables before computing the models:
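As a quick illustration of what dummy encoding does, here is a sketch on made-up values that mirror the columns above:

```python
import pandas as pd

# Two categorical columns, as in the CIS dataset preview.
toy = pd.DataFrame({'sex': ['Female', 'Male'],
                    'education': ['Elementary', 'University']})

dummies = pd.get_dummies(toy, drop_first=True)
# drop_first=True keeps one column per category minus the baseline,
# leaving 'sex_Male' and 'education_University'
```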

```python
df = pd.get_dummies(df, drop_first=True)
```

## Feature Selection

Now we separate the variables by their respective roles within the model:

```python
target = df.internet_usage
explanatory = df.drop(columns='internet_usage')
```

## ML Models

### Decision Tree Classifier

```python
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X=explanatory, y=target)

pred_dt = model.predict(X=explanatory)
accuracy_dt = model.score(X=explanatory, y=target)
```

### Support Vector Machines

```python
from sklearn.svm import SVC

model = SVC()
model.fit(X=explanatory, y=target)

pred_sv = model.predict(X=explanatory)
accuracy_sv = model.score(X=explanatory, y=target)
```

### K-Nearest Neighbors

```python
from sklearn.neighbors import KNeighborsClassifier

model = KNeighborsClassifier()
model.fit(X=explanatory, y=target)

pred_kn = model.predict(X=explanatory)
accuracy_kn = model.score(X=explanatory, y=target)
```

The only thing that changes is the predictions each model produces. The models are different, but they all follow the same steps that we described at the beginning:

1. `model.fit()` → to compute the mathematical formula of the model
2. `model.predict()` → to calculate predictions through the mathematical formula
3. `model.score()` → to get the success ratio of the model
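Because the API is identical, the whole comparison can be written as one loop. Here is a sketch on synthetic data (standing in for the CIS dataset, which isn't reloaded here):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=200, random_state=7)

accuracies = {}
for model in [DecisionTreeClassifier(), SVC(), KNeighborsClassifier()]:
    model.fit(X, y)                                       # 1. fit
    pred = model.predict(X)                               # 2. predict
    accuracies[type(model).__name__] = model.score(X, y)  # 3. score
```

Any other scikit-learn classifier could be appended to the list without touching the body of the loop.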

## Comparing Predictions

In the following table you can observe how the different models make different predictions, which often don't coincide with reality (misclassification).

For example, the SVC model misclassifies row 214: it predicts that this person uses the internet (`pred_svm=1`), but in reality they don't (`internet_usage=0`).

```python
df_pred = pd.DataFrame({'internet_usage': df.internet_usage,
                        'pred_dt': pred_dt,
                        'pred_svm': pred_sv,
                        'pred_knn': pred_kn})

df_pred.sample(10, random_state=7)
```

```
      internet_usage  pred_dt  pred_svm  pred_knn
214                0        0         1         0
2142               1        1         1         1
1680               1        0         0         0
1522               1        1         1         1
325                1        1         1         1
2283               1        1         1         1
1263               0        0         0         0
993                0        0         0         0
26                 1        1         1         1
2190               0        0         0         0
```

## Choose Best Model

Then, we can choose the model whose predictions most often match reality:

```python
df_accuracy = pd.DataFrame({'accuracy': [accuracy_dt, accuracy_sv, accuracy_kn]},
                           index=['DecisionTreeClassifier()', 'SVC()', 'KNeighborsClassifier()'])

df_accuracy
```

```
                          accuracy
DecisionTreeClassifier()  0.859878
SVC()                     0.783707
KNeighborsClassifier()    0.827291
```
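One caveat: the accuracies above are measured on the same rows the models were fitted on, which tends to flatter flexible models like decision trees. A common refinement is to score on a held-out test set instead; a sketch on synthetic data (the CIS dataset isn't reloaded here):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7)

model = DecisionTreeClassifier(random_state=7)
model.fit(X_train, y_train)                # fit only on the training rows
train_acc = model.score(X_train, y_train)  # optimistic: the tree has seen these
test_acc = model.score(X_test, y_test)     # the more honest number
```

The same three steps apply; only which rows go into `fit()` and `score()` changes.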

Which is the best model here?

• Let me know in the comments below ↓