Supervised learning is about training models on labeled data so that they can make predictions. The goal is to make accurate predictions on unseen data.
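There are two types of supervised learning:
- Classification – predicts a categorical variable (a class label, e.g. spam or not spam)
- Regression – predicts a continuous variable (a numeric value, e.g. a house price)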
A few basic terms:
- Features
- Labels
- Accuracy
Features are the inputs. They are also known as independent variables. In machine learning, features are conventionally denoted X.
Labels are the target variable, also known as the dependent variable or response variable. In machine learning, labels are conventionally denoted y. y is a function of X: X is the input and y is the output.
y = f(X)
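To make the notation concrete, here is a small sketch with a made-up dataset: X is stored as a list of rows (one row per observation, one column per feature) and y as a list with one label per row. The function `f` is a hypothetical hand-written rule standing in for the mapping that a trained model would learn from the data.

```python
# Hypothetical toy dataset: each row of X is one observation with two features
# (hours studied, hours slept); y holds the matching label (1 = pass, 0 = fail).
X = [
    [2.0, 9.0],
    [8.0, 7.0],
    [1.0, 4.0],
    [7.0, 8.0],
]
y = [0, 1, 0, 1]

def f(row):
    """Hypothetical rule standing in for a trained model: maps one row of X to a label."""
    hours_studied, _hours_slept = row
    return 1 if hours_studied >= 5.0 else 0

# Apply f to every row of X to get one predicted label per observation
predictions = [f(row) for row in X]
print(predictions)  # [0, 1, 0, 1]
```

In practice f is not written by hand; training is the process of finding it from the labeled pairs (X, y).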
Accuracy is the fraction of predictions that are correct out of all observations. It is a common metric for evaluating classification models.
Accuracy = Number of correct predictions / Total number of observations
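A minimal sketch of the accuracy formula in plain Python, assuming the actual and predicted labels are stored in two equal-length lists. scikit-learn provides `sklearn.metrics.accuracy_score` for the same calculation.

```python
def accuracy(y_true, y_pred):
    """Accuracy = number of correct predictions / total number of observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on zero observations")
    # Count positions where the prediction matches the actual label
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)

# 3 of the 4 predictions match the actual labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```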
There are a few requirements to meet before training:
- The dataset should not have any missing values (recommended)
- The data should be in numeric format (text or categorical values must be encoded as numbers first)
- The data should be stored in a DataFrame or an array
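These requirements can be checked before training. The sketch below is a hypothetical helper, assuming X is a plain list of rows standing in for a DataFrame or 2-D array; with pandas, `DataFrame.isna()` and `DataFrame.dtypes` give the same information.

```python
import math

def check_ready_for_training(X):
    """Return a list of problems that would block training; an empty list means ready.

    Assumes X is a list of rows (a plain-Python stand-in for a DataFrame or 2-D array).
    """
    if not X:
        return ["dataset is empty"]
    problems = []
    width = len(X[0])
    for i, row in enumerate(X):
        # Every row should have the same number of columns, like a table or array
        if len(row) != width:
            problems.append(f"row {i} has {len(row)} columns, expected {width}")
        for j, value in enumerate(row):
            # Treat None and float NaN as missing values
            if value is None or (isinstance(value, float) and math.isnan(value)):
                problems.append(f"missing value at row {i}, column {j}")
            # Anything that is not an int or float is non-numeric
            elif not isinstance(value, (int, float)):
                problems.append(f"non-numeric value {value!r} at row {i}, column {j}")
    return problems

print(check_ready_for_training([[1.0, 2.0], [None, "red"]]))
# ['missing value at row 1, column 0', "non-numeric value 'red' at row 1, column 1"]
```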
So once you have a dataset, you need to carry out preliminary steps such as data cleaning and missing-value treatment before training the model.
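One common missing-value treatment is mean imputation: replace each missing entry with the mean of its column. The sketch below assumes missing values are marked as `None` and that every column has at least one value present; dropping incomplete rows (for example with pandas `DataFrame.dropna`) is another option, and `DataFrame.fillna` can fill gaps in a DataFrame.

```python
def impute_column_means(X):
    """Replace missing values (None) in each column with that column's mean.

    Assumes every column has at least one non-missing numeric value.
    """
    n_cols = len(X[0])
    means = []
    for j in range(n_cols):
        # Average only the values that are present in column j
        present = [row[j] for row in X if row[j] is not None]
        means.append(sum(present) / len(present))
    # Build a new dataset; the input X is left unchanged
    return [[means[j] if row[j] is None else row[j] for j in range(n_cols)] for row in X]

print(impute_column_means([[1.0, 10.0], [None, 20.0], [3.0, None]]))
# [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```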
Solving a classification problem involves a few steps:
- Get the dataset
- Split the dataset (Train/Test)
- Fit the model on the training data (train the model)
- Predict (compare the model's predictions against your actual test data)
- Score the model (measure how accurate the model is)
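The five steps can be sketched end to end in plain Python. This is a minimal sketch, not a library API: the dataset is made up, `split_train_test` is a hypothetical helper (scikit-learn's `sklearn.model_selection.train_test_split` does this job with more options), and the model is a 1-nearest-neighbour classifier chosen only because it is short to write. The `fit` / `predict` / `score` method names follow the convention used by scikit-learn estimators.

```python
import math
import random

def split_train_test(X, y, test_size=0.25, seed=42):
    """Hypothetical helper: shuffle row indices, then split X and y into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

class NearestNeighborClassifier:
    """1-nearest-neighbour classifier: predicts the label of the closest training row."""

    def fit(self, X, y):
        # "Training" for 1-NN simply stores the labeled examples
        self.X_train, self.y_train = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Euclidean distance from this row to every training row
            distances = [math.dist(row, train_row) for train_row in self.X_train]
            nearest = distances.index(min(distances))
            preds.append(self.y_train[nearest])
        return preds

    def score(self, X, y):
        # Accuracy = correct predictions / total observations
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

# 1. Get the dataset (hypothetical: two well-separated groups of points)
X = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 2. Split the dataset (6 rows for training, 2 for testing)
X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.25)

# 3. Fit the model on the training data
model = NearestNeighborClassifier().fit(X_train, y_train)

# 4. Predict on the test data
y_pred = model.predict(X_test)

# 5. Score the model
print(model.score(X_test, y_test))  # 1.0
```

The score is 1.0 here only because the two made-up groups are far apart; on real data the test score is usually lower, which is exactly why the model is scored on data it did not see during fitting.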