Training your model with the dataset

To build a model, you need a dataset. To judge that model fairly, you also need data it has not seen during training. So you split the given dataset into two parts:

  1. Training data
  2. Test data

You choose the split ratio yourself, but an 80-20 split (80% training, 20% test) is common. You build the model using only the training data, and then evaluate it against unseen data, that is, your test data.
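To make the idea concrete, here is a minimal sketch of an 80-20 split in plain Python. The ten-item `data` list and the seed value are made up for illustration; a real dataset would have features and labels.

```python
import random

# Hypothetical dataset of 10 samples, for illustration only
data = list(range(10))

# Shuffle a copy so the split does not depend on the original order
rng = random.Random(42)
shuffled = data[:]
rng.shuffle(shuffled)

# Hold out 20% as test data; the remaining 80% is training data
n_test = int(len(shuffled) * 0.2)
test_data = shuffled[:n_test]
train_data = shuffled[n_test:]

print(len(train_data), len(test_data))  # 8 2
```

Every sample ends up in exactly one of the two parts, so the test data stays unseen while you train.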

There are Python modules and functions that can assist with this. The main module you will use is scikit-learn:

# Import the module
from sklearn.model_selection import train_test_split

# Split the data into training and test sets
# (X holds the features and y the labels; both are assumed to be loaded already)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

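The call above puts 20% of the data into the test set and the remaining 80% into the training set. Setting random_state=42 fixes the shuffle, so you get the same split every time you run it. Setting stratify=y keeps the proportion of each label in the training and test sets the same as in the full dataset; it does not make the label counts equal. The training portion is returned as X_train and y_train, and the test portion as X_test and y_test.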
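A small sketch of what stratify does, using a made-up imbalanced dataset (90 samples labelled 0 and 10 labelled 1). The data here is hypothetical and only serves to show the label counts after the split.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Count how many samples of each label landed in each subset
train_counts = Counter(y_train)
test_counts = Counter(y_test)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

Both subsets keep the original 9:1 ratio of labels. Without stratify, a random split of a small or imbalanced dataset could leave the test set with very few samples of the rare label, or none at all.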

After the split is done, you train the model by calling its .fit method on the training data. You then validate the model by calling .predict on the test data and comparing the predictions with the known test labels.

By training the model, you try to find the patterns that map the inputs to their correct outputs.
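Here is a minimal sketch of the fit and predict steps. The iris dataset and the LogisticRegression classifier are assumptions chosen for illustration; any scikit-learn classifier follows the same pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Example dataset bundled with scikit-learn (an assumption for this sketch)
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train on the training data only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict labels for the unseen test data
y_pred = model.predict(X_test)

# Show the first few predictions next to the true labels
print("predicted:", y_pred[:5])
print("actual:   ", y_test[:5])
```

Note that .fit never sees X_test or y_test, which is what keeps the evaluation honest.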

Once the model has been validated against the test set, you need to figure out how accurate it is. For a classifier, the .score method returns the accuracy on the data and labels you pass to it.
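The sketch below shows .score on the test set and checks it against accuracy_score from sklearn.metrics. As before, the iris dataset and LogisticRegression are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Example dataset bundled with scikit-learn (an assumption for this sketch)
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# For classifiers, .score returns the fraction of correct predictions
test_accuracy = model.score(X_test, y_test)

# The same number, computed from the predictions explicitly
manual_accuracy = accuracy_score(y_test, model.predict(X_test))

print(f"score: {test_accuracy:.3f}, accuracy_score: {manual_accuracy:.3f}")
```

Calling .score on the test data rather than the training data is what tells you how the model behaves on data it has not seen.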
