For most common problems solved with machine learning, there are usually multiple candidate models. Each has its own quirks and performs differently depending on various factors.
Each model is judged by its performance on a dataset, usually called the “validation” or “test” dataset. This performance is measured using various statistics — accuracy, precision, recall, etc. The statistic of choice is usually specific to your particular application and use case, and for each application it is critical to find a metric that can be used to objectively compare models.
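As a concrete (and deliberately tiny) illustration, here is a minimal sketch of computing one such statistic, accuracy, over a validation set. The labels and function name are made up for the example; binary labels are assumed, with 1 meaning "positive".

```python
# Minimal sketch: accuracy over a validation set.
# Labels are binary (1 = positive); the data below is invented for illustration.

def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the true labels.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 1]  # ground-truth labels from the validation set
y_pred = [1, 0, 0, 1, 1, 1]  # model predictions

print(accuracy(y_true, y_pred))  # 4 of the 6 predictions match
```

In practice you would use a library implementation (e.g. from scikit-learn) rather than rolling your own, but the arithmetic underneath is this simple.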
In this article, we will be talking about the most…
This aims to be a short primer for anyone who needs to know the difference between the various dataset splits used when training machine learning models.
For this article, I will quote the base definitions from Jason Brownlee’s excellent article on the same topic; it is quite comprehensive, so do check it out for more details.
Training Dataset: The sample of data used to fit the model.
This is the actual dataset that we use to train the model (the weights and biases, in the case of a neural network). The model sees and learns from this data.
Validation Dataset: The sample of data…
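To make the splits concrete, here is a minimal sketch of a three-way split in plain Python. The 70/15/15 proportions and the stand-in dataset are illustrative assumptions, not a recommendation from the definitions above.

```python
import random

# Illustrative three-way split: 70% train, 15% validation, 15% test.
data = list(range(100))  # stand-in for a real dataset of 100 examples
random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(data)     # shuffle before splitting so each split is representative

n = len(data)
train = data[: int(0.70 * n)]
val   = data[int(0.70 * n): int(0.85 * n)]
test  = data[int(0.85 * n):]

print(len(train), len(val), len(test))  # 70 15 15
```

The key property is that the three sets are disjoint: the model is fit on `train`, tuned against `val`, and only evaluated once on `test`. Libraries such as scikit-learn provide utilities for this, but the idea is just slicing a shuffled dataset.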
In this article, I want to give really basic explanations of what precision and recall mean. I will refrain from using the terms True Positives, False Positives, True Negatives, etc., because although they are technically correct, they tend to be a bit overwhelming for beginners.
Both precision and recall are metrics that are used to analyse a predictive model. We generally calculate these statistics over a validation or test dataset.
Recall is the ratio of the correct predictions to the total number of correct items in the set. It is expressed as a % of the…
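The two metrics can be sketched in plain Python, keeping to the everyday phrasing used here rather than the True Positive / False Positive vocabulary. The labels below are invented for illustration, and binary labels (1 = positive) are assumed.

```python
def precision(y_true, y_pred):
    # Of the items the model flagged as positive, how many were actually positive?
    flagged = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(flagged) / len(flagged)

def recall(y_true, y_pred):
    # Of the items that are actually positive, how many did the model find?
    actual = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(actual) / len(actual)

y_true = [1, 0, 1, 1, 0, 1]  # ground truth: 4 positive items in the set
y_pred = [1, 0, 0, 1, 1, 1]  # model flagged 4 items, 3 of them correctly

print(precision(y_true, y_pred))  # 3 correct out of 4 flagged = 0.75
print(recall(y_true, y_pred))     # 3 found out of 4 actual positives = 0.75
```

Note the trade-off this makes visible: a model that flags everything as positive gets perfect recall but poor precision, while a model that flags only its single most confident item can get perfect precision but poor recall.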