Link: Supervised learning
What is classification
Classification: the prediction is either correct, or incorrect.
Binary classification
Only two available classes
- E.g. dog/cat picture
In supervised learning, we fit/train the model on training data, and test the model on testing data by comparing the predictions to the true y values
Incorrect/correct are not the same
It often doesn’t tell a complete story. What’s why we need confusion matrix
Confusion matrix
The bottom line is that these are all fundamental methods of comparing predicted values vs true values.
Accuracy
Usage
- Useful when the labels are well balanced.
- E.g. same amount of dog/cat pictures
- Not a good choice if it’s unbalanced (e.g. majority is classA while only a small portion is classB). That’s why we need Recall and Precision
Recall
The ability to find all the relevant cases
Precision
The ability to identify only the relevant cases
F1-score
Combine recall and precision to balance these two metrics. It’s the harmonic mean of precision and recall.
Why harmonic mean?
Because harmonic mean will punish the extreme differences while simple average/mean doesn’t. It gives a more fair assessment between precision and recall. E.g. precision=1, recall=0, average=0.5 while F1=0