Evaluating Classification Models
Accuracy (the percentage of correct predictions) seems like a great metric, but it can be misleading on imbalanced data. If 99% of emails are NOT spam, a model that just says "Not Spam" every single time gets 99% accuracy! But it's completely useless: it never catches a single spam email.
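The arithmetic behind that claim is easy to check by hand. This is a minimal sketch, assuming a hypothetical inbox of 100 emails where exactly one is spam (1 = spam, 0 = not spam):

```python
# Hypothetical inbox: 99 legitimate emails (0) and 1 spam email (1)
y_actual = [0] * 99 + [1]

# A "model" that answers "Not Spam" (0) for every email
y_pred = [0] * len(y_actual)

# Accuracy = fraction of predictions that match the true label
correct = sum(1 for a, p in zip(y_actual, y_pred) if a == p)
accuracy = correct / len(y_actual)

# How many spam emails did the model actually catch?
spam_caught = sum(1 for a, p in zip(y_actual, y_pred) if a == 1 and p == 1)

print(f"Accuracy: {accuracy:.0%}")    # Accuracy: 99%
print(f"Spam caught: {spam_caught}")  # Spam caught: 0
```

The high score comes entirely from the majority class; the one case that matters is never detected.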
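To get the full picture, we use a Confusion Matrix, which breaks predictions down into four categories: True Positives (positive cases correctly flagged), False Positives (negative cases wrongly flagged, i.e. false alarms), False Negatives (positive cases the model missed), and True Negatives (negative cases correctly left alone).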
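A minimal pure-Python sketch of how the four cells are counted, assuming binary labels with 1 as the positive class. The helper name `confusion_counts` is hypothetical and chosen for illustration; scikit-learn's `confusion_matrix` produces the same counts, laid out as `[[TN, FP], [FN, TP]]` for labels `[0, 1]`.

```python
def confusion_counts(y_actual, y_pred, positive=1):
    """Count (TP, FP, FN, TN) for a binary problem (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for actual, pred in zip(y_actual, y_pred):
        if pred == positive:
            if actual == positive:
                tp += 1  # predicted positive, really positive
            else:
                fp += 1  # predicted positive, really negative (false alarm)
        else:
            if actual == positive:
                fn += 1  # predicted negative, really positive (missed detection)
            else:
                tn += 1  # predicted negative, really negative
    return tp, fp, fn, tn

# Small example with 1 as the positive class
y_actual = [1, 0, 1, 1, 0, 0]
y_pred   = [1, 1, 0, 1, 0, 0]
print(confusion_counts(y_actual, y_pred))  # (2, 1, 1, 2)
```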
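From this matrix, we calculate Precision (the fraction of positive predictions that are correct; high precision means few false alarms), Recall (the fraction of actual positives the model finds; high recall means few missed detections), and the F1 Score (the harmonic mean of precision and recall, a single number that balances both).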
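A minimal sketch of computing these three metrics directly from confusion-matrix counts. The function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here (it matches what scikit-learn reports when you pass `zero_division=0`).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    # Precision = TP / (TP + FP): of everything flagged positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall = TP / (TP + FN): of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 = harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example counts: 8 true positives, 2 false alarms, 4 missed detections
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"Precision: {p:.2f}, Recall: {r:.2f}, F1: {f1:.2f}")
# Precision: 0.80, Recall: 0.67, F1: 0.73
```

Because F1 is a harmonic mean, it stays low unless both precision and recall are reasonably high.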
Calculate accuracy, precision, and recall using scikit-learn for an imbalanced dataset.
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Imbalanced data: 0 is Normal, 1 is Fraud
y_actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A naive model that always predicts 0
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# TODO: Calculate Accuracy, Precision, and Recall
# acc = ???
# prec = ???  # hint: this model never predicts 1, so precision is undefined; pass zero_division=0
# rec = ???
# print(f"Accuracy: {acc}")    # Wow, 0.9 (90%)!
# print(f"Precision: {prec}")
# print(f"Recall: {rec}")      # Oh no, 0.0! We missed the fraud!
# Bonus: print(confusion_matrix(y_actual, y_pred)) to see where the errors fall
```
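One possible solution, assuming scikit-learn 0.22 or later (the release that added the `zero_division` parameter). Since the naive model never predicts 1, the precision denominator (true positives plus false positives) is zero; `zero_division=0` reports precision as 0.0 instead of raising an `UndefinedMetricWarning`. F1 is included as an extra.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Imbalanced data: 0 is Normal, 1 is Fraud
y_actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A naive model that always predicts 0
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy_score(y_actual, y_pred)
# No positive predictions at all, so report precision (and F1) as 0.0 instead of warning
prec = precision_score(y_actual, y_pred, zero_division=0)
rec = recall_score(y_actual, y_pred)
f1 = f1_score(y_actual, y_pred, zero_division=0)

# For labels [0, 1], ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred, labels=[0, 1]).ravel()

print(f"Accuracy:  {acc:.0%}")   # Accuracy:  90%
print(f"Precision: {prec:.0%}")  # Precision: 0%
print(f"Recall:    {rec:.0%}")   # Recall:    0% (the single fraud case was missed)
print(f"F1 Score:  {f1:.0%}")    # F1 Score:  0%
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")  # TN=9, FP=0, FN=1, TP=0
```

The confusion matrix makes the failure explicit: all nine normal transactions are counted as true negatives, and the one fraud case lands in the false-negative cell.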