Lesson 17: Evaluating Classification Models

Accuracy (the percentage of correct predictions) seems like a great metric, but it can be misleading when one class is much more common than the other. If 99% of emails are NOT spam, a model that just says "Not Spam" every single time gets 99% accuracy! But it's completely useless, because it never catches a single spam email.
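To make that arithmetic concrete, here is a minimal sketch in plain Python. The inbox of 1,000 emails containing 10 spam messages is a made-up illustration, not real data.

```python
# Hypothetical inbox: 990 legitimate emails (0) and 10 spam emails (1).
y_actual = [0] * 990 + [1] * 10

# A "model" that answers "Not Spam" (0) for every email.
y_pred = [0] * len(y_actual)

# Accuracy = correct predictions / total predictions.
correct = sum(1 for actual, pred in zip(y_actual, y_pred) if actual == pred)
accuracy = correct / len(y_actual)

# Count how many spam emails the model actually flagged.
spam_caught = sum(
    1 for actual, pred in zip(y_actual, y_pred) if actual == 1 and pred == 1
)

print(f"Accuracy: {accuracy:.0%}")          # 99%
print(f"Spam caught: {spam_caught} of 10")  # 0 of 10
```

The accuracy looks excellent even though the model has done none of the work we actually care about.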

The Confusion Matrix

To get the full picture, we use a Confusion Matrix, which breaks down predictions into four categories:

True Positives (TP): We predicted positive, and it was positive (Caught the spam!).
True Negatives (TN): We predicted negative, and it was negative.
False Positives (FP): We predicted positive, but it was negative (False alarm!).
False Negatives (FN): We predicted negative, but it was positive (Missed the spam!).
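The four categories can be counted by hand with a few lines of plain Python. This is only a sketch: the eight labels below are invented for illustration, with 1 meaning positive (spam) and 0 meaning negative.

```python
# Invented example labels: 1 = positive (spam), 0 = negative (not spam).
y_actual = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred   = [1, 0, 0, 1, 1, 0, 1, 0]

tp = tn = fp = fn = 0
for actual, pred in zip(y_actual, y_pred):
    if pred == 1 and actual == 1:
        tp += 1  # predicted positive, was positive
    elif pred == 0 and actual == 0:
        tn += 1  # predicted negative, was negative
    elif pred == 1 and actual == 0:
        fp += 1  # false alarm
    else:
        fn += 1  # missed a real positive

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```

Every prediction falls into exactly one of the four buckets, so the counts always add up to the number of examples.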

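From this matrix, we calculate Precision, the share of our positive predictions that were correct, TP / (TP + FP), which drops as false alarms pile up; Recall, the share of actual positives we caught, TP / (TP + FN), which drops as missed detections pile up; and the F1 Score, the harmonic mean of Precision and Recall, which balances the two.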
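As a rough sketch, the three metrics can be computed directly from the confusion matrix counts. The helper names and the counts (8 TP, 2 FP, 4 FN) are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not the only option.

```python
def precision(tp, fp):
    # Of everything we flagged as positive, how much was really positive?
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of everything that was really positive, how much did we catch?
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical counts for illustration.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)
r = recall(tp, fn)
f1 = f1_score(p, r)

print(f"Precision: {p:.2f}")  # 0.80
print(f"Recall:    {r:.2f}")  # 0.67
print(f"F1 Score:  {f1:.2f}") # 0.73
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, so a model cannot score well on F1 by excelling at only one of them.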

Python Challenge: Avoid the Trap

Calculate accuracy, precision, and recall using scikit-learn for an imbalanced dataset.

from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Imbalanced data: 0 is Normal, 1 is Fraud
y_actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A naive model that always predicts 0
y_pred =   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

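# TODO: Calculate Accuracy, Precision, and Recall
# Hint: this model never predicts 1, so precision is 0/0 (undefined).
# Passing zero_division=0 to precision_score returns 0.0 without a warning.
# acc = ???
# prec = ???
# rec = ???

# print(f"Accuracy: {acc:.0%}")    # Wow, 90%!
# print(f"Precision: {prec:.0%}")  # 0%: the model made no positive predictions
# print(f"Recall: {rec:.0%}")      # Oh no, 0%! We missed the fraud!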
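One possible solution to the challenge is sketched below; try it yourself first. It assumes scikit-learn 0.22 or later, where precision_score accepts the zero_division argument. That argument is an addition here, used because the naive model makes no positive predictions and precision would otherwise be undefined.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Imbalanced data: 0 is Normal, 1 is Fraud
y_actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
# A naive model that always predicts 0
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

acc = accuracy_score(y_actual, y_pred)
# zero_division=0: report 0.0 instead of warning when there are no positive predictions.
prec = precision_score(y_actual, y_pred, zero_division=0)
rec = recall_score(y_actual, y_pred)

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_pred).ravel()

print(f"Accuracy: {acc:.0%}")    # 90%
print(f"Precision: {prec:.0%}")  # 0%
print(f"Recall: {rec:.0%}")      # 0%
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=9 FP=0 FN=1 TP=0
```

The confusion matrix shows what accuracy hides: the single fraud case lands in the False Negative cell, and the True Positive cell is empty.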