Lesson 19: Random Forests — Wisdom of the Crowd

If one decision tree is prone to overfitting, what happens if we build 100 trees and let them vote? We get a Random Forest!

Ensemble Learning

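Combining many models into a single predictor is called Ensemble Learning. It's like asking 100 people to guess the number of jellybeans in a jar. Individual guesses might be way off, but as long as the errors don't all point in the same direction, they tend to cancel out, and the average of the crowd is often surprisingly accurate.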
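The jellybean intuition can be checked with a quick simulation. This is only an illustrative sketch: the true count of 500, the noise level, and the seed are made-up values, and it assumes each guess is the true count plus independent, unbiased noise.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is repeatable

TRUE_COUNT = 500   # hypothetical number of jellybeans in the jar
N_PEOPLE = 100

# Assumption: each guess is the true count plus independent, unbiased noise.
guesses = [TRUE_COUNT + random.gauss(0, 100) for _ in range(N_PEOPLE)]

# Typical error of one person vs. the error of the crowd's average.
mean_individual_error = statistics.mean(abs(g - TRUE_COUNT) for g in guesses)
crowd_error = abs(statistics.mean(guesses) - TRUE_COUNT)

print(f"Average individual error: {mean_individual_error:.1f}")
print(f"Error of the crowd's average: {crowd_error:.1f}")
```

If every guesser were biased in the same direction, averaging would not remove that bias. That is why the trees in a forest need to differ from one another.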

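To ensure the trees are actually diverse (not just 100 identical trees), a Random Forest combines two sources of randomness. The first is Bagging (short for bootstrap aggregating): each tree is trained on a bootstrap sample, drawn at random with replacement from the training data. The second is feature randomness: at each split, a tree considers only a random subset of the features.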
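The two sources of randomness can be sketched with the standard library alone. This is not how any particular library implements them. The row and feature counts mirror the Iris dataset, and the square-root subset size is an assumption (a common choice for classification).

```python
import math
import random

rng = random.Random(42)  # arbitrary seed for repeatability

n_rows = 150      # e.g. the number of rows in the Iris dataset
n_features = 4    # e.g. the number of Iris features

# Bagging: draw n_rows row indices WITH replacement (a bootstrap sample).
bootstrap_rows = [rng.randrange(n_rows) for _ in range(n_rows)]
unique_fraction = len(set(bootstrap_rows)) / n_rows

# Feature randomness: at one split, consider only a random subset of features.
# Assumption: subset size sqrt(n_features), a common choice for classification.
k = max(1, int(math.sqrt(n_features)))
split_features = rng.sample(range(n_features), k)

print(f"Rows drawn: {len(bootstrap_rows)}, distinct rows: {unique_fraction:.0%}")
print(f"Features considered at this split: {sorted(split_features)}")
```

Because rows are drawn with replacement, some appear more than once and others not at all. On average a bootstrap sample contains roughly 63% of the distinct rows, so every tree sees a slightly different dataset.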

Python Challenge: Grow a Forest

Train a Random Forest on the Iris dataset and check its feature importances.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# TODO: Initialize RandomForestClassifier with n_estimators=50 (50 trees)
# forest = ???

# TODO: Fit the model
# forest.???

# Print which features the forest found most useful!
# print(f"Feature Importances: {forest.feature_importances_}")
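Once you have tried the challenge yourself, here is one possible way to complete it. It is a sketch, not the only correct answer: `random_state=0` and the pairing of importances with `feature_names` are additions that the challenge does not ask for.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# 50 trees; random_state is an added assumption so results are repeatable.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Pair each importance with its feature name, most important first.
ranked = sorted(
    zip(iris.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The importances are non-negative and sum to 1, so each value can be read as that feature's share of the forest's total impurity reduction.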

Random forests often perform well right out of the box, with little tuning. The tradeoff? A prediction now comes from many trees combined, so a forest is harder to interpret than a single decision tree, and feature importances give only a coarse summary of what it learned.
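One rough way to see the interpretability cost is to count how many decision nodes you would have to read. The sketch below compares a single tree with a 50-tree forest on Iris. The `random_state` values are arbitrary, and node count is only a crude proxy for how hard a model is to read.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state values are arbitrary, chosen only for repeatability.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A single tree is one flowchart; a forest is 50 of them combined.
tree_nodes = tree.tree_.node_count
forest_nodes = sum(est.tree_.node_count for est in forest.estimators_)

print(f"Nodes in the single tree: {tree_nodes}")
print(f"Nodes across the forest:  {forest_nodes}")
```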