Lesson 19: Random Forests — Wisdom of the Crowd
Explain ensemble learning; tune n_estimators; compare performance.
Random Forests: Wisdom of the Crowd
If one decision tree is prone to overfitting, what happens if we build 100 trees and let them vote? We get a Random Forest!
Ensemble Learning
This concept is called Ensemble Learning. It's like asking 100 people to guess the number of jellybeans in a jar. Individual guesses might be way off, but the average of the crowd is usually surprisingly accurate.
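The jellybean analogy can be simulated in a few lines. This is a minimal sketch, not part of the original lesson: the true count of 1000 and the noise level of 300 are made-up numbers, and it assumes each guess is unbiased and independent of the others. Averaging helps most when errors are independent, which is why tree diversity matters in a forest.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_count = 1000  # hypothetical number of jellybeans in the jar
# Assumption: 100 independent, unbiased guesses with a standard deviation of 300.
guesses = rng.normal(loc=true_count, scale=300, size=100)

# How far off is each person, and how far off is the crowd's average?
individual_errors = np.abs(guesses - true_count)
crowd_error = abs(guesses.mean() - true_count)

print(f"Mean individual error: {individual_errors.mean():.0f}")
print(f"Error of the crowd average: {crowd_error:.0f}")
```

The error of the averaged guess can never exceed the mean individual error, and with independent guesses it is typically much smaller.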
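To ensure the trees are actually diverse (not just 100 identical trees), a Random Forest combines two sources of randomness. The first is Bagging (short for bootstrap aggregating): each tree is trained on a random sample of the training rows, drawn with replacement. The second is feature randomness: at each split, a tree only considers a random subset of the features.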
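To make the two sources of randomness concrete, here is a rough hand-built ensemble. It is a sketch for illustration only, not how scikit-learn implements RandomForestClassifier internally: the tree count of 25 is arbitrary, and the hard majority vote is a simplification (scikit-learn's forest averages the trees' predicted class probabilities instead).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(seed=0)
n_samples = X.shape[0]
n_trees = 25  # arbitrary choice for this illustration

trees = []
for i in range(n_trees):
    # Random rows: draw row indices with replacement (a bootstrap sample).
    idx = rng.integers(0, n_samples, size=n_samples)
    # Random features: each split considers only a random subset of features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Because rows are drawn with replacement, some are repeated and others left out.
unique_fraction = np.unique(idx).size / n_samples
print(f"Distinct rows in the last bootstrap sample: {unique_fraction:.0%}")

# Let the trees vote: one row of predictions per tree, then the most common
# label in each column (iris has 3 classes, labelled 0, 1, 2).
all_preds = np.stack([tree.predict(X) for tree in trees])
votes = np.array([np.bincount(col, minlength=3).argmax() for col in all_preds.T])

train_accuracy = (votes == y).mean()
print(f"Training accuracy of the hand-built ensemble: {train_accuracy:.2f}")
```

Note that this accuracy is measured on the training data, so it says little about how the ensemble would do on unseen flowers.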
Python Challenge: Grow a Forest
Train a Random Forest and inspect its feature importances.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# TODO: Initialize RandomForestClassifier with n_estimators=50 (50 trees)
# forest = ???
# TODO: Fit the model
# forest.???
# Print which features the forest found most useful!
# print(f"Feature Importances: {forest.feature_importances_}")
Random forests are powerful and often perform well right out of the box, without much tuning. The tradeoff? They are harder to interpret than a single decision tree.
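One way to check the out-of-the-box claim, and to see the effect of n_estimators, is to score a single tree and forests of several sizes with cross-validation. The sketch below is one possible setup: the 5-fold split and the list of forest sizes are arbitrary choices, not values from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Baseline: a single decision tree, scored with 5-fold cross-validation.
tree_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
print(f"Single tree: {tree_score:.3f}")

# Forests of increasing size; the sizes tried here are arbitrary.
forest_scores = {}
for n in (1, 10, 50, 200):
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    forest_scores[n] = cross_val_score(forest, X, y, cv=5).mean()
    print(f"Forest with {n:>3} trees: {forest_scores[n]:.3f}")
```

On a small, easy dataset like iris the gap between a single tree and a forest may be small or absent. Adding trees generally stabilizes the score rather than raising it indefinitely, while training time grows with every tree.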