Feature Engineering: Making Data Model-Ready
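Most machine learning models operate on numeric inputs. A column with values like "Red", "Blue", and "Green" has to be encoded as numbers before a model can use it, and when a column measuring age in years sits next to a column measuring income in dollars, the feature with the larger values can dominate scale-sensitive models unless the columns are rescaled.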
"Better features beat better algorithms."
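As a small illustration of the categorical case mentioned above, here is a minimal sketch that turns a color column into numbers with scikit-learn's OneHotEncoder. The sample values in `colors` are made up for illustration, and one-hot encoding is only one of several possible encodings.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample column: one color per row (2D, as scikit-learn expects)
colors = np.array([["Red"], ["Blue"], ["Green"], ["Blue"]])

# fit_transform returns a sparse matrix by default; toarray() makes it dense
encoder = OneHotEncoder()
one_hot = encoder.fit_transform(colors).toarray()

# Categories are stored in sorted order, one output column per category
print(encoder.categories_[0])
print(one_hot)
```

Each row ends up with a single 1 in the column for its color and 0 everywhere else, so no artificial ordering between colors is implied.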
Use scikit-learn's StandardScaler to standardize the two numeric columns below, age and income.
from sklearn.preprocessing import StandardScaler
import numpy as np
# Age (years) and Income ($)
X = np.array([[25, 40000],
              [45, 85000],
              [30, 50000]])
# TODO: Initialize StandardScaler
# scaler = ???
# TODO: Fit the scaler to X and transform X
# X_scaled = ???
# print(X_scaled)
# Notice that each column now has mean 0 and standard deviation 1!
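One possible way to fill in the TODOs above is sketched here. It reuses the names `scaler`, `X`, and `X_scaled` from the exercise and calls `fit_transform`, which fits the scaler and transforms the data in one step. Calling `fit` and then `transform` separately would work as well.

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

# Age (years) and Income ($)
X = np.array([[25, 40000],
              [45, 85000],
              [30, 50000]])

# Initialize the scaler (default settings: subtract the mean, divide by the std)
scaler = StandardScaler()

# Learn each column's mean and standard deviation, then rescale X
X_scaled = scaler.fit_transform(X)

print(X_scaled)
print(X_scaled.mean(axis=0))  # per-column means, approximately 0
print(X_scaled.std(axis=0))   # per-column standard deviations, approximately 1
```

After scaling, age and income are on a comparable scale, so neither column dominates simply because its raw values are larger.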