Real-world data is messy, and it often lives in CSV files or SQL databases. Pandas is the go-to Python library for wrangling structured data.
Think of Pandas as Excel or Google Sheets, but driven by Python code instead of mouse clicks. By convention, we import it as pd (import pandas as pd).
The core of Pandas is the DataFrame — a 2D table of data with rows and columns.
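To make the idea of a DataFrame concrete, here is a minimal sketch that builds one from a Python dictionary. The column names and values are made up purely for illustration; they are not part of the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "city": ["London", "New York", "Helsinki"],
    }
)

print(df)          # the whole table: 3 rows, 3 columns
print(df.columns)  # the column labels
print(df.index)    # the row labels (0, 1, 2 by default)
```

Each dictionary key becomes a column and each list becomes that column's values, so all lists need the same length.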
Before training a model, you must understand your data.
df.shape - Returns (rows, columns)
df.info() - Column names, data types, and non-null counts (which reveal missing values)
df.describe() - Summary statistics (mean, min, max) for numeric columns

You can grab specific columns (features) or filter rows based on conditions.
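The inspection calls and the column/row selection described above can be sketched roughly as follows. The tiny table here is a hypothetical stand-in (the names, ages, and incomes are invented); a real dataset would usually be loaded from a file instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with one missing age, invented for illustration.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus", "Guido"],
        "age": [36.0, np.nan, 28.0, 45.0],
        "income": [52000, 61000, 48000, 75000],
    }
)

# --- Inspect ---
print(df.shape)       # (4, 3): 4 rows, 3 columns
df.info()             # prints column names, dtypes, and non-null counts
print(df.describe())  # summary statistics for the numeric columns

# --- Select columns ---
ages = df["age"]                # one column name -> a Series
subset = df[["name", "income"]] # a list of names -> a smaller DataFrame

# --- Filter rows ---
# The comparison builds a True/False mask; only rows marked True are kept.
# A missing age compares as False, so that row is dropped here.
older = df[df["age"] > 30]
print(older)
```

Note the double brackets when selecting several columns: the outer pair indexes the DataFrame, and the inner pair is an ordinary Python list of column names.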
Imagine we have a Pandas DataFrame called df loaded with passenger data from the Titanic. Your job is to prep the features and labels!
1. Create features by selecting only the columns: ["Pclass", "Age", "Fare"]. (Hint: pass a list of strings inside the brackets.)
2. Create labels by selecting just the "Survived" column.
3. Use features["Age"].fillna(30) to replace missing ages with 30, and assign the result back to features["Age"].
4. Call features.head() to verify your work.
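One possible way to work through the steps above is sketched below. Since the real Titanic DataFrame is not available here, the code builds a small hypothetical stand-in with the same column names; the values are invented. The .copy() call is an addition that the exercise does not ask for: it makes features an independent table, which avoids the SettingWithCopyWarning that some pandas versions raise when assigning into a slice of another DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic DataFrame; the values are made up.
df = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 1, 2],
        "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
        "Fare": [7.25, 71.28, 7.92, 53.10, 13.00],
        "Survived": [0, 1, 1, 1, 0],
    }
)

# Step 1: select the feature columns (a list of names inside the brackets).
# .copy() is an assumption added here so later edits do not touch df.
features = df[["Pclass", "Age", "Fare"]].copy()

# Step 2: select the label column (a single name gives a Series).
labels = df["Survived"]

# Step 3: fillna returns a new Series, so assign it back to keep the change.
features["Age"] = features["Age"].fillna(30)

# Step 4: look at the first rows to check that no ages are missing.
print(features.head())
```

Filling with a fixed value of 30 follows the exercise; in practice the column's median or mean is a common alternative.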