
Lesson 26: Project Workshop — Problem Definition & Data

Define the final project problem; load and thoroughly explore the dataset.

Project Workshop: Define the Problem and Explore the Data

It's time for the capstone project! You will build a complete sentiment analysis system that classifies movie reviews as positive or negative. We'll put together everything you've learned: data exploration, preprocessing, feature engineering, model selection, evaluation, and interpretation.

Step 1: The Problem Statement

We are dealing with a binary text classification problem. Given the text of a movie review, our goal is to predict the label: Positive or Negative.
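Because the task is binary, each label can be represented as one of two integer classes. The snippet below is a minimal sketch of that encoding. The names LABEL_TO_ID, encode_label, and decode_label are hypothetical and not part of the course material, and mapping negative to 0 and positive to 1 is only a common convention.

```python
# Hypothetical label encoding for the binary task (names are illustrative).
LABEL_TO_ID = {"negative": 0, "positive": 1}
ID_TO_LABEL = {class_id: label for label, class_id in LABEL_TO_ID.items()}


def encode_label(label: str) -> int:
    """Map a sentiment string to its integer class, ignoring case and outer whitespace."""
    return LABEL_TO_ID[label.strip().lower()]


def decode_label(class_id: int) -> str:
    """Map an integer class back to its sentiment string."""
    return ID_TO_LABEL[class_id]


if __name__ == "__main__":
    print(encode_label("Positive"))
    print(decode_label(0))
```

Keeping both directions of the mapping in one place makes it easy to turn model predictions back into readable labels later on.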

Step 2: Explore the Dataset

We have a dataset of 2,000 movie reviews (1,000 positive and 1,000 negative). Before jumping into modeling, you must understand your data.

Coding Challenge: Data Exploration

Use pandas to explore the dataset by completing these tasks:

  1. Load the dataset into a pandas DataFrame.
  2. Check the class balance using df['sentiment'].value_counts(). Are the classes perfectly balanced?
  3. Examine 5 sample reviews from each class.
  4. Write a short script to calculate the average review length (in words) for positive vs. negative reviews. Is there a noticeable difference?
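One possible way to approach the four tasks is sketched below. It is not a reference solution. The text column name "review" is an assumption (only the "sentiment" column is named above), and the tiny in-memory DataFrame is made-up stand-in data so the script runs on its own. For the real project, load the 2,000-review file instead, for example with pd.read_csv.

```python
import pandas as pd

# 1. Load the data. Stand-in rows are used here so the sketch is self-contained;
#    in the workshop, replace this with the real dataset (e.g. pd.read_csv(...)).
#    The "review" column name is an assumption.
df = pd.DataFrame(
    {
        "review": [
            "A wonderful and moving film",
            "Great acting and a clever plot",
            "Dull and far too long",
            "A boring mess",
        ],
        "sentiment": ["positive", "positive", "negative", "negative"],
    }
)

# 2. Class balance: number of reviews per label.
class_counts = df["sentiment"].value_counts()
print(class_counts)

# 3. Up to 5 randomly chosen reviews from each class.
for label, group in df.groupby("sentiment"):
    sample = group.sample(n=min(5, len(group)), random_state=0)
    print(f"\n--- {label} ---")
    for text in sample["review"]:
        print(text)

# 4. Average review length in words, split on whitespace, per class.
df["word_count"] = df["review"].str.split().str.len()
avg_words = df.groupby("sentiment")["word_count"].mean()
print(avg_words)
```

Splitting on whitespace is a rough word count: punctuation stays attached to words and any HTML remnants are counted too, which is usually good enough for a first comparison between the two classes.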
