Lesson 4: Build a Production Pipeline

In this final MLOps project workshop, you will synthesize everything you've learned by building a complete, containerized computer vision inference server. The server accepts image payloads, normalizes them into tensors, measures per-request latency with a high-resolution timer, and runs inside a slim Docker image.

Project Architecture Overview

Your application will ingest image payloads as Base64-encoded strings, resize and normalize them into fixed-size tensors, run a mock classification inference, and respond with the predictions that pass the confidence filter together with latency telemetry:

Inference Server Data Flow:

1. Client --[POST json with base64 string]--> /predict

2. FastAPI/Pydantic parses and validates payload formats

3. Base64 Decoder decodes raw string bytes to image array

4. Preprocessor normalizes channels: z = (x/255.0 - mean) / std

5. Model runs forward pass & measures execution time (latency)

6. Server --[Response json with classes & latency]--> Client

Image Preprocessing Mathematics

For standard deep learning classification models (like ResNet or MobileNet), inputs must be scaled and normalized to match the training distribution. Given an RGB pixel value $x \in [0, 255]$, the normalized pixel value $z$ is formulated as:

z = ((x / 255.0) - mean) / std

where mean and std are the per-channel ImageNet statistics, listed in Red, Green, Blue order:

mean = [0.485, 0.456, 0.406] (Red, Green, Blue channels)
std = [0.229, 0.224, 0.225]
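As a quick check of the formula, the sketch below normalizes a single RGB pixel using only the standard library. The names IMAGENET_MEAN, IMAGENET_STD, and normalize_pixel are illustrative and not part of the lesson workspace; a real preprocessor would typically apply the same arithmetic to a whole image array at once.

```python
# Per-channel ImageNet statistics from the lesson (Red, Green, Blue order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Apply z = (x / 255.0 - mean) / std to one (R, G, B) pixel.

    Hypothetical helper for illustration: `rgb` holds integers in [0, 255].
    """
    return tuple(
        (channel / 255.0 - mean) / std
        for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in white])  # approximately [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # approximately [-2.118, -2.036, -1.804]
```

Note that the normalized values are not confined to [0, 1]: a pure white pixel maps to roughly 2.2 to 2.6 and a pure black pixel to roughly -2.1 to -1.8, depending on the channel.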

Production Template: Base64 Decoding

APIs typically transmit binary media (like images) using Base64 encoding inside JSON payloads. In Python, you can decode and transform these payloads cleanly:

import base64
from io import BytesIO
from PIL import Image

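def decode_image(base64_string: str) -> Image.Image:
    # Decode the Base64 text into the raw encoded image bytes (e.g. PNG or JPEG)
    image_bytes = base64.b64decode(base64_string)
    # Wrap the bytes in an in-memory buffer so Pillow can read them like a file
    image = Image.open(BytesIO(image_bytes))
    # Force three RGB channels and the 224x224 input size used in this lesson
    return image.convert("RGB").resize((224, 224))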
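Client payloads are not always well formed, so a service may want to reject bad Base64 before handing the bytes to Pillow. The sketch below is one possible approach using only the standard library; decode_base64_bytes is a hypothetical helper name, and converting the failure to ValueError is an assumption about how you want to surface the error, not a requirement of the lesson.

```python
import base64
import binascii


def decode_base64_bytes(base64_string: str) -> bytes:
    """Strictly decode Base64 text, raising ValueError on malformed input.

    Hypothetical helper: validate=True makes b64decode reject characters
    outside the Base64 alphabet instead of silently discarding them.
    """
    try:
        return base64.b64decode(base64_string, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Invalid Base64 payload: {exc}") from exc


# Round trip: encode some placeholder bytes the way a client would, then decode.
original = b"placeholder image bytes"
payload = base64.b64encode(original).decode("ascii")
print(decode_base64_bytes(payload) == original)
```

The bytes returned here could then be passed to Image.open(BytesIO(...)) exactly as decode_image does, with the ValueError mapped to a client error response in the API layer.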

Project Tasks: Build a Production Pipeline

Implement the following end-to-end components in the workspace to construct your production classifier pipeline:

[ ] Task 1: Define a decode_and_preprocess(base64_str: str) function. It must decode a Base64 string into a resized $224 \times 224$ RGB image and normalize the pixel values using the standard ImageNet formula.
[ ] Task 2: Define a Pydantic schema PayloadRequest that requires image_base64 (string) and accepts an optional confidence_threshold (float, default=0.5).
[ ] Task 3: Define a Pydantic schema PredictionItem containing label and confidence, and a ClassificationResponse containing a list of PredictionItem predictions and latency_ms.
[ ] Task 4: Initialize a FastAPI application with a lifespan manager that caches the mock model classes: ["Mammal", "Bird", "Reptile", "Amphibian"].
[ ] Task 5: Implement a POST endpoint /predict. Use time.perf_counter() to measure execution latency across the decoding, preprocessing, and prediction steps.
[ ] Task 6: Inside /predict, simulate the classifier's confidence scores. Filter out any outputs scoring below the request's confidence_threshold, and return a ClassificationResponse.
[ ] Task 7: Create a production Dockerfile that uses the python:3.11-slim base image, orders its layers so that Docker can cache the pip install step, and launches the application.
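The timing and filtering logic from Tasks 5 and 6 can be prototyped without FastAPI before wiring it into the endpoint. The following is a minimal sketch under stated assumptions: timed_call, filter_predictions, and the fixed MOCK_SCORES values are hypothetical placeholders, and keeping scores equal to the threshold is one reasonable reading of "below".

```python
import time
from typing import Callable, Dict, List, Tuple


def timed_call(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn() and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn()
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def filter_predictions(scores: Dict[str, float], threshold: float) -> List[dict]:
    """Drop scores below the threshold and sort the rest, highest first."""
    kept = [
        {"label": label, "confidence": score}
        for label, score in scores.items()
        if score >= threshold
    ]
    return sorted(kept, key=lambda item: item["confidence"], reverse=True)


# Assumed mock scores; a real endpoint would compute these per request.
MOCK_SCORES = {"Mammal": 0.72, "Bird": 0.18, "Reptile": 0.07, "Amphibian": 0.03}

predictions, latency_ms = timed_call(lambda: filter_predictions(MOCK_SCORES, 0.5))
print(predictions, f"{latency_ms:.3f} ms")
```

In the endpoint itself, the timed section would wrap the decode, preprocess, and predict calls together, and the resulting dictionaries map directly onto the label and confidence fields of PredictionItem.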

Graduation Challenge: Once all tasks are complete, you will have built a fully containerized ML inference service. Run docker build locally to verify that the image builds and the pipeline starts.
