In this final MLOps project workshop, you will synthesize everything you've learned. You will construct a complete, production-grade, containerized computer vision inference server capable of processing image payloads, performing tensor normalization, tracking sub-millisecond latency metrics, and running in an optimized Docker sandbox!
Your application will ingest image payloads as Base64 encoded strings, preprocess them to normalized tensor dimensions, execute mock classification inferences, and respond with filtered predictions and performance telemetry:
For standard deep learning classification models (like ResNet or MobileNet), inputs must be scaled and normalized to match the training distribution. Given an RGB pixel value $x \in [0, 255]$, the normalized pixel value $z$ is formulated as:
Where we utilize the ImageNet distribution standards:
mean = [0.485, 0.456, 0.406] (Red, Green, Blue channels)std = [0.229, 0.224, 0.225]APIs typically transmit binary media (like images) using Base64 encoding inside JSON payloads. In Python, you can decode and transform these payloads cleanly:
import base64
from io import BytesIO
from PIL import Image
def decode_image(base64_string: str) -> Image.Image:
# Decode string, parsing raw bytes buffer
image_bytes = base64.b64decode(base64_string)
image = Image.open(BytesIO(image_bytes))
return image.convert("RGB").resize((224, 224))
Implement the following end-to-end components in the workspace to construct your production classifier pipeline:
decode_and_preprocess(base64_str: str) function. It must decode base64 strings into a resized $224 \times 224$ RGB image and normalize pixel coordinates based on the standard ImageNet formula.PayloadRequest that requires image_base64 (string) and a confidence_threshold (float, default=0.5).PredictionItem containing label and confidence, and a ClassificationResponse containing prediction arrays and latency_ms.lifespan manager caching mock model classes: ["Mammal", "Bird", "Reptile", "Amphibian"]./predict. Use time.perf_counter() to track execution latency around the decoding, preprocessing, and prediction segments./predict, simulate classifier confidence calculations. Filter any outputs scoring below the request's confidence_threshold, and return a ClassificationResponse.Dockerfile using the python:3.11-slim image base, optimized pip library caching layers, and launching the application.