Lesson 2: Containerization with Docker

In machine learning, the classic deployment obstacle "It works on my machine" is made worse by GPU driver and CUDA version mismatches, conflicting linear algebra binaries, and intricate Python dependency trees. Docker containerization packages the entire user-space runtime environment into an image, so the same image behaves consistently from a laptop to a production cloud cluster.

Docker Image

A static, read-only blueprint containing the operating system's user-space files, system libraries, configuration, Python interpreter, model binaries, and code. It is built layer by layer from the sequence of instructions in a Dockerfile.

Docker Container

A live, isolated, lightweight execution sandbox started from a specific Docker Image. It shares the host machine's kernel while running its own user-space processes in isolation from other containers and from the host.

Anatomy of an ML Dockerfile

A Dockerfile is a text manifest containing instructions to construct a Docker image. For machine learning pipelines, writing a Dockerfile requires careful attention to base image selection and caching strategies.

# 1. Base Image: Choose slim versions or CUDA containers for GPU
FROM python:3.10-slim

# 2. Environment Variables: Optimize Python behavior inside containers
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

# 3. Work Directory: Define the runtime path
WORKDIR /app

# 4. OS Dependencies: Install compilers needed for C++ extension builds
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

Leveraging Docker Layer Caching: Docker processes a Dockerfile step by step, and each step produces a layer. If a step's instruction and the files it copies have not changed since the last build, Docker skips execution and reuses the cached layer. Once one step misses the cache, every step after it is rebuilt as well. By copying requirements.txt and running pip install before copying the rest of the application source code, we ensure that subsequent source code edits do not trigger a slow reinstall of external Python libraries (like Torch or NumPy).
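
The ordering effect can be illustrated with a small simulation. This is a simplified model, not Docker's actual cache implementation: it assumes each layer's cache key is a hash of the parent key, the instruction text, and the contents of any copied files. The file names and contents are hypothetical.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step.

    Each key chains the parent key, the instruction text, and the
    contents of the files that the step copies into the image.
    """
    keys = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for name in sorted(copied):
            h.update(name.encode())
            h.update(files[name].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cached_steps(old_keys, new_keys):
    """A step is a cache hit only if its key matches the previous build."""
    return [old == new for old, new in zip(old_keys, new_keys)]


# Hypothetical project contents (file name -> text).
files_v1 = {"requirements.txt": "torch\nnumpy\n", "main.py": "print('v1')\n"}
files_v2 = {**files_v1, "main.py": "print('v2')\n"}  # only app code changed

cache_friendly = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "main.py"]),
]
cache_hostile = [
    ("COPY . .", ["requirements.txt", "main.py"]),
    ("RUN pip install -r requirements.txt", []),
]

friendly_hits = cached_steps(layer_keys(cache_friendly, files_v1),
                             layer_keys(cache_friendly, files_v2))
hostile_hits = cached_steps(layer_keys(cache_hostile, files_v1),
                            layer_keys(cache_hostile, files_v2))
print(friendly_hits)  # [True, True, False]
print(hostile_hits)   # [False, False]
```

In the cache-friendly ordering, editing main.py leaves the pip install step cached. In the other ordering, the same edit invalidates the first step, so the pip install step is rebuilt too.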

Layer Cache-Friendly Structure

# Copy ONLY requirements first
COPY requirements.txt .

# Install dependencies; --no-cache-dir keeps pip's download cache out of the layer
RUN pip install --no-cache-dir --upgrade -r requirements.txt

# Copy the rest of the codebase (this changes frequently)
COPY . .

# Expose port and run server
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
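
The CMD above expects a module named main that exposes an ASGI callable named app. The lesson does not show that file, so the following is only a hypothetical stand-in: a dependency-free ASGI app that answers every HTTP request with a small JSON body. A real inference server would more likely use a web framework and load a model.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Bare ASGI callable; uvicorn would import it as main:app."""
    if scope["type"] != "http":
        return  # other event types are skipped in this sketch
    body = json.dumps({"status": "ok"}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def call_once(path="/"):
    """Drive the app once without a server and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent


messages = call_once()
print(messages[0]["status"])  # 200
```

Binding to 0.0.0.0 in the CMD matters because the server runs inside the container's own network namespace: a server bound only to 127.0.0.1 there would not be reachable through a published port.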

CPU vs. GPU Containerization

When building deep learning containers, a plain Python base image ships no CUDA libraries, so code will typically run only on the CPU. To execute models on GPUs, a common approach is to base the image on one of NVIDIA's CUDA images and to start the container with GPU access enabled on the host:

CPU Base: python:3.11-slim. Lightweight (~150MB), ideal for tabular models, classical machine learning, or processing non-tensor data.
GPU/CUDA Base: nvidia/cuda:12.1.0-runtime-ubuntu22.04. Much larger (on the order of gigabytes), packaged with the CUDA runtime libraries that PyTorch, JAX, or TensorFlow need to run on physical GPUs. cuDNN is included only in the cudnn-tagged variants of the image. The GPU driver itself is not part of the image: it is installed on the host and made available to the container at run time by the NVIDIA Container Toolkit.
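
Choosing a CUDA base image is only half of the setup, since the host must also pass its GPUs into the container at run time. A quick probe like the one below, run inside the container, can give a coarse hint about whether that happened. It is a sketch under two assumptions: that the NVIDIA container runtime's NVIDIA_VISIBLE_DEVICES variable is set, and that the driver's /dev/nvidiactl device node is present when a GPU has been passed through. The function name and the returned keys are hypothetical, and the result is a hint, not proof that a framework can use the GPU.

```python
import os


def gpu_visibility(environ=None, path_exists=os.path.exists):
    """Report coarse hints about GPU access from inside a container."""
    environ = os.environ if environ is None else environ
    # Read by the NVIDIA container runtime; "void" and "none" mean no GPUs.
    requested = environ.get("NVIDIA_VISIBLE_DEVICES", "")
    # Control device node created by the NVIDIA driver on the host.
    node_present = path_exists("/dev/nvidiactl")
    return {
        "requested_devices": requested or None,
        "device_node_present": node_present,
        "gpu_likely_usable": node_present
        and requested not in ("", "void", "none"),
    }


print(gpu_visibility())
```

The environ and path_exists parameters exist only so the function can be exercised without real hardware. A framework-level check (for example, asking the deep learning library whether it sees a device) remains the authoritative test.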

Challenge Tasks: Construct a Dockerfile

Write a production Dockerfile optimized for an inference server. Ensure maximum layer cache optimization and minimal final image footprint:

[ ] Task 1: Set the base container image to python:3.11-slim to keep the image lightweight.
[ ] Task 2: Define environment variables to stop Python from generating .pyc cache files and to force unbuffered, real-time standard output.
[ ] Task 3: Declare the working directory inside the container as /var/ml_service.
[ ] Task 4: Copy the dependency file requirements.txt into the working directory before copying the application logic.
[ ] Task 5: Install Python packages using the --no-cache-dir flag so that pip's download cache is not stored in the image.
[ ] Task 6: Copy the remaining repository contents into the container workspace and expose port 8000.
[ ] Task 7: Write the final CMD instruction to start the server on port 8000, bound to all network interfaces (0.0.0.0).
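
As a self-check, the seven tasks can be turned into a rough script. This is a hypothetical helper, not part of the lesson's tooling: it uses simple pattern matching rather than a real Dockerfile parser, and the function name check_dockerfile is an assumption. The sample input is the Dockerfile from earlier in this lesson, which intentionally does not satisfy every task.

```python
import re


def check_dockerfile(text):
    """Return {task_number: passed} for the seven challenge tasks."""
    # Join backslash continuations, then drop comments and blank lines.
    joined = re.sub(r"\\\s*\n", " ", text)
    lines = [ln.strip() for ln in joined.splitlines()
             if ln.strip() and not ln.strip().startswith("#")]

    def index_of(pattern):
        for i, ln in enumerate(lines):
            if re.search(pattern, ln):
                return i
        return None

    copy_req = index_of(r"^COPY\s+requirements\.txt\s")
    copy_all = index_of(r"^COPY\s+\.\s+\.$")
    env_text = " ".join(ln for ln in lines if ln.startswith("ENV"))
    cmd = next((ln for ln in lines if ln.startswith("CMD")), "")
    return {
        1: index_of(r"^FROM\s+python:3\.11-slim$") == 0,
        2: "PYTHONDONTWRITEBYTECODE=1" in env_text
           and "PYTHONUNBUFFERED=1" in env_text,
        3: index_of(r"^WORKDIR\s+/var/ml_service$") is not None,
        4: copy_req is not None and copy_all is not None
           and copy_req < copy_all,
        5: index_of(r"^RUN\b.*\bpip\b.*--no-cache-dir") is not None,
        6: copy_all is not None
           and index_of(r"^EXPOSE\s+8000$") is not None,
        7: '"0.0.0.0"' in cmd and '"8000"' in cmd,
    }


# The lesson's earlier example, used here as a candidate answer.
candidate = r"""
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
"""

results = check_dockerfile(candidate)
failing = [task for task, ok in results.items() if not ok]
print(failing)  # [1, 3]
```

The earlier example fails Task 1 (it uses python:3.10-slim) and Task 3 (it uses /app), which are exactly the two places where the challenge differs from the walkthrough.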
