P4 — Infrastructure Basics: Containers, Cloud, and Git

Learning objective

Build a Docker image for a Python AI service, push it to a container registry, explain the three-tier cloud infrastructure model, and manage your codebase through a branching Git workflow.

Why this matters

Production AI services do not run on developer laptops. They run in containers on managed cloud infrastructure. When you specify a deployment architecture — "deploy the triage agent as a containerised service behind an API gateway" — you need to understand what that actually means operationally. Containers are the standard unit of deployment for AI services. Docker is how you build them. Git branching is how teams coordinate changes without overwriting each other's work. Infrastructure-as-code is how environments stay consistent between development and production.

Pre-work

Docker official Getting Started guide at docs.docker.com/get-started — Parts 1–4 (~1 hr)
Install Docker Desktop from docker.com/products/docker-desktop
freeCodeCamp: Git and GitHub for Beginners (YouTube, ~1 hr)
Read: AWS Free Tier overview at aws.amazon.com/free — understand what is free and for how long before creating resources

Core concept explained

Containers vs virtual machines

A virtual machine runs a complete operating system on virtualised hardware. It starts in minutes and consumes gigabytes of memory even when idle. A container shares the host operating system's kernel and runs only the application layer — start time is seconds, and a Python service container typically uses 100–300MB of memory. For AI services that need to scale to zero (pay nothing when not in use) and scale up rapidly (handle traffic spikes), containers are the default deployment unit.

Docker is the dominant container platform. A Dockerfile specifies the image — the filesystem snapshot that your container starts from. The image is built once and run many times, in any environment that has Docker installed, with identical behaviour. This is how you eliminate "it works on my laptop" as an incident category.

Writing a production-grade Dockerfile for a Python AI service

# Multi-stage build: build stage installs dependencies, runtime stage runs the app
FROM python:3.14-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt --target=/build/deps

FROM python:3.14-slim AS runtime
WORKDIR /app

# Security: run as non-root user
RUN useradd --create-home --shell /bin/bash appuser

# Copy only the installed packages and application code
COPY --from=builder /build/deps /usr/local/lib/python3.14/site-packages
COPY --chown=appuser:appuser . .

USER appuser

# Environment variables with safe defaults (secrets injected at runtime, never in image)
ENV PYTHONUNBUFFERED=1
ENV LOG_LEVEL=INFO

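# Health check: Docker uses this to mark the container healthy or unhealthy.
# urllib is in the standard library (no extra dependency) and raises on
# connection errors and on 4xx/5xx responses, so a failing endpoint fails the check.
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=3)" || exit 1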

EXPOSE 8000
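# main.py only defines the FastAPI app, so start it with uvicorn. Binding to
# 0.0.0.0 makes the published port reachable from outside the container.
CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]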

Common mistake: Copying .env files or API keys into the Docker image. Every file added by a COPY instruction is stored in an image layer, and anyone who can pull the image from the registry can extract it, even if a later instruction deletes the file. Inject secrets at runtime via environment variables instead of baking them into the image, and verify that your .dockerignore file excludes .env before every build.
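One way to make runtime injection explicit inside the service is to read the key from the environment at startup and fail fast when it is absent. The sketch below is illustrative only: the helper name load_api_key is hypothetical and not part of the curriculum code, and ANTHROPIC_API_KEY is the variable name used by the docker run command in the exercise.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key injected at runtime, or fail fast if it is missing.

    Hypothetical helper for illustration; not part of claude_client.py.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        # Failing at startup is easier to diagnose than an error on the first request.
        raise RuntimeError("ANTHROPIC_API_KEY is not set; pass it with docker run -e")
    return key
```

Accepting the environment as a parameter keeps the function easy to test without touching the real process environment.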

Git branching for professional development

The main branch is always deployable. New work happens on feature branches. The workflow is: create branch (git checkout -b feature/add-logging) → commit changes → push branch (git push origin feature/add-logging) → open pull request → merge to main after review. This pattern protects production from unreviewed changes and is the workflow you will use across all curriculum repositories.

The three-tier cloud infrastructure model

Cloud infrastructure is organised in three tiers: the region (a geographic area containing multiple availability zones, e.g., us-east-1), the availability zone (one or more physically isolated data centres within the region, with independent power and networking, e.g., us-east-1a), and the service instance (the compute resource running your code). Enterprise AI systems are designed to survive the failure of any single availability zone, which in practice means running instances in at least two zones; this is where "high availability" comes from. Understanding this model lets you evaluate whether an architecture specification is cloud-native or just cloud-hosted.
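The single-zone-failure requirement can be expressed as a small check. This is a minimal sketch under stated assumptions: ServiceInstance and survives_single_az_failure are hypothetical names, the instance names are invented, and the check looks only at zone placement, not at capacity or data replication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    name: str
    region: str             # e.g. "us-east-1"
    availability_zone: str  # e.g. "us-east-1a"


def survives_single_az_failure(instances: list[ServiceInstance]) -> bool:
    """True if at least one instance remains after any one zone is lost."""
    zones = {i.availability_zone for i in instances}
    # With instances in two or more zones, losing one zone leaves survivors.
    return len(zones) >= 2


single_zone = [
    ServiceInstance("triage-1", "us-east-1", "us-east-1a"),
    ServiceInstance("triage-2", "us-east-1", "us-east-1a"),
]
multi_zone = [
    ServiceInstance("triage-1", "us-east-1", "us-east-1a"),
    ServiceInstance("triage-2", "us-east-1", "us-east-1b"),
]
print(survives_single_az_failure(single_zone))  # False
print(survives_single_az_failure(multi_zone))   # True
```

Two instances in the same zone give redundancy against a process crash but not against a zone outage, which is the distinction the three-tier model is meant to surface.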

Pydantic — validated data models (new: added in V6 to close a curriculum gap)

Why this is here. The main.py service you are about to build uses pydantic.BaseModel to define CompletionRequest and CompletionResponse. Nothing you have typed so far — not Week P1's plain dictionaries, not Week P2's @property/inheritance — explains this syntax. This is the one place in the prerequisite track where a genuinely new library is required cold. Read this section before Step 1.

What Pydantic is, in one sentence. Pydantic turns a class definition into a data contract: you declare the fields and their types once, and every time an object of that class is created, Pydantic checks the incoming data against those types and rejects it immediately if it doesn't match — instead of letting a bad value travel silently into your business logic and fail somewhere else, confusingly, later.

Why FastAPI needs this. Your /complete endpoint receives a JSON body from whatever called it: a browser, a curl command, another service. That JSON could contain anything: a missing field, a string where a number was expected, an extra field that doesn't belong. CompletionRequest(BaseModel) is the checkpoint: FastAPI hands the raw JSON to Pydantic, and Pydantic either builds a valid CompletionRequest object or raises a ValidationError, which FastAPI converts into a 422 Unprocessable Entity response before your complete() function ever runs. (Extra fields are ignored by default rather than rejected.) You get input validation for free, at the API boundary, without writing a single if isinstance(...) check yourself.

The core mechanic — compare this to what you already know:

# What you already know how to write (Week P2 pattern) — manual validation
class CompletionRequestManual:
    def __init__(self, system_prompt: str, user_message: str, max_tokens: int = 1024):
        if not isinstance(system_prompt, str):
            raise TypeError(f"system_prompt must be a string, got {type(system_prompt).__name__}")
        if not isinstance(user_message, str):
            raise TypeError(f"user_message must be a string, got {type(user_message).__name__}")
        if not isinstance(max_tokens, int):
            raise TypeError(f"max_tokens must be an int, got {type(max_tokens).__name__}")
        self.system_prompt = system_prompt
        self.user_message = user_message
        self.max_tokens = max_tokens

# What Pydantic generates for you from the same declaration
from pydantic import BaseModel

class CompletionRequest(BaseModel):
    system_prompt: str
    user_message: str
    max_tokens: int = 1024

Both classes do the same job. CompletionRequestManual is the __init__-with-isinstance-checks pattern you already used in Week P2's build_message(). CompletionRequest is the same contract, but Pydantic writes the __init__, the type checks, and the error messages for you, purely from the type annotations on each line. There is no self.x = x because Pydantic's BaseModel generates that assignment automatically from the annotation — this is the one piece of "magic" in this pattern, and it is the only one: a type-annotated class attribute on a BaseModel subclass becomes a validated constructor parameter and an instance attribute, with no __init__ written by you.

What happens on bad input:

CompletionRequest(system_prompt="Be helpful", user_message="Hi", max_tokens="a lot")
# Raises pydantic.ValidationError:
#   max_tokens
#     Input should be a valid integer, unable to parse string as an integer
#     [type=int_parsing, input_value='a lot', input_type=str]

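Compare this to what CompletionRequestManual would have raised for the same bad input: your own hand-written TypeError(f"max_tokens must be an int, got {type(max_tokens).__name__}"). Same intent, same failure point. Pydantic's version is generated, not written, and (as you'll see in Module 8 Week 43) can be extended with custom validation rules beyond simple type checks. One behavioural difference is worth knowing: by default Pydantic coerces compatible values, so max_tokens="512" is accepted and converted to the integer 512, whereas the manual isinstance check would reject it.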
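The behaviour on good and bad input can be checked directly in a Python shell. The sketch below assumes Pydantic v2 (the version that provides ConfigDict). It shows that a numeric string is coerced in Pydantic's default mode, while a non-numeric string is rejected with a structured error.

```python
from pydantic import BaseModel, ValidationError


class CompletionRequest(BaseModel):
    system_prompt: str
    user_message: str
    max_tokens: int = 1024


# A numeric string is coerced to int in Pydantic's default (lax) mode.
ok = CompletionRequest(system_prompt="Be helpful", user_message="Hi", max_tokens="512")
print(ok.max_tokens)  # 512

# A non-numeric string cannot be coerced, so validation fails.
failures = []
try:
    CompletionRequest(system_prompt="Be helpful", user_message="Hi", max_tokens="a lot")
except ValidationError as exc:
    # exc.errors() returns one dict per failed field.
    failures = [(err["loc"], err["type"]) for err in exc.errors()]
print(failures)  # [(('max_tokens',), 'int_parsing')]
```

The list returned by exc.errors() is the same structure FastAPI serialises into the body of a 422 response, so reading it here makes those responses easier to interpret later.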

ConfigDict: configuring the model's behaviour. model_config = ConfigDict(extra='ignore') is a class-level setting, not a field. It tells Pydantic what to do if the incoming data has extra keys beyond the ones you declared: 'ignore' means silently drop them rather than raising an error. 'ignore' is also Pydantic's default, so setting it explicitly documents the intent rather than changing behaviour. This matters for CompletionResponse, which is built from your ClaudeClient.complete() result dict: that dict may carry internal bookkeeping keys (timestamp, model) that CompletionResponse doesn't need to expose to the API caller. ConfigDict is how you tell Pydantic "these extra fields are fine, just leave them out" instead of failing construction.
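A short sketch of the effect, assuming Pydantic v2. The result dict here is hypothetical: the real keys and values depend on what your ClaudeClient.complete() returns.

```python
from pydantic import BaseModel, ConfigDict


class CompletionResponse(BaseModel):
    model_config = ConfigDict(extra="ignore")
    success: bool
    response_text: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    error: str = ""


# Hypothetical result dict; the real shape depends on your ClaudeClient.complete().
result = {
    "success": True,
    "response_text": "Hello",
    "input_tokens": 12,
    "output_tokens": 3,
    "timestamp": "2025-01-01T00:00:00Z",  # extra key, not a declared field
    "model": "example-model",             # extra key, not a declared field
}

response = CompletionResponse(**result)
print(sorted(response.model_dump()))
# ['error', 'input_tokens', 'output_tokens', 'response_text', 'success']
```

The two extra keys are accepted at construction time and then dropped, so they never appear in the JSON sent back to the caller.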

Common mistake. Forgetting that a field without a default value (system_prompt: str) is required. Pydantic will raise ValidationError if it's missing from the input, even if you meant for it to be optional. If a field is genuinely optional, give it a default (max_tokens: int = 1024) or annotate it as Optional[str] = None, exactly as you already do with plain function parameters. The = None default is what makes the field optional; Optional[str] on its own still leaves it required.
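The required-versus-optional rule can be seen in a few lines. This sketch assumes Pydantic v2, and ExampleRequest is a hypothetical model used only for illustration, not one of the models in main.py.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class ExampleRequest(BaseModel):  # hypothetical model for illustration
    user_message: str                    # required: no default
    system_prompt: Optional[str] = None  # optional: defaults to None
    max_tokens: int = 1024               # optional: defaults to 1024


minimal = ExampleRequest(user_message="Hi")
print(minimal.system_prompt, minimal.max_tokens)  # None 1024

error_type = None
try:
    ExampleRequest(max_tokens=256)  # user_message omitted
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # missing
```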

What this is not (yet). You are not writing custom validators, nested models, or field constraints in Week P4 — that is Module 8 Week 43's job, and it will feel familiar when you get there because the fundamentals are the ones you just learned here. For now, BaseModel with plain type-annotated fields is the whole pattern you need for main.py.

Progression checkpoint. Before starting Step 1, you should be able to explain in one sentence: why CompletionRequest(BaseModel) needs no __init__ method, and what happens when FastAPI receives JSON that doesn't match the declared field types.

Step-by-step exercise — Tier 2 (Scaffolded)

What Tier 2 means for this week: the Core Concept section shows you each pattern — the Dockerfile, the Docker commands, the Git workflow. The steps below tell you what to build and why, but they do not give you every command. Before attempting each step, re-read the relevant Core Concept section and identify which commands you will use. Then close the Core Concept and write the commands yourself. If you cannot recall a flag or option after a genuine attempt, refer back to the Core Concept — that is the intended workflow, not a shortcut.

Deliverable: A Dockerised version of your claude_client.py with a minimal HTTP server, verified running locally and pushed to a container registry.

Step 1 — Add a minimal web server

Install FastAPI and uvicorn: pip install fastapi uvicorn

Create main.py:

import os
import logging
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, ConfigDict
from claude_client import ClaudeClient

logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"))
logger = logging.getLogger(__name__)

app = FastAPI(title="CCA Prep Claude Service")
client = ClaudeClient()

class CompletionRequest(BaseModel):
    system_prompt: str
    user_message: str
    max_tokens: int = 1024

class CompletionResponse(BaseModel):
    model_config = ConfigDict(extra='ignore')
    success: bool
    response_text: str = ""
    input_tokens: int = 0
    output_tokens: int = 0
    error: str = ""

@app.get("/health")
def health():
    return {"status": "ok"}

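@app.post("/complete", response_model=CompletionResponse)
def complete(request: CompletionRequest):
    # FastAPI has already validated the JSON body into a CompletionRequest here.
    result = client.complete(
        request.system_prompt, request.user_message, request.max_tokens
    )
    if not result["success"]:
        # 502 Bad Gateway: this service is up, but the call made through ClaudeClient failed.
        raise HTTPException(status_code=502, detail=result["error"])
    # Extra keys in result are dropped because of extra='ignore' on the model.
    return CompletionResponse(**result)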

Verify locally: uvicorn main:app --reload and open http://localhost:8000/docs.

Step 2 — Write the Dockerfile

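Write a Dockerfile following the pattern from the core concept section. The build copies requirements.txt, so generate that file first (see Step 4) if it does not exist yet. Create .dockerignore excluding .env, venv/, __pycache__/, logs/, and *.pyc.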
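The exclusion list above can be checked mechanically before a build. The sketch below is one possible pre-build check, not part of the curriculum code: the function name missing_dockerignore_patterns is hypothetical, and it compares lines literally rather than evaluating Docker's pattern-matching rules, so an equivalent pattern written differently (for example **/.env) would be reported as missing.

```python
from pathlib import Path

# Patterns this week's exercise asks you to exclude from the build context.
REQUIRED_PATTERNS = {".env", "venv/", "__pycache__/", "logs/", "*.pyc"}


def missing_dockerignore_patterns(path: str = ".dockerignore") -> set[str]:
    """Return required patterns that are absent from the .dockerignore file."""
    file = Path(path)
    if not file.exists():
        return set(REQUIRED_PATTERNS)
    present = {
        line.strip()
        for line in file.read_text().splitlines()
        if line.strip() and not line.lstrip().startswith("#")  # skip blanks and comments
    }
    return REQUIRED_PATTERNS - present
```

An empty set means every required pattern is present; anything else names what to add before running docker build.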

Build and run:

docker build -t cca-claude-service:latest .
docker run -p 8000:8000 -e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY cca-claude-service:latest

Verify the /health endpoint returns {"status": "ok"} and the /complete endpoint returns a valid response.
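The /health verification can also be scripted with the standard library, which is handy once the check needs to run repeatedly. This is a minimal sketch: the function name health_is_ok is hypothetical, and the default URL assumes the container is published on port 8000 as in the docker run command above.

```python
import json
import urllib.request


def health_is_ok(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and json.load(resp) == {"status": "ok"}
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
```

Call health_is_ok() while the container is running. A False result means the server is unreachable, returned an error status, or returned an unexpected body.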

Step 3 — Push to GitHub Container Registry

docker tag cca-claude-service:latest ghcr.io/YOUR_GITHUB_USERNAME/cca-claude-service:latest
docker push ghcr.io/YOUR_GITHUB_USERNAME/cca-claude-service:latest

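Before the push will succeed, create a GitHub personal access token (classic) with the write:packages scope and log in to the registry: echo $CR_PAT | docker login ghcr.io -u YOUR_GITHUB_USERNAME --password-stdin, where CR_PAT is an environment variable holding the token. Image paths must be lowercase, so write your username in lowercase in the tag.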

Step 4 — Create a requirements.txt

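Run pip freeze > requirements.txt inside your virtual environment, and do it before the docker build in Step 2, because the Dockerfile copies this file into the image. Commit it alongside the Dockerfile so any machine can install the same package versions.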

Step 5 — Git workflow

Create a feature branch: git checkout -b feature/containerise-service. Commit all new files. Push the branch. Open a pull request on GitHub (even though you are the only reviewer). Merge the PR. This completes one full professional Git workflow cycle.

Reflection

Under a new ## Week P4 heading in journal.md, write 4–6 sentences answering the following. Be specific.

Your Dockerfile runs the application as a non-root user (appuser). In your organisation or a client's organisation, is the current default for containerised services to run as root or non-root? If it is root, what is the specific change to the Dockerfile that would move it to non-root, and what is the one operational reason a team might resist making that change?

Your .dockerignore excludes .env. Name one other file type that should be in .dockerignore for a production AI service that handles personal data, and explain why.

Commit the journal: git add journal.md && git commit -m "week P4 reflection"

Self-check questions

Q1. Your Docker image contains a COPY .env /app/.env instruction. What is the primary production risk?

Q2. You deploy a new container image and the health check endpoint returns 200 but the /complete endpoint returns 502 errors. What is the correct first diagnostic step?

Q3. In a Git branching workflow, what is the correct sequence for adding a new feature?

Progression gate

No standalone progression-gate section in the source for this week — self-attestation still available below.
