Curriculum Introduction

Curriculum Introduction


What this curriculum is and who it is for

This is a 50-week preparation programme for the Claude Certified Architect – Foundations (CCA-F) certification, structured as a 40-week certification track (Modules 0–6) followed by two optional post-certification advanced modules (Modules 7–8) that extend it to 50 weeks. It was designed for a specific learner profile: an enterprise IT professional with 15 or more years of experience — spanning systems integration, project and programme management, business analysis, digital tools, IT infrastructure, service management, and data analytics — who has little or no coding background and wants to move credibly into AI architect and consultant roles.

That profile matters, because the curriculum is built around it. Your enterprise experience is an asset here. You already understand regulatory constraints, stakeholder management, production reliability, and the difference between a design that looks good in a diagram and one that survives contact with an engineering team. If you work in an FMCG, CPG, or multinational context, that commercial and operational background maps directly onto the curriculum's capstone scenarios — several of which are explicitly designed to be substitutable with your own industry context. What you have not yet done is build AI systems in code, deploy them to cloud infrastructure, and operate them through failure. This curriculum closes that gap.

By the time you sit the CCA-F exam, you will have done all of the following from scratch: written Python API integrations, built and deployed a containerised AI service to AWS, implemented production-grade agentic loops with safety controls, designed MCP servers and multi-agent architectures, specified and evaluated a RAG pipeline, debugged a live production failure under time pressure, and assembled five public GitHub repositories — each demonstrating a distinct enterprise AI capability in a distinct industry. The certification validates that you can design and govern production AI systems. The portfolio proves it.


The CCA-F certification

The CCA-F exam is Anthropic's practitioner-level certification for solution architects building production applications with Claude. It tests across five domains:

Domain Weight
Agentic Architecture & Orchestration 27%
Tool Design & MCP Integration 18%
Claude Code Configuration & Workflows 20%
Prompt Engineering & Structured Output 20%
Context Management & Reliability 15%

The exam is 60 questions in 120 minutes, multiple-choice format, minimum passing score 720 on a 100–1,000 scale. Questions are scenario-based: each presents a realistic production context and asks you to select the architecturally correct response. Candidates who pass know the concepts and can apply them under constraint — they can read a scenario, extract the binding constraints, rule out the wrong answers by named disqualifying factor, and commit to the correct one. This curriculum builds exactly that skill, across the 40-week certification track (with a further ten weeks of optional advanced material) of progressively harder application.

A cross-tabulation of every module's Domain Alignment table against the exam guide's 30 task statements (across all five domains) shows 23 of 30 — approximately 77% — explicitly cited as the "Primary Domain / Task Statement" for at least one weekly learning objective in Modules 1–6. The uncited eight (2.5, 3.5, 3.6, 4.5, 4.6, 5.5, 5.6, and partial depth on 4.3/4.4's batch-processing sub-scope) are not gaps in the curriculum's design intent — every one of them is addressed by design through the six official exam scenario deep-dives in Week 26, the gap-closing methodology in Week 27, and the full exam simulation and constraint-analysis practice in Week 33, all three of which are explicitly tagged "All domains / All task statements" in their respective modules' alignment tables, rather than mapped to a single task statement the way a core-week learning objective is. This curriculum does not claim a single verified aggregate coverage percentage beyond the 77% table-level figure — that figure is the honest, checkable number; the residual is a claimed-and-plausible, not independently re-verified, coverage path through Weeks 26/27/33.


Curriculum structure

The curriculum spans nine modules. Modules 0–6 form the complete CCA-F certification track. Modules 7 and 8 are post-certification advanced tracks for architects moving into enterprise platform and data engineering roles.

Module Title Weeks Time estimate
0 Prerequisite Track P1–P5 34–41 hrs
1 Foundations & Orientation 1–8 33–42 hrs
2 API Fundamentals & Agentic Thinking 9–12 20–25 hrs
3 Deployment & Operations 13–16 26–34 hrs
4 MCP, Multi-Agent, Claude Code & RAG 17–21 23–28 hrs
5 Production Engineering, LLMOps & Exam Readiness 22–28 49–70 hrs
6 Final Capstone, Exam Readiness & Certification 29–35 44–58 hrs
7 Enterprise Architecture (post-certification) 36–40 30–50 hrs
8 Data Pipeline Engineering (advanced, optional) 41–45 40–60 hrs

Total for CCA-F track (Modules 0–6): approximately 229–298 hours over 40 weeks at 6–10 hours per week.
Post-certification advanced track (Modules 7–8): approximately 70–110 additional hours over 10 weeks.

Module 0 — Prerequisite Track (Weeks P1–P5)

Before building anything with Claude, you need a reliable technical floor. Module 0 provides it: Python fundamentals, environment and credential management, containerisation with Docker, and a first cloud deployment on AWS Lambda with GitHub Actions CI/CD. By the end of Week P5 you will have a deployed AI service running in production, triggered by curl, with an API key stored in Secrets Manager and a pipeline that runs your tests on every push. Nothing in Module 0 requires prior coding experience — but everything in Module 0 is required for what follows. Week P4 now includes a full explanation of Pydantic's BaseModel pattern before it's used in that week's FastAPI service — previously introduced without explanation.

Module 1 — Foundations & Orientation (Weeks 1–8)

Module 1 builds the conceptual and technical base for the rest of the curriculum. You will learn how LLMs actually work at a token level, where Claude sits in the enterprise AI stack, how to engineer reliable prompts, and how to call the Claude API from Python code with production-quality error handling and structured output validation. The AI Fluency: Framework & Foundations pre-work for Weeks 1–4 includes a final assessment and certificate of completion — an academically co-authored, Anthropic-issued credential worth adding to your LinkedIn profile alongside the CCA-F. The module's capstone is the NorthBridge Retail Bank Operations Prompt Pack — a set of six production-grade prompt modules for a synthetic retail bank, including a runnable three-step triage pipeline that demonstrates the whole system working end-to-end. Before the capstone, Week 8 now includes a dedicated lesson on writing Architecture Decision Records (ADRs) — the format every later capstone, through Module 7, requires.

Module 2 — API Fundamentals & Agentic Thinking (Weeks 9–12)

Module 2 converts API fluency into agentic capability. You will work at the raw API layer — constructing requests by hand in Postman and programmatically in Python — tune parameters for production workloads, implement the full tool-use loop in Python, and build a production-grade agentic loop handling all four exit paths with iteration cap and context watchdog safety controls. The module introduces the Claude Agent SDK vocabulary (AgentDefinition, Task tool, allowedTools, PostToolUse hooks, Handover) that maps to your hand-rolled patterns — bridging the practical implementation you build in code with the exam's specific SDK terminology. The capstone is the MeridianHealth Patient Triage Agent — a healthcare AI agent that handles emergency escalation, audit logging, and nurse-review gating.

Module 3 — Deployment & Operations (Weeks 13–16)

Most AI architecture curricula stop at the design layer. Module 3 does not. You will containerise the MeridianHealth agent, deploy it to AWS Fargate behind a load balancer using CDK, instrument it with structured logging and CloudWatch metrics, and build a CI/CD pipeline that runs security scanning, contract tests, and prompt regression tests before every deployment. After Module 3 you will have run a production deployment, read real logs during a live failure, and built a pipeline that deploys on every commit. This operational experience changes how you design — because you have experienced what breaks.

Module 4 — MCP, Multi-Agent, Claude Code & RAG (Weeks 17–21)

Module 4 is the architecture layer. You will learn Model Context Protocol (MCP) — Anthropic's open standard for connecting AI models to external tools and data — including transport protocols (Stdio for local servers, SSE for remote services), structured error response design, and integration boundary specification. You will design multi-agent orchestration patterns with coordinator-subagent decomposition, configure Claude Code for team workflows with high-signal CLAUDE.md and Agent Skills, and produce an engineering-ready specification for a Retrieval-Augmented Generation pipeline. The capstone is the MeridiumGov Policy Research & Briefing System — a public sector AI platform where factual accuracy, source attribution, and political neutrality are architectural constraints, not aspirations.

Module 5 — Production Engineering, LLMOps & Exam Readiness (Weeks 22–28)

Module 5 is where the curriculum shifts from "builds it" to "keeps it working." You will implement retry-and-escalation strategies with exponential backoff and jitter, design prompt caching and Batch API workflows for cost optimisation, build an LLMOps evaluation pipeline that regression-tests prompts before deployment, and complete a live debugging simulation — diagnosing six pre-planted production failures under a 90-minute clock (deliberately tighter than the 120-minute exam) using only logs and metrics. Weeks 26–27 work through the six official CCA-F exam scenarios at architect level — including explicit distractor recognition training (the most common intelligent-candidate failure mode is defaulting to model upgrades rather than prompt optimisation) — and run the first practice exam with a structured gap-analysis. The capstone is the OmniaRetail AI Platform — a retail AI system demonstrating production reliability at e-commerce scale.

Module 6 — Final Capstone, Exam Readiness & Certification (Weeks 29–35)

Module 6 is the culmination. Weeks 29–32 produce the VantaOps Supply Intelligence Platform — an end-to-end multi-agent manufacturing and supply chain AI architecture documented to client-delivery or senior-hiring-panel standard. Week 33 is portfolio polish and a full exam simulation with gap-closing actions. Week 34 is certification sitting. Week 35 is a deliberate buffer for any remaining gap-closing or re-sit preparation.

Module 7 — Enterprise Architecture (Weeks 36–40, post-certification)

Module 7 covers the concerns that become critical once AI systems are running at enterprise scale: AI-specific threat modelling and security architecture, FinOps cost attribution and token budget governance, multi-region deployment and data residency, and streaming event-driven AI pipelines. These topics sit at the boundary between AI architecture and enterprise cloud architecture — the domain the CCA-F assumes you will develop through practice. Module 7 is what separates a first-deployment architect from one who can be trusted with a company's strategic AI infrastructure. For learners whose enterprise context involves AWS, Module 7 includes an optional reference to the Claude with Amazon Bedrock Skilljar course — covering Claude deployment via the Bedrock SDK for data residency, IAM integration, and AWS-native architectures.

Module 8 — Data Pipeline Engineering (Weeks 41–45, advanced, optional)

Module 8 closes the specification-to-implementation gap for data pipelines. You will build a complete RAG pipeline in running Python code, schedule it with Prefect orchestration, enforce data contracts with Pydantic and Great Expectations at every pipeline boundary, and deploy an SQS-triggered streaming pipeline to AWS Lambda. Every week produces deployed, running infrastructure. This module is for architects whose roles require building pipelines, not only specifying them.


Content structure — what every week looks like

Every week in the curriculum follows the same seven-element structure. This is intentional. Once you know the rhythm, you spend your cognitive energy on the content rather than navigating the format.

1. Learning objective — a single specific statement of what you will be able to do by the end of the week. Not "understand" — always a verb that implies observable performance: implement, design, explain to a non-technical stakeholder, produce without reference to documentation.

2. Why this matters — one or two paragraphs connecting the week's content to a specific career consequence. The week's content is the answer to a question your clients or employers are about to ask you.

3. Pre-work — anchored readings and videos from Anthropic Academy on Skilljar (anthropic.skilljar.com), that do the depth teaching before you arrive at the core concept. The core concept section is tight by design — it is architect-level synthesis after the pre-work, not first exposure to the material.

The following Skilljar courses are used as pre-work across the curriculum. All are free to access with an Anthropic account:

Course Lessons Used in
Claude 101 14 Module 1 Weeks 1–2
AI Fluency: Framework & Foundations 15 Module 1 Weeks 1–4
Building with the Claude API 85 Modules 1–2 Weeks 5–12, Module 5 review
Claude Code 101 13 Module 4 Week 20
Claude Code in Action 21 Module 4 Week 20 (hooks section)
Introduction to Agent Skills 6 Module 4 Week 20
Introduction to MCP 14 Module 4 Week 17
MCP Advanced Topics 15 Module 4 Week 18
Introduction to Subagents 4 Module 4 Week 19
Claude with Amazon Bedrock 83 Module 7 optional supplementary

Each module's pre-work table names the specific lessons within each course assigned for that week — you do not need to complete entire courses at once.

4. Core concept explained — the principle at architect level, with an enterprise analogy, a Mermaid diagram where it earns its place, and the one common mistake to avoid. This is what you need to internalise before the exercise.

5. Step-by-step exercise — the application layer. Every exercise produces a tangible artefact: committed code, a documented decision, a working service, or an architectural specification. Exercises escalate through three tiers across the curriculum:

  • Tier 1 (Guided): the complete implementation is shown in the Core Concept. You type it out line by line — no copy-paste. First encounters with new patterns are always Tier 1.
  • Tier 2 (Scaffolded): function signatures and hints are given; you write the body. The hint points to the specific Core Concept example to adapt. You write the implementation independently.
  • Tier 3 (Independent): only the specification is given. No hints, no examples. This is the level Module 1 and beyond expects — and the level you will operate at professionally.

6. Reflection prompt — a specific question requiring 4–6 sentences in a learning journal, connecting the week's technical content to your professional context or career decision. Reflection prompts are not optional; they are where the integration happens.

7. Self-check questions and progression gate — three to seven multiple-choice questions in CCA-F exam format, followed by a performance statement (the progression gate). The gate says: before moving to the next week, you should be able to do this without notes. If you cannot, revisit the relevant section — the capability map table at each module's cumulative capability check shows exactly which week covers each gate item.

Domain alignment. Every module opens with a domain alignment table that maps each learning objective to the specific CCA-F exam domain and task statement it addresses, and names the portfolio artefact that provides evidence of that competency. Use these tables to track your exam coverage week by week. For Modules 7 and 8 — which are post-certification — the table maps to career portfolio capability rather than exam domains.


Learning objectives: what the curriculum produces

By the end of Module 6 and the CCA-F exam, you will be able to demonstrate all of the following without reference to notes:

Foundational competencies

  • Explain how LLMs generate output probabilistically, why outputs vary, and what temperature does — to a CIO in two minutes.
  • Explain the top_p/temperature parameter interaction: top_p restricts the token vocabulary pool first; temperature adjusts distribution within that pool. Apply this to structured output design.
  • Select a Claude model tier for any use case with a one-sentence cost-latency-capability justification.
  • Engineer prompts with system, context, instruction, and output-format components; diagnose and fix the four common failure modes.
  • Apply few-shot prompting to resolve output inconsistency; distinguish when it can and cannot help.

API and agentic architecture

  • Call the Claude API in Python with production-quality retry logic, structured response parsing, and defensive error handling.
  • Implement the full tool-use loop: tool definition, tool_use detection, dispatch, tool_result return, and silent partial failure prevention.
  • Configure tool_choice correctly for auto, any, and forced selection scenarios with a one-sentence production rationale for each.
  • Apply vision API constraints: maximum 100 images per request, base64 encoding requirement, images in user turns only (not system prompts).
  • Build a production-grade agentic loop handling all four stop_reason exit paths with iteration cap and context watchdog controls.
  • Map the curriculum's hand-rolled patterns to the Claude Agent SDK vocabulary: Task tool, AgentDefinition, allowedTools, PostToolUse hook, fork_session, --resume.

Deployment and operations

  • Write a production-grade Dockerfile and deploy a containerised AI service to AWS Fargate using CDK.
  • Configure structured logging, CloudWatch metrics, and alerting — and diagnose a production failure from logs alone.
  • Build a CI/CD pipeline with security scanning, contract tests, and prompt regression test gates.

Integration and orchestration

  • Explain the MCP architecture and specify an MCP server — tools, resources, prompts, scoping, and error handling — to engineering-ready standard.
  • Decide when multi-agent decomposition is justified and design a hub-and-spoke orchestration architecture with explicit failure handling.
  • Configure Claude Code for team workflows with high-signal CLAUDE.md, .claude/rules/, .claude/commands/, and plan mode vs direct execution.
  • Specify a RAG pipeline — chunking strategy, vector store selection, hybrid retrieval configuration, RAGAS evaluation plan — to a standard an engineering team can implement from without follow-up questions.
  • Apply AI-specific security patterns: XML delimiters as prompt injection defence, principle of least privilege on tool schemas, PII redaction as a structural pipeline layer, and output monitoring for system prompt leakage.

Production reliability and LLMOps

  • Design a retry-and-escalation strategy with exponential backoff, jitter, and distinct handling for transient, rate-limit, and non-retryable failures.
  • Architect prompt caching and Batch API workflows with quantified cost savings at stated volume.
  • Build and interpret an LLMOps evaluation pipeline with deterministic and LLM-as-judge evaluation, stratified sampling, and regression test gates.
  • Apply the five-step production debugging methodology to a described failure scenario from logs alone.

Exam and career readiness

  • Analyse each of the six official CCA-F exam scenarios, extract the binding constraints, and identify the correct architecture — ruling out alternatives by named disqualifying factor.
  • Deliver a 90-second verbal summary of any capstone, demonstrate the running code, and explain one regulated-context design decision — for a hiring panel or consulting prospect.

The portfolio: five repositories, five industries

The curriculum's primary output is not the certification — it is the portfolio. By the end of Module 6 you will have five public GitHub repositories, each demonstrating a distinct enterprise AI capability in a distinct regulated industry:

Repository Industry Demonstrates
northbridge-prompt-pack Financial services Prompt engineering, structured output, regulated-context design
meridianhealth-triage-agent Healthcare Agentic loop, tool use, production deployment, CI/CD
meridiumgov-policy-briefing-system Public sector MCP integration, multi-agent orchestration, RAG specification
omniaretail-ai-platform Retail / e-commerce Production reliability, LLMOps evaluation, batch processing
vantaops-supply-intelligence Manufacturing / supply chain Integrative architecture, enterprise-grade ADRs, full-stack delivery

Each repository is buildable from a fresh clone with only a .env file providing the API key. Each one is what you reference in interviews, share with prospective clients, and link from your CV.


How to begin

Start here. Read this document in full, then open Module 0. Do not skip Module 0 — the Python and infrastructure foundations it provides are load-bearing for every later module.

Read before Week P1. The Curriculum Overview table and learning objectives above tell you what the full 50 weeks produce. Module 0's progression gate tells you what you need to verify before moving to Module 1. Knowing the destination before you start Week P1 prevents the most common dropout pattern: learners who discover at Week 12 that they are further behind than they thought.

Pace yourself honestly. The curriculum averages roughly 6–10 hours per week, but this varies by module: Module 3 and Module 5 run at the upper end, Module 8 runs heavier still, and Modules 1, 2, and 4 run lighter. If you are in full-time employment, 7–8 hours per week is a realistic sustainable pace. Attempting 10+ hours per week for 40 weeks while working full-time produces diminishing returns and burnout — the buffer weeks (Week 35 and the Module 7/8 optional framing) exist precisely because the curriculum acknowledges this.

Take the progression gates seriously. Each progression gate is a performance statement, not a checkbox. Moving forward with a shaky gate item is the most common reason learners stall mid-curriculum. The capability map tables in each module's cumulative capability check show exactly which week to revisit for any gate item that is not yet solid. This version adds two foundational teaching units — Pydantic in Module 0 and ADR-writing in Module 1 — for the same reason: earlier learners hit points where a concept was used before it was taught, so the curriculum now teaches these before requiring them.

Build the portfolio as you go. Each capstone produces artefacts that extend into the next module. The NorthBridge prompts feed into the MeridianHealth agent; the MeridianHealth agent feeds into the Module 3 production deployment; Module 3 deployment artefacts feed into the VantaOps final capstone. Skipping a capstone creates a gap that compounds. Build each one to the standard described — it is worth the time.

When you are ready, turn to Module 0.


Glossary

A

Active-active (deployment pattern) Module 7, Week 38

A multi-region pattern where all regions handle live traffic simultaneously; if one region fails, its traffic shifts to the others. Preferred for low-latency, user-facing AI services, but requires the application to be stateless or replicate state across regions.

Active-passive (deployment pattern) Module 7, Week 38

A multi-region pattern where one region is primary and handles all traffic while a standby region activates only on failure. Cheaper than active-active but introduces failover latency and potentially stale standby data.

Architecture Decision Record (ADR) Module 1, Week 8

A permanent, dated record of one architectural decision, written in five parts (Context, Decision, Consequences, Alternatives Considered, Review Trigger) so a reader six months later can understand not just what was decided but why, and under what conditions the decision should be revisited.

Adversarial input manipulation Module 7, Week 36

Inputs deliberately crafted to cause systematic errors in an AI system's classification, routing, or decision-making, most dangerous in high-stakes decision contexts like credit or fraud.

Agent / Agentic loop Module 2, Week 12

Defined fundamentally as a loop: send messages to Claude, check stop_reason, execute tools and loop if needed, or exit on end_turn. The curriculum stresses "an agent is a loop with structured decision-making," not an autonomous intelligence in itself.

AgentConfig Module 2, Week 12

A dataclass holding an agent's operational safety settings (model, max tokens per turn, max iterations, context token limit) that parameterize the Agent class's loop behavior.

AgentDefinition Module 2, Week 12

The Claude Agent SDK construct that configures a subagent's identity, system prompt, and permitted tools; the SDK-level equivalent of the curriculum's hand-rolled Agent class.

Agent Instruction Architecture (Three-Layer Structure) Module 6, Week 30

A structure for agent instructions with three layers: role definition (what the agent is/isn't responsible for), tool use protocol (when to call which tool and how to handle failure), and output contract (the exact typed structure the agent must produce for the orchestrator).

AgentResult Module 2, Week 12

A dataclass returned by Agent.run() capturing the final text, iteration count, token totals, exit reason, and any errors produced during a loop run.

Agent Skills Module 4, Week 20

Packaged capabilities consisting of a SKILL.md file plus optional supporting assets that Claude can invoke when its description field matches the current task; a form of context engineering distinct from CLAUDE.md, hooks, and subagents.

AI-specific threat landscape Module 7, Week 36

The set of threat categories unique to AI systems (beyond standard cloud threats), including prompt injection, model extraction, data exfiltration through generation, adversarial input manipulation, and supply chain attacks on AI dependencies.

alpha (hybrid retrieval blend ratio) Module 8, Week 41

A tunable parameter weighting dense vector retrieval against sparse (BM25) retrieval in the HybridRetriever; e.g. alpha=0.7 means 70% dense/30% sparse, calibrated against the corpus and query distribution.

allowed-tools Module 4, Week 20

A SKILL.md frontmatter field that restricts which tools a skill is permitted to use, preventing unintended actions (e.g. a review skill writing files).

Answer Relevancy Module 5, Week 24

A RAGAS metric asking whether the generated answer is actually relevant to the question that was asked.

API Gateway Module 3, Week 14

A component that sits in front of a Fargate service to handle TLS termination, authentication, rate limiting, and request logging, so the backend service is never exposed directly to the internet.

Application Load Balancer (ALB) Module 3, Week 14

A managed AWS component that distributes incoming traffic across multiple Fargate tasks and performs health checks against them to route traffic only to healthy instances.

argument-hint Module 4, Week 20

A SKILL.md frontmatter field that prompts the developer for required parameters when a skill is invoked without arguments.

AWS CDK (Cloud Development Kit) Module 3, Week 14

Infrastructure-as-code tooling with two parts: a Node-based CLI (aws-cdk) that runs deployments via CloudFormation, and a Python library (aws-cdk-lib) used to write stack definitions as code rather than clicking through the console.

AWS Fargate Module 3, Week 14

A managed container service that runs containers without requiring the user to provision or manage the underlying servers; the cloud provider handles EC2 instances, OS patching, and cluster scheduling.

AWS X-Ray Module 5, Week 22

A distributed tracing tool integrating with Fargate and Lambda that produces visual service maps and latency traces, showing which segment of a multi-service request timed out or errored.

B

Batch API Module 5, Week 23

Anthropic's asynchronous processing endpoint that accepts a file of up to 10,000 requests, processes them within up to 24 hours, and charges 50% of the synchronous rate; correct for workloads where results aren't needed immediately.

BM25 Module 4, Week 21

Defined inline as "a classic keyword-ranking algorithm that scores documents by term frequency, adjusted for document length"; the sparse-retrieval component of hybrid search.

C

Case Facts Block Module 5, Week 23

A persistent, structured block of key transactional facts (customer ID, order number, amount, etc.) maintained outside the summarized conversation history so critical details don't degrade when older turns are compressed.

Chain-of-thought (CoT) Module 1, Week 4

The technique of instructing the model to reason step-by-step before producing its final answer, which materially improves accuracy on multi-step tasks; primarily reduces hallucination on reasoning tasks and reduces drift by anchoring the model to explicit intermediate steps.

Chunking gap Module 8, Week 41

The implementation reality that "semantic chunking" requires defining concrete boundary rules in code (headings, double newlines, section numbers) and handling edge cases like very short or very long sections.

.claude/commands/ Module 4, Week 20

A directory for project-scoped custom slash commands that are version-controlled and shared with all team members (contrast with personal commands in ~/.claude/commands/).

Claude Code Module 4, Week 20

The CLI agent that reads CLAUDE.md at session start and can invoke Agent Skills when their descriptions match the work at hand; shifts development from chat-based assistance to agentic codebase work bounded by the user's conventions.

CLAUDE.md Module 4, Week 20

A markdown file at a project's root that Claude reads to establish working context — project description, conventions, key files, commands, non-obvious decisions, and things to do or avoid; always in context, unlike conditionally-loaded Skills.

.claude/rules/ Module 4, Week 20

Files with YAML frontmatter paths fields that load conventions conditionally based on file-path pattern matching, regardless of which directory they live in — contrasted with directory-bound CLAUDE.md files.

Claude with Amazon Bedrock Module 7, Week 36

An optional Anthropic Academy course covering the full Bedrock implementation path (API access via Boto3, multi-turn conversations, tool use, RAG, prompt caching, MCP, agents) using the Amazon Bedrock SDK instead of the direct Anthropic API.

CloudWatch Logs Insights Module 3, Week 15

A query interface for structured logs stored in CloudWatch, allowing searches such as filtering all requests where a field matches a specific value, used to diagnose failures from log data alone.

/compact Module 4, Week 20

A Claude Code command that reduces context usage during extended exploration sessions once context fills with verbose discovery output; a cue to use it is Claude starting to reference "typical patterns" rather than specific explored code.

CompletionResult Module 1, Week 5

A @dataclass-based typed response object (text, input_tokens, output_tokens, stop_reason, model) that wraps a raw Claude API response for structured, predictable downstream consumption.

Conceptual Gap Module 5, Week 27

One of the practice-exam error classifications: a wrong answer caused by not understanding the underlying principle being tested.

Constraint-Analysis Method Module 6, Week 33

The exam reasoning technique: extract the binding constraints from a scenario, rule out alternatives by a specifically named disqualifying factor, then commit to the option that satisfies all binding constraints.

Constraint Misreads Module 6, Week 33

An error type where the test-taker understood the concept and pattern but misread a binding constraint in the scenario description; diagnosed by getting the question right on re-reading but wrong under timed conditions.

Contract layers (ingestion / embedding / retrieval) Module 8, Week 43

The three natural boundaries in a RAG pipeline where Pydantic data contracts are enforced: ingestion (what a source document must contain to enter the pipeline), embedding (what a chunked/embedded record must contain before indexing), and retrieval (what a retrieved chunk must contain before reaching Claude's prompt).

Contract test Module 3, Week 16

A test layer that validates a tool's actual return format matches the schema the agent's code expects, catching "tool schema drift" immediately on the next commit rather than in production.

Context engineering Module 4, Week 20

"The discipline of curating what Claude knows for a given task," of which both CLAUDE.md and Agent Skills are forms.

context: fork Module 4, Week 20

A SKILL.md frontmatter option that runs the skill in an isolated sub-agent context so its output does not pollute the main conversation; used for skills producing verbose exploratory output.

Context fragmentation Module 4, Week 19

One of four named multi-agent failure modes: subagents working without the information they need.

Context overflow Module 2, Week 12

One of the four principal agentic failure modes, occurring when conversation history exceeds the model's context window.

Context Precision Module 4/5, Week 21/24

A RAGAS metric asking whether the retrieved chunks are actually relevant to the question (or focused vs. containing irrelevant information); catches poor retrieval.

Context Recall Module 4/5, Week 21/24

A RAGAS metric asking whether the retrieved context contains all the information needed to answer the question; catches incomplete retrieval.

Context token limit / context watchdog Module 2, Week 12

A safety control that checks accumulated input token usage at the top of each loop iteration and exits gracefully (exit_reason = "context_limit") before the context window is exceeded.

Context window Module 1, Week 2

The maximum amount of text a model can hold in working memory for one call, including both input and output, measured in tokens.

ContextVar Module 5, Week 22

A Python construct that gives each concurrent async execution its own isolated variable value, preventing trace IDs or other per-request state from being overwritten across interleaved concurrent requests.

Container security requirements (five) Module 3, Week 13

A defined set of non-negotiable hardening rules for containers handling sensitive data: non-root user, read-only root filesystem, no secrets baked into image layers, minimal base image, and mandatory image vulnerability scanning.

ConversationManager Module 1, Week 7

A class that maintains multi-turn conversation state (message history, running token totals) and re-sends the full history to the stateless Claude API on every call, since the API itself retains no memory between requests.

Coordinator confusion Module 4, Week 19

One of four named multi-agent failure modes: the orchestrator loses track of subagent state.

Cost attribution architecture Module 7, Week 37

A middleware pattern where every AI API call carries metadata (product, team, model, token counts, cost) published to a telemetry backend so cost can be traced back to the business unit that incurred it.

Cost-Proportional Design Module 6, Week 31

Applying the most cost-efficient API mechanism (e.g. prompt caching vs. Batch API) to each workload based on its specific latency and volume characteristics, rather than using one uniform approach everywhere.

Cost runaway Module 4, Week 19

One of four named multi-agent failure modes: parallel execution multiplies token consumption while quality plateaus or degrades.

Crash Recovery Manifest Module 5, Week 25

A structured state file that an agent periodically exports during long-running multi-agent tasks so a coordinator can reload it on resume and inject prior context into agent prompts instead of restarting from scratch.

custom_id Module 5, Week 23

The required key in each Batch API request entry that links a submitted request to its corresponding result at retrieval time; a typo here causes silent data loss.

D

@dataclass decorator Module 1, Week 5

A Python decorator that auto-generates __init__, __repr__, and other boilerplate for a class from its field declarations, so fields can be declared once as typed attributes instead of hand-written in a constructor.

Data exfiltration through generation Module 7, Week 36

An attack where a deployed AI system with access to sensitive data is prompted to reproduce that data, particularly acute in RAG systems where crafted prompts can cause verbatim reproduction of knowledge-base chunks.

Data residency Module 7, Week 38

The requirement that data be stored in a specified geographic region, distinct from (but related to) data sovereignty.

Data sovereignty Module 7, Week 38

The principle that data is subject to the laws of the jurisdiction where it is stored or processed; a system can satisfy residency while still violating sovereignty (e.g. transmitting EU data to a US-based API).

Dead-letter queue (DLQ) Module 7, Week 39

A queue that receives messages which fail processing after a set number of retries, routing them for human review instead of silently dropping them.

Decomposition (multi-agent) Module 4, Week 19

The decision to split a task across multiple agents; justified only when subtasks have genuinely distinct expertise/context requirements, when parallel execution offers real time savings, or when isolation provides security/audit benefit — not because it "sounds sophisticated."

Decorator (reading, not writing) Module 0, Week P2

The @ syntax that passes a function through a wrapping function and replaces it with the result, adding behaviour (like route registration or automatic retries) without changing the wrapped function's body; the curriculum teaches learners to read decorators like @app.get and @task without needing to write custom ones.

detected_pattern field Module 5, Week 24

A schema field in classification/code-review pipelines that records the specific code construct or text pattern that triggered a finding, enabling systematic false-positive analysis.

Distributed Tracing Module 5, Week 22

A diagnostic technique showing where in a multi-service request lifecycle time was consumed, visualizing the full call tree as a waterfall of labeled spans — distinct from logs (what failed) and metrics (how often/fast).

.dockerignore file Module 3, Week 13

Described as "a security document": a file listing paths (like .env, venv/, .git/) excluded from the Docker build context so secrets and unnecessary files never enter the image.

Docker Compose Module 3, Week 13

A tool for running multiple containers as a coordinated local development environment, with each service in its own container connected via a shared network.

Drift Module 1, Week 4

One of the four common prompt failure modes: the model wanders off the original task; mitigated by anchoring the task in the system prompt, adding explicit stop conditions, and restating the task at the end of long prompts.

E

ECS Service Module 3, Week 14

An AWS construct that maintains a defined number of running Fargate tasks and automatically replaces any that become unhealthy.

ECS Task Definition Module 3, Week 14

The AWS construct describing a container's configuration (image, CPU/memory, environment, networking) for deployment on Fargate.

Embedding contract / EmbeddedChunk Module 8, Week 43

A Pydantic model validating that an embedded chunk has correctly sized text, a normalized embedding vector of the expected dimension, and required metadata before indexing.

Embedding model Module 4, Week 21

The component that converts text into a vector (a numerical representation of semantic content); tradeoffs include vector dimension, cost per token, and retrieval quality.

EmbeddingModel Module 8, Week 41

A wrapper class around sentence-transformers used to embed documents and queries into dense vectors (e.g. using all-MiniLM-L6-v2, 384 dimensions).

Environment variable pattern Module 0, Week P3

Storing credentials (API keys, passwords, URLs) outside source code — in a local .env file during development, loaded at runtime via python-dotenv into os.environ — so secrets are never committed to version control; in production, the .env file is replaced by a cloud secrets manager but the application's os.environ access pattern stays the same.

Enum with "other" + Detail String Module 5, Week 24

A schema design pattern for categorical fields where the value set isn't exhaustive: add an "other" enum option paired with a free-text detail field, so novel categories don't require schema updates.

Error rate (metric) Module 3, Week 15

The fraction of requests that fail, tracked by error type (API error, tool failure, context overflow, timeout) to identify which component is degrading.

Error Type Classification Module 6, Week 33

The practice of sorting every wrong simulation answer into one of three categories (conceptual gap, pattern application gap, constraint misread) before reading explanations, so the true cause of the mistake gets diagnosed.

errorCategory / isRetryable Module 4, Week 18

Structured fields in an MCP tool's error response letting the calling agent decide whether to retry, escalate, or abandon. Four categories: transient (retryable), validation, permission, and business (all non-retryable).

Evaluation gap Module 8, Week 41

The insight that a RAGAS metric score alone is just a number; understanding it requires reading the failing cases to determine whether errors originate in retrieval or generation.

Event-driven sync Module 7, Week 38

A knowledge-base synchronization approach where document ingest events are published to a message bus and each region's ingestion service subscribes and processes independently, producing eventual consistency with configurable lag.

execute_tool (contract-bound tool interface) Module 3, Week 16

A structured tool-calling function returning JSON that matches a locked schema, distinct from a prose-returning demo executor, used so contract tests can validate the tool's output shape.

Explore subagent Module 4, Week 20

A subagent pattern that isolates verbose discovery output (e.g. scanning a large codebase) from the main conversation context, surfacing only the summary.

Exponential Backoff Module 5, Week 22

A retry pattern where the wait interval between retries doubles on each successive attempt (e.g. 1s → 2s → 4s → 8s), used when multiple clients share a constrained resource.

F

Failure Modes (tool specification field) Module 6, Week 30

One of seven required fields in a tool specification: defines what the tool returns or raises when it cannot complete its function; the most commonly omitted field, whose absence turns a tool catalogue into a "wish-list" rather than a real specification.

Failure Taxonomy Module 5, Week 22

The curriculum's framework classifying API failures into three distinct classes — transient infrastructure errors, rate limit responses, and timeout conditions — each requiring different handling.

Faithfulness Module 4/5, Week 21/24

A RAGAS metric measuring whether every claim in a generated answer is grounded in the retrieved context — the absence of hallucination.

Few-shot prompting Module 1, Week 6

Including 2–4 worked examples directly in a prompt, each showing an input and the exact output expected, so the model generalises to novel inputs; the most reliable way to enforce output consistency, though it cannot manufacture data genuinely absent from the source.

Field-Level Confidence Scores Module 5, Week 24

Having a model output a confidence value alongside each extracted field so review thresholds can be calibrated against a labeled validation set and low-confidence extractions routed to human review.

Fixed-size chunking Module 4, Week 21

Splitting documents by token or character count (e.g. 512 tokens with 50-token overlap); simple and fast but may split a sentence or clause mid-thought.

Five Quality Markers (of an ADR) Module 6, Week 31

A checklist for a citable ADR: (1) context names specific constraints, (2) decision commits to one specific choice, (3) consequences name both benefits and costs, (4) alternatives considered are ruled out by a named disqualifying factor, (5) review trigger is a threshold/condition, not a calendar date.

Format violation Module 1, Week 4

One of the four common prompt failure modes: the model does not adhere to the requested output structure; mitigated with an explicit schema, an example output, and an instruction to return only the requested format.

G

GitHub environment protection Module 3, Week 16

A GitHub feature that adds a required manual approval/reviewer gate before a deployment job can run against a named environment (e.g. "production"), enforcing a change-management step.

Great Expectations (GX) Module 8, Week 43

A data validation library used to check corpus-level quality (e.g. uniqueness of IDs, expected value sets, non-null constraints), distinct from Pydantic's per-record validation.

H

Hallucination Module 1, Week 4

One of the four common prompt failure modes: the model confidently fabricates facts; mitigated by instructing the model to admit uncertainty, grounding in provided sources, lowering temperature, and using chain-of-thought for reasoning tasks.

Handover / HandoverResult Module 2, Week 12

The structured delegation/response pattern used when a subagent finishes work and reports back to its coordinator; a subagent issues a Handover with its outcome and status, and the coordinator receives a HandoverResult to decide next steps.

HNSW index Module 8, Week 41

"Hierarchical Navigable Small World" — the pgvector index type used for fast approximate nearest-neighbor vector search, configured via parameters like m and ef_construction.

Hook (Runtime Hook) Module 6, Week 31

A piece of configuration — not a prompt instruction — that Claude Code or an agent runtime executes automatically at a defined point in the tool-use lifecycle; mechanically enforced by the runtime, so deterministic rather than probabilistic.

Hook-Based Enforcement Module 6, Week 31

The use of hooks (e.g. PostToolUse) to intercept every tool call and deterministically enforce rules regardless of model behavior, providing guaranteed (not merely probabilistic) compliance for must-never-violate rules.

Hooks vs. prompt instructions Module 2, Week 12

The architectural distinction that hooks enforce rules deterministically (guaranteed), while system-prompt instructions only produce probabilistic compliance the model might not honor in edge cases.

Hub-and-spoke (multi-agent pattern) Module 4, Week 19

The most common multi-agent architecture: a coordinator agent plans, dispatches subtasks to specialist subagents, and integrates their outputs.

Hybrid retrieval Module 4, Week 21

Combining dense (vector similarity) and sparse (keyword/BM25) retrieval; dense finds semantically similar text despite differing phrasing, sparse finds exact keyword/citation matches dense search sometimes misses.

HybridRetriever Module 8, Week 41

A retrieval class that blends dense (cosine similarity via pgvector) and sparse (BM25) retrieval scores, normalizing and combining them via the alpha weighting to return top-k chunks.

I

IAM role (for OIDC) Module 3, Week 16

An AWS identity that trusts the GitHub OIDC provider and is granted only the specific permissions needed for a deployment job, replacing long-lived stored credentials.

Image scanning Module 3, Week 13

Automated scanning of a pushed container image for known CVEs before deployment, supported natively by GitHub Container Registry and AWS ECR.

Indexing phase (RAG) Module 4, Week 21

The offline phase of a RAG pipeline in which documents are chunked, each chunk is embedded into a vector, and embeddings are stored in a vector database alongside the original text.

Infinite loops Module 2, Week 12

One of the four principal agentic failure modes: the agent keeps requesting tools without making progress.

Ingest lag tolerance Module 7, Week 38

A design choice accepting that secondary regions may lag a primary region's data by a bounded amount of time, with that lag surfaced explicitly in response metadata.

Ingestion contract / SourceDocument Module 8, Week 43

A Pydantic model acting as a hard gate at pipeline entry: documents failing validation (missing fields, empty/boilerplate text, invalid doc_type) are quarantined rather than processed.

input_schema Module 2/4, Week 11/18

The JSON Schema portion of a tool definition specifying the parameters (types, requirements) a tool accepts; can also be used to enforce business constraints at the schema level (e.g. a required minimum-sources parameter).

Input validation (fail fast) Module 0, Week P2

Checking a function's inputs at the boundary — type, null, value, and empty checks using isinstance() — and raising a specific, descriptive error immediately, rather than letting bad data travel silently into business logic and fail confusingly later.

Integration / regression tests (Layer 3) Module 3, Week 16

A small, carefully chosen set of tests that run against the live Claude API only on merge to main, verifying behaviors that cannot be validated with mocks, kept under 20 tests and using the cheapest model.

Integrative Architecture Design Module 6, Week 29

The property distinguishing a final capstone from a module-level capstone: it must show that all prior capability clusters work together, visualized through data flows and control flows connecting components, not just the components themselves.

Iterative refinement loop Module 4, Week 19

A coordinator pattern in which the coordinator evaluates synthesis output for coverage gaps and re-delegates targeted queries to subagents to fill them, repeating until coverage is sufficient or an iteration limit is reached.

J

Jitter Module 5, Week 22

Random variation added to a retry wait interval so a fleet of clients retrying after the same failure event doesn't all retry at the exact same moment (which would produce a thundering herd).

jq Module 5, Week 25

A CLI JSON query tool used to read structured logs during the live debugging simulation.

JSON output (structured output technique) Module 1, Week 6

Defining a response schema explicitly in the prompt — keys, value types, allowed values — and instructing the model to return only valid JSON, anchored with a one-shot example; requires defensive parsing downstream since models occasionally prepend prose or wrap output in markdown fences.

L

Latency metrics Module 3, Week 15

Request-duration measurements tracked at p50, p95, and p99 percentiles (not just the median), because the median hides tail behavior that degrades user experience.

Least-privilege IAM policy Module 3, Week 16

An access policy scoped to only the specific actions a role needs (e.g. ECR push, ECS deploy).

List comprehension Module 0, Week P2

A concise one-line Python syntax for transforming and filtering a list, following the pattern [expression for item in iterable if condition]; preferred over a for-loop when building a new list in one step without side effects.

LLM-as-Judge Evaluation Module 5, Week 24

Using a second LLM to evaluate the output of a first LLM against structured criteria, used for qualities difficult to check deterministically (tone, helpfulness, factual accuracy, relevance).

LLMOps Module 5, Week 24

The discipline of systematic, ongoing measurement of AI system behavior in production so that drift (from model updates, input shifts, or changing consumer expectations) is detected programmatically rather than via user complaints.

Lost in the Middle Effect Module 5, Week 23

The phenomenon where models reliably process information at the beginning and end of long inputs but may miss or under-process content buried in the middle; mitigated by front-loading priority content, using section headers, and repeating key summaries at both ends.

M

Managed container service Module 3, Week 14

A deployment model where the developer specifies the container image, resource allocation, environment, and scaling policy, while the cloud provider manages the underlying host infrastructure.

max_tokens Module 2, Week 9

A request parameter that caps response length, bounding both cost and latency; as a stop_reason value it indicates the response was truncated mid-generation.

.mcp.json Module 4, Week 17

The project-root configuration file for project-scoped MCP servers, shared with all team members via version control.

MCP client Module 4, Week 17

The consumer application (e.g. Claude Desktop) that discovers what each configured MCP server exposes and routes requests/responses through the protocol.

MCP prompts Module 4, Week 17

One of the three MCP primitives: parameterised, server-defined templates encapsulating common invocations and capturing reusable domain knowledge.

MCP resources Module 4, Week 17

One of the three MCP primitives: read-only data exposed via URI patterns, access-controlled and cacheable by protocol design.

MCP server Module 4, Week 17

An external service that exposes tools, resources, and prompts to MCP-aware clients; can be developed, deployed, and updated independently of the client.

MCP tools Module 4, Week 17

One of the three MCP primitives: actions that cause side effects, requiring careful permission design, idempotency consideration, and audit logging.

/memory Module 4, Week 20

A Claude Code session command used to verify which memory files (CLAUDE.md layers) are currently loaded, to diagnose inconsistent behavior across sessions.

messages array Module 2, Week 9

The structured conversation history in a request, alternating user and assistant turns in chronological order.

Model Context Protocol (MCP) Module 4, Week 17

"An open protocol that lets AI clients discover and use capabilities exposed by external servers," cleanly separating the AI client from the integration layer; analogized to a service mesh for AI integrations.

Model deprecation management plan Module 5, Week 24

A four-part operational plan (model registry, deprecation monitoring subscription, upgrade testing process, rollback procedure) for handling Anthropic's advance-notice model deprecations without silent production breakage.

Model extraction Module 7, Week 36

An attack in which an adversary queries a system exhaustively to reverse-engineer its system prompt or fine-tuning, defended against via rate limiting and anomaly detection.

Model tier selection (as cost driver) Module 7, Week 37

The choice of which Claude model tier (Haiku/Sonnet/Opus) to use for a task; using a higher tier than a task requires produces a large cost premium with no quality benefit.

Multi-component AI service Module 3, Week 13

A service architecture composed of several coordinated pieces (e.g. agent runtime, logging sidecar, mock EHR for local testing) rather than a single script, run together via Docker Compose.

Multi-Pass Review Architecture Module 5, Week 24

A code/output review pattern using independent review instances (per-file local pass, cross-file integration pass, independent review instance without the generator's reasoning context) because a model reviewing its own output is less effective than a fresh instance.

Multi-stage Docker build Module 3, Week 13

A Dockerfile pattern using a separate FROM stage for building dependencies and another for the runtime image, so build tools and intermediate artifacts are excluded from the final, smaller, more secure image.

N

Non-root user (container) Module 3, Week 13

Running the container process as a named, unprivileged user rather than root, so a compromised container process cannot write to sensitive host paths on misconfigured systems.

Nullable Fields Module 5, Week 24

Marking schema fields as optional/nullable when source data may not contain the information, preventing the model from fabricating a value to satisfy a required-field constraint.

O

OIDC (OpenID Connect) authentication Module 3, Week 16

A CI/CD authentication pattern where GitHub Actions obtains short-lived, job-scoped AWS credentials via a trusted identity provider, eliminating the need to store long-lived AWS access keys as GitHub secrets.

on_failure hook Module 8, Week 42

A Prefect flow-level callback invoked when a flow run fails, used to trigger alerting (PagerDuty, Slack, email).

OpenTelemetry Module 3, Week 15

"The standard tracing framework": a system where a trace_id generated at request entry is propagated through every downstream call, enabling reconstruction of a complete call tree for a single request.

Orchestrator Decomposition Logic Module 6, Week 30

The orchestrator's core responsibility: determining, from an incoming trigger event, which subagents to dispatch, in what order, and with what inputs — made explicit in the orchestrator's instructions rather than left to general model reasoning.

--output-format json / --json-schema Module 4, Week 20

Claude Code CLI flags producing machine-parseable structured output, with --json-schema enforcing a specific output schema.

P

Pattern Application Gaps Module 6, Week 33

An error type where the test-taker understands the underlying concept but fails to apply it correctly to the specific scenario; diagnosed by understanding what was being tested but selecting the wrong option.

@patch decorator Module 3, Week 16

A testing utility that temporarily replaces a specified object with a fake for the duration of a test, then restores the original afterward, so code can be tested without calling the real Anthropic API.

pgvector Module 8, Week 41

A PostgreSQL extension providing vector storage and similarity search, used as the vector store for the RAG pipeline's document chunks.

Plan mode Module 4, Week 20

One of Claude Code's two operating modes, used for large-scale/multi-file changes, situations with multiple valid implementation approaches, architectural decisions, or when dependencies must be understood before committing to changes.

PostToolUse hook Module 2, Week 12

An Agent SDK hook that intercepts tool results after execution but before the model processes them, used for tasks like data normalization and policy enforcement; a specific instance runs immediately after any tool call completes and before the result is returned to the model, and can be used to deterministically block sensitive data from reaching the model's context.

Prefect Module 8, Week 42

A Python orchestration framework providing state management, task-level retries, automatic dependency-based parallelization, and scheduled deployment for data pipelines.

Primary write, replicate pattern Module 7, Week 38

A knowledge-base synchronization approach where all document ingest goes to a primary region with near-real-time database replication to secondary regions, while reads are served locally.

Prompt-Based Enforcement Module 6, Week 31

Instructions placed in the system prompt that the agent is expected to follow, offering only probabilistic (not guaranteed) compliance — contrasted with hook-based enforcement.

Prompt Caching Module 5, Week 23

Claude's mechanism for caching the stable prefix of a request (system prompt plus static context) server-side so repeated identical prefixes are billed at roughly 10% of normal input token cost; any change to the cached content invalidates the cache.

Prompt drift Module 4, Week 19

One of four named multi-agent failure modes: system prompts drift between agents, producing inconsistent output.

Prompt injection Module 1/7, Week 3/36

An attack where user-supplied input contains instructions that attempt to override or subvert the system prompt; defended against via input sanitization, output filtering, and treating user input as untrusted data throughout the pipeline.

Prompt regression test Module 3/5, Week 16/24

A test that runs a changed prompt against a fixed set of inputs and checks the output against structured properties (not exact strings), verifying that prompt or model changes haven't broken previously working behavior.

Prompt Versioning Module 5, Week 24

Treating prompts as code under version control with semantic versioning (major bump for structural/breaking output changes, minor bump for quality improvements that preserve structure).

@property (Python decorator) Module 0, Week P2

Turns a method into a computed, read-only attribute accessed without parentheses; used throughout the curriculum in Pydantic models and client class extensions to expose derived values (e.g. message count, session age).

-p / --print flag Module 4, Week 20

A Claude Code CLI flag enabling non-interactive mode for automated/CI pipelines; without it, Claude Code waits for interactive input and hangs the CI job.

Pydantic BaseModel Module 0, Week P4

A base class that turns a type-annotated class definition into a validated data contract: Pydantic auto-generates the constructor, type checks, and error messages from the field annotations, rejecting invalid input immediately (raising ValidationError) rather than letting bad data travel silently into business logic.

ConfigDict Module 0, Week P4

A Pydantic class-level setting (e.g. model_config = ConfigDict(extra='ignore')) that configures a model's behaviour toward unexpected input, such as silently dropping extra JSON keys instead of raising an error.

Q

Quality gate Module 8, Week 42

A Prefect task that runs RAGAS evaluation and deliberately raises an exception — failing the pipeline — if faithfulness or context precision scores fall below defined thresholds.

R

RAGAS (evaluation metrics) Module 4/5, Week 21/24

A family of automated evaluation metrics for RAG systems, computed without requiring human-labelled test sets; the four named metrics are faithfulness, answer relevance, context precision, and context recall.

Read-only root filesystem Module 3, Week 13

A container hardening setting where the filesystem is mounted read-only except for explicitly defined writable volumes, preventing file-based attacks from persisting across restarts.

Readiness Threshold Module 6, Week 33

A personally set minimum exam-simulation score required before booking the real exam; the curriculum's suggested common benchmark is 80% overall with no individual domain below 70%.

Reciprocal Rank Fusion (RRF) Module 4, Week 21

"A method for combining two ranked lists, dense and sparse, by summing each document's 1/(rank + k) across both lists, so documents ranking well in either list surface near the top."

Refusal Module 1, Week 4

One of the four common prompt failure modes: the model declines a reasonable request, often due to over-cautious system prompts; mitigated by replacing blanket caution with specific scope.

Retrieval-Augmented Generation (RAG) Module 4, Week 21

"A two-phase architecture" comprising an offline indexing phase (chunk, embed, store) and an online retrieval phase (embed query, retrieve top-k similar chunks, inject as grounding context) so Claude's generated answer cites and stays faithful to retrieved material.

Retrieval contract / RetrievedContext Module 8, Week 43

A Pydantic model acting as the last line of defense before a retrieved chunk reaches Claude's prompt, requiring a valid section reference, minimum text length, and bounded hybrid score.

Retrieval phase (RAG) Module 4, Week 21

The online, query-time phase of RAG in which the user's query is embedded, the vector store returns the k most semantically similar chunks, and those chunks are injected into the prompt as grounding context.

Retrieval quality gap Module 8, Week 41

The implementation discovery that dense vector similarity alone fails on exact term/pattern lookups, requiring a hybrid retrieval approach whose blend ratio must be empirically tuned.

Retry envelope Module 5, Week 22

The bounded time/attempt budget (defined by max_elapsed_s and max_attempts) within which retries are allowed before a request is either surfaced as a hard failure or escalated to a dead-letter store.

RetryConfig Module 5, Week 22

The configuration class defining a retry envelope's parameters (max attempts, initial backoff, backoff multiplier, jitter range, max elapsed time).

RetryingClaudeClient Module 5, Week 22

A wrapper class around the base ClaudeClient that adds configurable retry logic while preserving the same public interface.

Review Trigger (ADR field) Module 1/6, Week 8/31

The ADR section naming the specific threshold or condition — not a date — under which a decision should be revisited (e.g. a volume threshold or an API capability change).

S

sanitise_user_input() Module 7, Week 36

A function that checks user input against high-risk prompt-injection regex patterns and flags (rather than silently blocks) suspicious or over-length input for human/audit review.

Scenario-Reading Error Module 5, Week 27

One of the practice-exam error classifications: understanding the principle correctly but misreading a specific constraint stated in the question.

Scratchpad Files Module 5, Week 23

A scratchpad.md file an agent maintains during long codebase-exploration sessions to record key findings, so a summary can be reinjected after context compaction to prevent the agent from drifting into generic answers.

Secrets Manager (AWS) Module 3, Week 14

A managed secrets store referenced by a Fargate task definition so credentials like the Anthropic API key are injected into the container at runtime rather than stored in the image or environment file.

Semantic chunking Module 4, Week 21

Splitting documents at natural boundaries (headings, paragraphs, clause boundaries) to preserve coherence, requiring more logic than fixed-size chunking; almost always preferable for legislation.

SemanticChunker Module 8, Week 41

A chunking class that splits documents at detected section boundaries, then enforces minimum chunk size (merging undersized chunks forward) and maximum chunk size (splitting oversized chunks at sentence boundaries).

sentence-transformers / all-MiniLM-L6-v2 Module 8, Week 41

A local embedding library and specific pretrained model (384 dimensions, ~80MB, CPU-friendly) used as the standard starting point for generating dense embeddings.

Silent partial failure Module 2, Week 11

A tool failing or returning bad/contradictory data without the agent noticing, so it continues reasoning on unreliable information; one of the most consequential production agent failure modes.

SKILL.md Module 4, Week 20

The file defining an Agent Skill via YAML frontmatter (name, description, context, allowed-tools, argument-hint) plus markdown instructions; the description field determines whether Claude decides the skill is relevant to the current task.

Sliding window truncation Module 5, Week 23

A context management strategy that keeps the system prompt, the most recent N conversation turns, and a cheap-model-generated summary of older turns, discarding the rest.

Smoke test Module 3, Week 16

A test file that imports every module, constructs every class against a mock, and calls every public function with minimal valid inputs, asserting only that nothing raises and return shapes match expectations.

SQS (Simple Queue Service) Module 7, Week 39

The standard queueing layer between event sources and AI processing functions, providing durability, decoupling, backpressure, and dead-letter queue support.

SQS + Lambda + DynamoDB pattern Module 8, Week 44

The three-component streaming architecture for AI event pipelines: an SQS queue with DLQ, a Lambda handler that calls the Claude API and writes results, and a DynamoDB findings store.

Stratified random sampling Module 5, Week 24

A QA practice of sampling review cases from the high-confidence stratum (not only low-confidence ones) of an extraction pipeline's output, because aggregate accuracy can mask poor performance on specific document types.

Streaming Module 2, Week 10

Returning the model's response token-by-token as it's generated (via server-sent events) rather than as a single complete payload, dramatically improving perceived latency for chat UIs at the cost of added client complexity.

Streaming vs batch decision Module 7, Week 39

The architectural framing that streaming is warranted only when required latency is shorter than a batch interval allows, or event volume is too large to queue for batch without breaching SLA.

STRIDE Module 7, Week 36

The standard enterprise threat-modeling framework (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege), applied with AI-specific manifestations for each category.

Structured Conversation Management Module 5, Week 23

Maintaining a compact, deterministic structured state object (key decisions, open items, current task context) instead of a raw growing conversation history.

Structured logging Module 3, Week 15

Logging as JSON objects with defined, named, typed fields rather than free-text strings, making every field queryable at scale in a tool like CloudWatch Logs Insights.

Structured Null Result Module 6, Week 31

A design pattern for how a tool signals it could not return data: a structured, typed null response that lets downstream components explicitly flag gaps, rather than silently omitting fields.

Supply chain attacks on AI dependencies Module 7, Week 36

Threats arising because the model, embedding model, vector database client library, and orchestration framework are all dependencies whose compromise can introduce malicious behavior at inference time.

System Boundary Module 6, Week 29

The explicit statement of what a system receives as input, what it produces as output, and what it explicitly does not do; established before any architecture diagramming.

System prompt Module 1, Week 3

The persistent configuration defining an assistant's persona, rules, and operating constraints, set once and inherited by every turn in the conversation, as distinct from the user prompt which changes per turn.

T

Task tool Module 2/4/5, Week 12/19/26

The Agent SDK mechanism a coordinator uses to spawn a subagent; multiple Task calls in a single coordinator response produce true parallel subagent execution.

task_input_hash / cache_key_fn Module 8, Week 42

A Prefect caching mechanism that hashes task inputs to avoid re-running a task when its inputs haven't changed.

Temperature Module 1, Week 1

A sampling parameter that adjusts how aggressively the model picks lower-probability tokens; controls output variation across otherwise-identical calls.

Test pyramid (three-layer, for AI services) Module 3, Week 16

A testing architecture with Unit tests (mocked API, run on every commit), Contract tests (lock tool schemas, run on every commit), and Integration/regression tests (real API, run only on merge to main).

The 30% cost reduction playbook Module 7, Week 37

Three interventions (prompt caching, model routing by task complexity, output token budget enforcement) that reliably deliver 25–40% AI cost reduction without quality degradation.

Throughput (metric) Module 3, Week 15

The number of requests processed per minute, used as a baseline for anomaly detection.

Thundering herd Module 5, Week 22

The failure mode where many clients that all received a rate-limit response at the same moment all retry simultaneously, re-triggering the same overload; mitigated by jitter.

Tool Module 2, Week 11

A function the model can request be called, specified via a JSON schema with name, description, and input_schema; the model never executes a tool itself, it only requests execution.

tool_choice Module 2/5, Week 11/26

The request parameter controlling how Claude selects tools, with values auto (Claude decides whether to call a tool), any (a tool call is guaranteed but Claude picks which), and forced selection (a specific named tool must run).

tool_executor Module 2, Week 12

The injected callable in the Agent class responsible for dispatching a requested tool name/input to its actual implementation and returning a result string.

tool_result block Module 2, Week 11

A content block sent back to Claude as part of a user-role message, referencing the originating tool_use_id, that returns the output of an executed tool.

Tool Specification (Seven-Field Format) Module 6, Week 30

The production-ready standard for documenting a tool: name, description, input schema, output schema, failure modes, latency profile, and access constraints.

tool_use block Module 2, Week 11

A content block in Claude's response containing a tool name, a unique id, and input arguments, representing a request (not an execution) for the client to run a tool.

tool_use_id Module 2, Week 11

The identifier linking a tool_use request block to its corresponding tool_result block; the most common source of tool-use bugs when mishandled.

Top-p (nucleus sampling) Module 1, Week 2

A sampling parameter that restricts the token vocabulary pool to only those candidates whose cumulative probability reaches the top_p threshold; runs before temperature, which then adjusts the distribution within that restricted pool.

trace_id Module 3/5, Week 15/22

A unique identifier generated at a request's entry point and propagated through every downstream call/log line, enabling reconstruction of a request's complete execution path across services.

Transport (MCP) Module 4, Week 17

The communication layer between MCP client and server; can be local (stdio, the server runs as a local process/subprocess) or remote (SSE/HTTP or StreamableHTTP, the server runs as a network service).

U

Unit test (Layer 1) Module 3, Week 16

A test that mocks the Anthropic API client entirely and verifies application logic without any real API calls, running in under 10 seconds at zero cost.

V

Vector store / vector database Module 4, Week 21

The component that indexes and retrieves content by vector similarity, compared across options (pgvector, Pinecone, Weaviate) on type, hybrid search support, cost model, and operational burden.

Visibility timeout (SQS) Module 8, Week 44

The duration during which a received SQS message is hidden from other consumers while being processed; must exceed Lambda timeout + p99 API latency + write time, or messages get reprocessed as duplicates.

Vocabulary Gap Module 5, Week 27

One of the practice-exam error classifications: recognizing the underlying concept but not recognizing the specific term used to describe it in the question.

X

x-api-key header Module 2, Week 9

The HTTP header used to authenticate a request to the Claude API using the API key credential.

XML tagging (structured output technique) Module 1, Week 6

Asking the model to wrap each output section in named tags (e.g. <classification>, <reason>), then parsing the result with regex or a simple tag extractor; one of two reliable techniques for structured output, alongside JSON output.

Z

Zero-trust architecture Module 7, Week 36

A security model in which no request is implicitly trusted based on network location; every request is authenticated and authorized regardless of whether it originates inside or outside the VPC.

Continue to the Dev Environment Setup Guide, then begin Module 0 — Prerequisite Track.