Certification Prep

Study Guide — Complete Knowledge Map

All task statements, knowledge points, and practical skills organized by exam domain.

Domain 1: Agentic Architecture & Orchestration (27% of exam)

Task Statement 1.1: Design and implement agentic loops for autonomous task execution

Knowledge of

  • The agentic loop lifecycle: sending requests to Claude, inspecting stop_reason ("tool_use" vs "end_turn"), executing requested tools, and returning results for the next iteration.
  • How tool results are appended to conversation history so the model can reason about the next action.
  • The distinction between model-driven decision-making (Claude reasons about which tool to call next based on context) and pre-configured decision trees or tool sequences.
  • Adding tool results to conversation context between iterations so the model can incorporate new information into its reasoning.

Skills in

  • Implementing agentic loop control flow that continues when stop_reason is "tool_use" and terminates when stop_reason is "end_turn".
  • Avoiding anti-patterns such as parsing natural language signals to determine loop termination, setting arbitrary iteration caps, or checking for assistant text content as a completion indicator.
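
The control flow above can be sketched as a minimal loop. This is a hedged sketch, not a definitive implementation: `client` stands in for the Anthropic Messages API, and `execute_tool` and the message shapes are illustrative.

```python
# Minimal agentic loop sketch: keep looping while stop_reason is "tool_use",
# stop on "end_turn". The client is a stand-in for the Messages API.

def run_agent_loop(client, messages, execute_tool):
    """Loop until the model stops requesting tools (stop_reason == 'end_turn')."""
    while True:
        response = client.create(messages=messages)
        # Append the assistant turn so the model can see its own tool requests.
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return messages  # "end_turn": the model considers the task complete.
        # Execute every requested tool and return results for the next iteration.
        results = [
            {"type": "tool_result",
             "tool_use_id": block["id"],
             "content": execute_tool(block["name"], block["input"])}
            for block in response["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```

Termination is decided solely by `stop_reason`: no text parsing, no assistant-text check, and no arbitrary iteration cap.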

Exam Tip: The exam heavily tests stop_reason handling. Remember: "tool_use" = keep looping, "end_turn" = done. Never parse text to decide termination.

Task Statement 1.2: Orchestrate multi-agent systems with coordinator-subagent patterns

Knowledge of

  • Hub-and-spoke architecture where a coordinator agent manages all inter-subagent communication, error handling, and information routing.
  • How subagents operate with isolated context—they do not inherit the coordinator's conversation history automatically.
  • The role of the coordinator in task decomposition, delegation, result aggregation, and deciding which subagents to invoke based on query complexity.
  • Risks of overly narrow task decomposition by the coordinator, leading to incomplete coverage of broad research topics.

Skills in

  • Partitioning research scope across subagents to minimize duplication (e.g., assigning distinct subtopics or source types to each agent).
  • Implementing iterative refinement loops where the coordinator evaluates synthesis output for gaps.
  • Routing all subagent communication through the coordinator for observability, consistent error handling, and controlled information flow.

Exam Tip: If a scenario asks "why is coverage incomplete?", check the coordinator's task decomposition first—not downstream agents.

Task Statement 1.3: Configure subagent invocation, context passing, and spawning

Knowledge of

  • The Task tool as the mechanism for spawning subagents, and the requirement that allowedTools must include "Task" for a coordinator to invoke subagents.
  • That subagent context must be explicitly provided in the prompt—subagents do not automatically inherit parent context or share memory between invocations.
  • The AgentDefinition configuration including descriptions, system prompts, and tool restrictions for each subagent type.
  • Fork-based session management for exploring divergent approaches from a shared analysis baseline.

Skills in

  • Including complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis outputs to the synthesis subagent).
  • Using structured data formats to separate content from metadata (source URLs, page numbers) when passing context between agents.
  • Spawning parallel subagents by emitting multiple Task tool calls in a single coordinator response.
  • Designing coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions.

Exam Tip: Subagents get NO automatic context. You must pass everything explicitly in their prompt.

Task Statement 1.4: Implement multi-step workflows with enforcement and handoff patterns

Knowledge of

  • The difference between programmatic enforcement (hooks, prerequisite gates) and prompt-based guidance for workflow ordering.
  • Why prompt instructions alone carry a non-zero failure rate when deterministic compliance is required (e.g., identity verification before financial operations).
  • Structured handoff protocols for mid-process escalation that include customer details, root cause analysis, and recommended actions.

Skills in

  • Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed.
  • Decomposing multi-concern customer requests into distinct items, then investigating each in parallel using shared context before synthesizing a unified resolution.
  • Compiling structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents.
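
A programmatic prerequisite gate like the one described in the first skill bullet can be sketched as pure, deterministic code. The tool and step names below are hypothetical, for illustration only.

```python
# Deterministic prerequisite gate sketch: financial tools are blocked until
# identity verification has completed, regardless of how the prompt is worded.

PREREQUISITES = {
    "issue_refund": {"verify_identity"},
    "update_payment_method": {"verify_identity"},
}

def gate_tool_call(tool_name, completed_steps):
    """Return (allowed, reason). Programmatic, so compliance is guaranteed."""
    missing = PREREQUISITES.get(tool_name, set()) - completed_steps
    if missing:
        return False, f"blocked: complete {sorted(missing)} first"
    return True, "ok"
```

Because the gate runs outside the model, it cannot be talked around, which is exactly the difference between programmatic enforcement and prompt-based guidance.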

Exam Tip: When the exam says "deterministic" or "guaranteed compliance"—the answer is hooks/programmatic enforcement, not prompts.

Task Statement 1.5: Apply Agent SDK hooks for tool call interception and data normalization

Knowledge of

  • Hook patterns (e.g., PostToolUse) that intercept tool results for transformation before the model processes them.
  • Hook patterns that intercept outgoing tool calls to enforce compliance rules (e.g., blocking refunds above a threshold).
  • The distinction between using hooks for deterministic guarantees versus relying on prompt instructions for probabilistic compliance.

Skills in

  • Implementing PostToolUse hooks to normalize heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) from different MCP tools.
  • Implementing tool call interception hooks that block policy-violating actions and redirect to alternative workflows (e.g., human escalation).
  • Choosing hooks over prompt-based enforcement when business rules require guaranteed compliance.
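
A PostToolUse-style normalization hook might look like the following sketch. The field name `timestamp` and the tool-result shape are assumptions for illustration, not an SDK contract.

```python
# Illustrative PostToolUse hook: different MCP tools return timestamps as
# Unix epochs or ISO 8601 strings; normalize them all to ISO 8601 before
# the model processes the result.
from datetime import datetime, timezone

def normalize_timestamp(value):
    if isinstance(value, (int, float)):  # Unix epoch seconds
        return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
    return str(value)  # already a string representation; pass through

def post_tool_use_hook(tool_result):
    """Rewrite heterogeneous timestamp formats into one canonical form."""
    normalized = dict(tool_result)
    if "timestamp" in normalized:
        normalized["timestamp"] = normalize_timestamp(normalized["timestamp"])
    return normalized
```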

Exam Tip: Hooks = deterministic. Prompts = probabilistic. The exam always prefers hooks when the scenario involves money, compliance, or identity.

Task Statement 1.6: Design task decomposition strategies for complex workflows

Knowledge of

  • When to use fixed sequential pipelines (prompt chaining) versus dynamic adaptive decomposition based on intermediate findings.
  • Prompt chaining patterns that break reviews into sequential steps (e.g., analyze each file individually, then run a cross-file integration pass).
  • The value of adaptive investigation plans that generate subtasks based on what is discovered at each step.

Skills in

  • Selecting task decomposition patterns appropriate to the workflow: prompt chaining for predictable multi-aspect reviews, dynamic decomposition for open-ended investigation tasks.
  • Splitting large code reviews into per-file local analysis passes plus a separate cross-file integration pass to avoid attention dilution.

Exam Tip: Fixed pipeline = predictable tasks (code review). Dynamic decomposition = exploration (legacy codebase investigation).

Task Statement 1.7: Manage session state, resumption, and forking

Knowledge of

  • Named session resumption using --resume <session-name> to continue a specific prior conversation.
  • fork_session for creating independent branches from a shared analysis baseline to explore divergent approaches.
  • Why starting a new session with a structured summary is more reliable than resuming with stale tool results.

Skills in

  • Using --resume with session names to continue named investigation sessions across work sessions.
  • Using fork_session to create parallel exploration branches.
  • Choosing between session resumption (when prior context is mostly valid) and starting fresh with injected summaries (when prior tool results are stale).
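
As a rough illustration of the first two options (the session name and summary text are placeholders):

```shell
# Resume a named session when its prior context is still valid:
claude --resume auth-incident

# Prior tool results are stale: start fresh and inject a structured summary instead:
claude "Summary of the previous investigation: <key findings>. Continue from there."
```

fork_session, by contrast, is set when spawning branches programmatically; the exact invocation depends on the SDK surface in use.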

Exam Tip: If tool results may be stale, start fresh with a summary—don't blindly --resume.

Domain 2: Tool Design & MCP Integration (18% of exam)

Task Statement 2.1: Design effective tool interfaces with clear descriptions and boundaries

Knowledge of

  • Tool descriptions as the primary mechanism LLMs use for tool selection; minimal descriptions lead to unreliable selection among similar tools.
  • The importance of including input formats, example queries, edge cases, and boundary explanations in tool descriptions.
  • How ambiguous or overlapping tool descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions).

Skills in

  • Writing tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it versus similar alternatives.
  • Renaming tools and updating descriptions to eliminate functional overlap.
  • Splitting generic tools into purpose-specific tools with defined input/output contracts.

Exam Tip: When the exam shows a tool misrouting problem, the first fix is always improving tool descriptions—not adding few-shot examples or routing layers.

Task Statement 2.2: Implement structured error responses for MCP tools

Knowledge of

  • The MCP isError flag pattern for communicating tool failures back to the agent.
  • The distinction between transient errors (timeouts), validation errors (invalid input), business errors (policy violations), and permission errors.
  • Why uniform error responses (generic "Operation failed") prevent the agent from making appropriate recovery decisions.

Skills in

  • Returning structured error metadata including errorCategory (transient/validation/permission), isRetryable boolean, and human-readable descriptions.
  • Including isRetryable: false flags for business rule violations so the agent communicates the refusal appropriately instead of retrying.
  • Distinguishing between access failures (needing retry decisions) and valid empty results (representing successful queries with no matches).
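
The bullets above can be made concrete with a small payload builder. The extra metadata fields (errorCategory, isRetryable) follow this guide's terminology; the core MCP result shape only mandates isError and content.

```python
# Sketch of structured MCP tool error responses vs. valid empty results.

def make_error_result(category, message, retryable):
    """Structured failure: gives the agent enough context to choose a recovery path."""
    return {
        "isError": True,
        "content": [{"type": "text", "text": message}],
        "errorCategory": category,   # e.g. "transient" | "validation" | "permission" | "business"
        "isRetryable": retryable,
    }

def make_empty_result():
    # An empty search is a SUCCESSFUL query with no matches, not an error.
    return {"isError": False,
            "content": [{"type": "text", "text": "0 matches found"}]}
```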

Exam Tip: Empty results ≠ errors. The exam tests this distinction—a search with no matches is success, not failure.

Task Statement 2.3: Distribute tools appropriately across agents and configure tool choice

Knowledge of

  • The principle that giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability.
  • Why agents with tools outside their specialization tend to misuse them.
  • Scoped tool access: giving agents only the tools needed for their role, with limited cross-role tools for specific high-frequency needs.
  • tool_choice configuration options: "auto", "any", and forced tool selection ({"type": "tool", "name": "..."}).

Skills in

  • Restricting each subagent's tool set to those relevant to its role.
  • Using tool_choice forced selection to ensure a specific tool is called first (e.g., forcing extract_metadata before enrichment tools).
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist.
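
The three tool_choice modes can be written out as request fragments. The model name and tool definition below are placeholders.

```python
# The three tool_choice configurations, as Messages API request fragments.

auto_choice   = {"type": "auto"}   # model decides; may respond with plain text
any_choice    = {"type": "any"}    # model must call some tool, its choice which
forced_choice = {"type": "tool", "name": "extract_metadata"}  # must call this tool

request = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "tools": [{
        "name": "extract_metadata",
        "description": "Extract title, author, and date from a document.",
        "input_schema": {"type": "object"},
    }],
    "tool_choice": forced_choice,  # guarantees extract_metadata runs first
    "messages": [{"role": "user", "content": "Extract metadata from this document..."}],
}
```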

Exam Tip: tool_choice: "auto" = model decides. "any" = must use a tool. {"type":"tool","name":"X"} = must use tool X.

Task Statement 2.4: Integrate MCP servers into Claude Code and agent workflows

Knowledge of

  • MCP server scoping: project-level (.mcp.json) for shared team tooling vs user-level (~/.claude.json) for personal/experimental servers.
  • Environment variable expansion in .mcp.json (e.g., ${GITHUB_TOKEN}) for credential management without committing secrets.
  • That tools from all configured MCP servers are discovered at connection time and available simultaneously to the agent.
  • MCP resources as a mechanism for exposing content catalogs (e.g., issue summaries, documentation hierarchies, database schemas).

Skills in

  • Configuring shared MCP servers in project-scoped .mcp.json with environment variable expansion for authentication tokens.
  • Enhancing MCP tool descriptions to explain capabilities and outputs in detail, preventing the agent from preferring built-in tools over more capable MCP tools.
  • Exposing content catalogs as MCP resources to give agents visibility into available data without requiring exploratory tool calls.
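
A project-scoped .mcp.json using environment-variable expansion might look like this sketch; the server package and variable names are illustrative.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

The file is committed and shared with the team, while each developer supplies GITHUB_TOKEN locally, so no secret ever lands in version control.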

Exam Tip: .mcp.json = shared (version controlled). ~/.claude.json = personal. Use env vars for secrets in .mcp.json.

Task Statement 2.5: Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Knowledge of

  • Grep for content search (searching file contents for patterns like function names, error messages, or import statements).
  • Glob for file path pattern matching (finding files by name or extension patterns).
  • Read/Write for full file operations; Edit for targeted modifications using unique text matching.
  • When Edit fails due to non-unique text matches, using Read + Write as a fallback.

Skills in

  • Selecting Grep for searching code content across a codebase, Glob for finding files matching naming patterns.
  • Building codebase understanding incrementally: starting with Grep to find entry points, then using Read to follow imports and trace flows.
  • Tracing function usage across wrapper modules by first identifying all exported names, then searching for each name across the codebase.

Exam Tip: Grep = search inside files. Glob = find files by name. The exam tests you on picking the right built-in tool for the task.

Domain 3: Claude Code Configuration & Workflows (20% of exam)

Task Statement 3.1: Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

Knowledge of

  • The CLAUDE.md configuration hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), and directory-level (subdirectory CLAUDE.md files).
  • User-level settings apply only to that user—instructions in ~/.claude/CLAUDE.md are not shared with teammates via version control.
  • The @import syntax for referencing external files to keep CLAUDE.md modular.
  • .claude/rules/ directory for organizing topic-specific rule files as an alternative to a monolithic CLAUDE.md.

Skills in

  • Diagnosing configuration hierarchy issues (e.g., a new team member not receiving instructions because they're in user-level rather than project-level configuration).
  • Using @import to selectively include relevant standards files in each package's CLAUDE.md.
  • Splitting large CLAUDE.md files into focused topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md, deployment.md).
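
For example, a package-level CLAUDE.md that pulls in only the relevant shared standards (the paths are illustrative):

```markdown
# packages/api/CLAUDE.md

@../../.claude/rules/api-conventions.md
@../../.claude/rules/testing.md

## Package notes
This package follows the shared monorepo standards imported above.
```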

Exam Tip: Project-level CLAUDE.md = shared via version control. User-level = personal only. This distinction is heavily tested.

Task Statement 3.2: Create and configure custom slash commands and skills

Knowledge of

  • Project-scoped commands in .claude/commands/ (shared via version control) vs user-scoped commands in ~/.claude/commands/ (personal).
  • Skills in .claude/skills/ with SKILL.md files that support frontmatter configuration including context: fork, allowed-tools, and argument-hint.
  • The context: fork frontmatter option for running skills in an isolated sub-agent context, preventing skill outputs from polluting the main conversation.

Skills in

  • Creating project-scoped slash commands in .claude/commands/ for team-wide availability via version control.
  • Using context: fork to isolate skills that produce verbose output (e.g., codebase analysis) or exploratory context (e.g., brainstorming alternatives).
  • Configuring allowed-tools in skill frontmatter to restrict tool access during skill execution.
  • Using argument-hint frontmatter to prompt developers for required parameters when they invoke the skill without arguments.
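
A SKILL.md combining the three frontmatter options might look like the sketch below; the skill name, body, and exact key syntax should be checked against current Claude Code documentation.

```markdown
---
name: analyze-codebase
description: Produce a structural summary of a package
context: fork            # isolated sub-agent; verbose output stays out of the main conversation
allowed-tools: Read, Grep, Glob
argument-hint: <package-path>
---

Analyze the package at $ARGUMENTS and report its module structure,
entry points, and external dependencies as a concise summary.
```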

Exam Tip: context: fork = isolated execution. allowed-tools = restricted tool access. argument-hint = parameter prompt.

Task Statement 3.3: Apply path-specific rules for conditional convention loading

Knowledge of

  • .claude/rules/ files with YAML frontmatter paths fields containing glob patterns for conditional rule activation.
  • How path-scoped rules load only when editing matching files, reducing irrelevant context and token usage.
  • The advantage of glob-pattern rules over directory-level CLAUDE.md files for conventions that span multiple directories.

Skills in

  • Creating .claude/rules/ files with YAML frontmatter path scoping (e.g., paths: ["src/api/**/*"] for API conventions, paths: ["**/*.test.*"] for testing conventions).
  • Choosing path-specific rules over subdirectory CLAUDE.md files when conventions must apply to files spread across many directories.
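
A path-scoped rules file per the bullets above (glob patterns and content are illustrative):

```markdown
---
paths:
  - "src/api/**/*"
  - "packages/*/api/**/*"
---

# API conventions
Validate all request bodies with the shared schema helpers before any handler logic runs.
```

Because the globs match API files wherever they live, one rules file covers directories that a single subdirectory CLAUDE.md could not.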

Exam Tip: Path-specific rules with glob patterns are ideal for test files (spread everywhere) and API conventions (nested in multiple dirs).

Task Statement 3.4: Determine when to use plan mode vs direct execution

Knowledge of

  • Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, and multi-file modifications.
  • Direct execution is appropriate for simple, well-scoped changes (e.g., adding a single validation check to one function).
  • Plan mode enables safe codebase exploration and design before committing to changes, preventing costly rework.
  • The Explore subagent for isolating verbose discovery output and returning summaries to preserve main conversation context.

Skills in

  • Selecting plan mode for tasks with architectural implications (e.g., microservice restructuring, library migrations affecting 45+ files).
  • Selecting direct execution for well-understood changes with clear scope.
  • Combining plan mode for investigation with direct execution for implementation.

Exam Tip: Plan mode = exploration + design before committing. Direct = you already know exactly what to change.

Task Statement 3.5: Apply iterative refinement techniques for progressive improvement

Knowledge of

  • Concrete input/output examples as the most effective way to communicate expected transformations when prose descriptions are interpreted inconsistently.
  • Test-driven iteration: writing test suites first, then iterating by sharing test failures to guide progressive improvement.
  • The interview pattern: having Claude ask questions to surface considerations the developer may not have anticipated.

Skills in

  • Providing 2-3 concrete input/output examples to clarify transformation requirements.
  • Writing test suites covering expected behavior, edge cases, and performance requirements before implementation, then iterating by sharing test failures.
  • Providing specific test cases with example input and expected output to fix edge case handling.

Exam Tip: When prose instructions produce inconsistent results, switch to concrete input/output examples—the exam expects this answer.

Task Statement 3.6: Integrate Claude Code into CI/CD pipelines

Knowledge of

  • The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines.
  • --output-format json and --json-schema CLI flags for enforcing structured output in CI contexts.
  • CLAUDE.md as the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code.
  • Session context isolation: why the same Claude session that generated code is less effective at reviewing its own changes compared to an independent review instance.

Skills in

  • Running Claude Code in CI with the -p flag to prevent interactive input hangs.
  • Using --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments.
  • Including prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues.
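
A CI review step combining these flags might look like the following; the prompt, schema file, and the exact argument form of --json-schema are assumptions to verify against current CLI documentation.

```shell
claude -p "Review this PR diff against the standards in CLAUDE.md" \
  --output-format json \
  --json-schema review-findings.schema.json > findings.json
```

The -p flag keeps the run non-interactive, and the JSON output can be parsed to post inline PR comments.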

Exam Tip: -p = non-interactive (for CI). The exam will present a "pipeline hangs" scenario—answer is always -p.

Domain 4: Prompt Engineering & Structured Output (20% of exam)

Task Statement 4.1: Design prompts with explicit criteria to improve precision and reduce false positives

Knowledge of

  • The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate").
  • How general instructions like "be conservative" fail to improve precision compared to specific categorical criteria.
  • The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories.

Skills in

  • Writing specific review criteria that define which issues to report versus skip (bugs, security) rather than relying on confidence-based filtering.
  • Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories.
  • Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification.

Exam Tip: The exam penalizes vague prompts. "Be conservative" is always wrong—replace with explicit, categorical criteria.

Task Statement 4.2: Apply few-shot prompting to improve output consistency and quality

Knowledge of

  • Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results.
  • The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps).
  • How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases.

Skills in

  • Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives.
  • Including few-shot examples that demonstrate specific desired output format (location, issue, severity, suggested fix) to achieve consistency.
  • Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization.

Exam Tip: Few-shot examples should show reasoning ("why this choice"), not just the final answer. Include 1-2 ambiguous cases.

Task Statement 4.3: Enforce structured output using tool use and JSON schemas

Knowledge of

  • Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors.
  • The distinction between tool_choice: "auto" (model may return text), "any" (model must call a tool but can choose which), and forced tool selection.
  • That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total).
  • Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories.

Skills in

  • Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response.
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown.
  • Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps.
  • Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values.
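
An extraction tool of this kind can be sketched as follows. The invoice fields are illustrative, and the response parsing assumes the standard tool_use content-block shape.

```python
# Schema-enforced extraction via tool use: the schema guarantees valid JSON,
# and the nullable field prevents fabrication when data is absent.

extract_invoice = {
    "name": "extract_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "number"},
            # Nullable: absent in the source -> null, never a fabricated value.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["invoice_number", "total", "po_number"],
    },
}

def parse_extraction(response_content):
    """Pull the structured arguments out of the tool_use content block."""
    for block in response_content:
        if block["type"] == "tool_use" and block["name"] == "extract_invoice":
            return block["input"]
    raise ValueError("no tool_use block found")
```

Pair this with tool_choice {"type": "tool", "name": "extract_invoice"} to force the extraction, and validate semantics (totals, cross-field consistency) in a separate step, since the schema alone cannot catch semantic errors.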

Exam Tip: tool_use + JSON schema = guaranteed valid JSON. But validate semantic correctness separately (e.g., totals, dates, cross-field consistency).

Task Statement 4.4: Implement validation, retry, and feedback loops for extraction quality

Knowledge of

  • Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction.
  • The limits of retry: retries are ineffective when the required information is simply absent from the source document.
  • Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns.

Skills in

  • Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction.
  • Identifying when retries will be ineffective (information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors).
  • Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings.
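
A retry-with-error-feedback loop can be sketched independently of any particular API; here `call_model` and `validate` are stand-ins supplied by the caller.

```python
# Retry-with-error-feedback sketch: on validation failure, the failed output
# and the specific errors are appended to the prompt for the next attempt.

def extract_with_retry(call_model, validate, document, max_attempts=3):
    prompt = f"Extract the fields from this document:\n{document}"
    last = None
    for _ in range(max_attempts):
        last = call_model(prompt)
        errors = validate(last)
        if not errors:
            return last
        # Feed the concrete errors back so the model can self-correct.
        prompt = (f"Extract the fields from this document:\n{document}\n"
                  f"Your previous attempt was:\n{last}\n"
                  f"It failed validation: {errors}\nCorrect these issues.")
    return last  # still invalid: the data is likely absent from the source
```

Note the limit the bullets describe: this loop fixes format and structural errors, but no number of retries will recover information that the source document never contained.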

Exam Tip: Retry works for format errors, not missing information. If the data isn't in the source document, retrying won't help.

Task Statement 4.5: Design efficient batch processing strategies

Knowledge of

  • The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA.
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits) and inappropriate for blocking workflows (pre-merge checks).
  • The batch API does not support multi-turn tool calling within a single request.
  • custom_id fields for correlating batch request/response pairs.

Skills in

  • Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis.
  • Calculating batch submission frequency based on SLA constraints (e.g., 4-hour windows to guarantee 30-hour SLA with 24-hour batch processing).
  • Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications.
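
Building batch requests with custom_id correlation might look like the sketch below. The model name is a placeholder; actual submission would go through the Batches endpoint (e.g., client.messages.batches.create(requests=...)).

```python
# Batch request construction: custom_id ties each result back to its source
# document, so failed items can be identified and resubmitted individually.

def build_batch_requests(documents):
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model name
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
            },
        }
        for doc_id, text in documents.items()
    ]

def failed_ids(results):
    """Collect custom_ids of non-succeeded items for targeted resubmission."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]
```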

Exam Tip: Batch API = 50% cheaper but up to 24h. Never use batch for blocking workflows (pre-merge checks). Always use for overnight jobs.

Task Statement 4.6: Design multi-instance and multi-pass review architectures

Knowledge of

  • Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session.
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking.
  • Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution.

Skills in

  • Using a second independent Claude instance to review generated code without the generator's reasoning context.
  • Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis.
  • Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing.

Exam Tip: Self-review is unreliable—always use a separate instance. Split large reviews into per-file + cross-file integration passes.

Domain 5: Context Management & Reliability (15% of exam)

Task Statement 5.1: Manage conversation context to preserve critical information across long interactions

Knowledge of

  • Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries.
  • The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections.
  • How tool results accumulate in context and consume tokens disproportionately to their relevance.
  • The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence.

Skills in

  • Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history.
  • Trimming verbose tool outputs to only relevant fields before they accumulate in context.
  • Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects.
  • Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis.

Exam Tip: Extract key facts into a structured block at the top of the prompt. Don't rely on the model remembering facts from the middle of a long history.

Task Statement 5.2: Design effective escalation and ambiguity resolution patterns

Knowledge of

  • Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps, and inability to make meaningful progress.
  • The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward.
  • Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity.
  • How multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection.

Skills in

  • Adding explicit escalation criteria with few-shot examples demonstrating when to escalate versus resolve autonomously.
  • Honoring explicit customer requests for human agents immediately without first attempting investigation.
  • Escalating when policy is ambiguous or silent on the customer's specific request.

Exam Tip: Sentiment analysis and confidence scores are always wrong answers for escalation calibration. Use explicit criteria with examples.

Task Statement 5.3: Implement error propagation strategies across multi-agent systems

Knowledge of

  • Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions.
  • The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches).
  • Why generic error statuses ("search unavailable") hide valuable context from the coordinator.
  • Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns.

Skills in

  • Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery.
  • Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve locally along with partial results.
  • Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources.

Exam Tip: Errors must include structured context—failure type + what was tried + partial results. Generic "failed" messages are always wrong.

Task Statement 5.4: Manage context effectively in large codebase exploration

Knowledge of

  • Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier.
  • The role of scratchpad files for persisting key findings across context boundaries.
  • Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding.
  • Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume.

Skills in

  • Spawning subagents to investigate specific questions while the main agent preserves high-level coordination.
  • Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation.
  • Summarizing key findings from one exploration phase before spawning sub-agents for the next phase, injecting summaries into initial context.
  • Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output.

Exam Tip: When context degrades, use scratchpad files + subagent delegation. /compact helps but loses detail—prefer structured persistence.

Task Statement 5.5: Design human review workflows and confidence calibration

Knowledge of

  • The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields.
  • Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns.
  • Field-level confidence scores calibrated using labeled validation sets for routing review attention.

Skills in

  • Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection.
  • Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review.
  • Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity.

Exam Tip: High aggregate accuracy can hide failures in specific document types. Always stratify by document type and field when evaluating performance.

Task Statement 5.6: Preserve information provenance and handle uncertainty in multi-source synthesis

Knowledge of

  • How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings.
  • The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings.
  • How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value.
  • Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions.

Skills in

  • Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis.
  • Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations.
  • Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis.
  • Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation.
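
A claim-source mapping that survives merging might be sketched like this; the record shape is an assumption for illustration.

```python
# Merge findings from multiple subagents while preserving every claim's
# sources (URLs and dates), so attribution and temporal context are never
# lost during synthesis. Conflicting sources are kept side by side.

def merge_claims(per_agent_findings):
    merged = {}
    for findings in per_agent_findings:
        for claim in findings:
            entry = merged.setdefault(claim["claim"],
                                      {"claim": claim["claim"], "sources": []})
            entry["sources"].extend(claim["sources"])
    return list(merged.values())
```

Downstream, a synthesis agent can annotate conflicts with their attributed sources and dates instead of silently picking one value.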

Exam Tip: Claims must always carry source attribution through the entire pipeline. If synthesis loses sources, the architecture is broken.