Certification Prep

Study Guide — Complete Knowledge Map

All task statements, knowledge points, and practical skills organized by exam domain.

Domain 1: Agentic Architecture & Orchestration (27% of exam)

Task Statement 1.1: Design and implement agentic loops for autonomous task execution

Knowledge of

  • The agentic loop lifecycle: sending requests to Claude, inspecting stop_reason ("tool_use" vs "end_turn"), executing requested tools, and returning results for the next iteration.
  • How tool results are appended to conversation history so the model can reason about the next action.
  • The distinction between model-driven decision-making (Claude reasons about which tool to call next based on context) and pre-configured decision trees or tool sequences.
  • Adding tool results to conversation context between iterations so the model can incorporate new information into its reasoning.

Skills in

  • Implementing agentic loop control flow that continues when stop_reason is "tool_use" and terminates when stop_reason is "end_turn".
  • Avoiding anti-patterns such as parsing natural language signals to determine loop termination, setting arbitrary iteration caps, or checking for assistant text content as a completion indicator.
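
The control flow above can be sketched as a minimal loop. This is a hedged sketch, not a definitive implementation: `client` stands in for the Anthropic Messages API, and `execute_tool` and the message shapes are illustrative.

```python
# Minimal agentic loop sketch: keep looping while stop_reason is "tool_use",
# stop on "end_turn". The client is a stand-in for the Messages API.

def run_agent_loop(client, messages, execute_tool):
    """Loop until the model stops requesting tools (stop_reason == 'end_turn')."""
    while True:
        response = client.create(messages=messages)
        # Append the assistant turn so the model can see its own tool requests.
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return messages  # "end_turn": the model considers the task complete.
        # Execute every requested tool and return results for the next iteration.
        results = [
            {"type": "tool_result",
             "tool_use_id": block["id"],
             "content": execute_tool(block["name"], block["input"])}
            for block in response["content"] if block["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```

Termination is decided solely by `stop_reason`: no text parsing, no assistant-text check, and no arbitrary iteration cap.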

Exam Tip: The exam heavily tests stop_reason handling. Remember: "tool_use" = keep looping, "end_turn" = done. Never parse text to decide termination.

Task Statement 1.2: Orchestrate multi-agent systems with coordinator-subagent patterns

Knowledge of

  • Hub-and-spoke architecture where a coordinator agent manages all inter-subagent communication, error handling, and information routing.
  • How subagents operate with isolated context—they do not inherit the coordinator's conversation history automatically.
  • The role of the coordinator in task decomposition, delegation, result aggregation, and deciding which subagents to invoke based on query complexity.
  • Risks of overly narrow task decomposition by the coordinator, leading to incomplete coverage of broad research topics.

Skills in

  • Partitioning research scope across subagents to minimize duplication (e.g., assigning distinct subtopics or source types to each agent).
  • Implementing iterative refinement loops where the coordinator evaluates synthesis output for gaps.
  • Routing all subagent communication through the coordinator for observability, consistent error handling, and controlled information flow.

Exam Tip: If a scenario asks "why is coverage incomplete?", check the coordinator's task decomposition first—not downstream agents.

Task Statement 1.3: Configure subagent invocation, context passing, and spawning

Knowledge of

  • The Task tool as the mechanism for spawning subagents, and the requirement that allowedTools must include "Task" for a coordinator to invoke subagents.
  • That subagent context must be explicitly provided in the prompt—subagents do not automatically inherit parent context or share memory between invocations.
  • The AgentDefinition configuration including descriptions, system prompts, and tool restrictions for each subagent type.
  • Fork-based session management for exploring divergent approaches from a shared analysis baseline.

Skills in

  • Including complete findings from prior agents directly in the subagent's prompt (e.g., passing web search results and document analysis outputs to the synthesis subagent).
  • Using structured data formats to separate content from metadata (source URLs, page numbers) when passing context between agents.
  • Spawning parallel subagents by emitting multiple Task tool calls in a single coordinator response.
  • Designing coordinator prompts that specify research goals and quality criteria rather than step-by-step procedural instructions.

Exam Tip: Subagents get NO automatic context. You must pass everything explicitly in their prompt.

Task Statement 1.4: Implement multi-step workflows with enforcement and handoff patterns

Knowledge of

  • The difference between programmatic enforcement (hooks, prerequisite gates) and prompt-based guidance for workflow ordering.
  • Why prompt instructions alone carry a non-zero failure rate when deterministic compliance is required (e.g., identity verification before financial operations).
  • Structured handoff protocols for mid-process escalation that include customer details, root cause analysis, and recommended actions.

Skills in

  • Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed.
  • Decomposing multi-concern customer requests into distinct items, then investigating each in parallel using shared context before synthesizing a unified resolution.
  • Compiling structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents.
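
A programmatic prerequisite gate like the one described in the first skill bullet can be sketched as pure, deterministic code. The tool and step names below are hypothetical, for illustration only.

```python
# Deterministic prerequisite gate sketch: financial tools are blocked until
# identity verification has completed, regardless of how the prompt is worded.

PREREQUISITES = {
    "issue_refund": {"verify_identity"},
    "update_payment_method": {"verify_identity"},
}

def gate_tool_call(tool_name, completed_steps):
    """Return (allowed, reason). Programmatic, so compliance is guaranteed."""
    missing = PREREQUISITES.get(tool_name, set()) - completed_steps
    if missing:
        return False, f"blocked: complete {sorted(missing)} first"
    return True, "ok"
```

Because the gate runs outside the model, it cannot be talked around, which is exactly the difference between programmatic enforcement and prompt-based guidance.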

Exam Tip: When the exam says "deterministic" or "guaranteed compliance"—the answer is hooks/programmatic enforcement, not prompts.

Task Statement 1.5: Apply Agent SDK hooks for tool call interception and data normalization

Knowledge of

  • Hook patterns (e.g., PostToolUse) that intercept tool results for transformation before the model processes them.
  • Hook patterns that intercept outgoing tool calls to enforce compliance rules (e.g., blocking refunds above a threshold).
  • The distinction between using hooks for deterministic guarantees versus relying on prompt instructions for probabilistic compliance.

Skills in

  • Implementing PostToolUse hooks to normalize heterogeneous data formats (Unix timestamps, ISO 8601, numeric status codes) from different MCP tools.
  • Implementing tool call interception hooks that block policy-violating actions and redirect to alternative workflows (e.g., human escalation).
  • Choosing hooks over prompt-based enforcement when business rules require guaranteed compliance.
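
A PostToolUse-style normalization hook might look like the following sketch. The field name `timestamp` and the tool-result shape are assumptions for illustration, not an SDK contract.

```python
# Illustrative PostToolUse hook: different MCP tools return timestamps as
# Unix epochs or ISO 8601 strings; normalize them all to ISO 8601 before
# the model processes the result.
from datetime import datetime, timezone

def normalize_timestamp(value):
    if isinstance(value, (int, float)):  # Unix epoch seconds
        return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
    return str(value)  # already a string representation; pass through

def post_tool_use_hook(tool_result):
    """Rewrite heterogeneous timestamp formats into one canonical form."""
    normalized = dict(tool_result)
    if "timestamp" in normalized:
        normalized["timestamp"] = normalize_timestamp(normalized["timestamp"])
    return normalized
```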

Exam Tip: Hooks = deterministic. Prompts = probabilistic. The exam always prefers hooks when the scenario involves money, compliance, or identity.

Task Statement 1.6: Design task decomposition strategies for complex workflows

Knowledge of

  • When to use fixed sequential pipelines (prompt chaining) versus dynamic adaptive decomposition based on intermediate findings.
  • Prompt chaining patterns that break reviews into sequential steps (e.g., analyze each file individually, then run a cross-file integration pass).
  • The value of adaptive investigation plans that generate subtasks based on what is discovered at each step.

Skills in

  • Selecting task decomposition patterns appropriate to the workflow: prompt chaining for predictable multi-aspect reviews, dynamic decomposition for open-ended investigation tasks.
  • Splitting large code reviews into per-file local analysis passes plus a separate cross-file integration pass to avoid attention dilution.

Exam Tip: Fixed pipeline = predictable tasks (code review). Dynamic decomposition = exploration (legacy codebase investigation).

Task Statement 1.7: Manage session state, resumption, and forking

Knowledge of

  • Named session resumption using --resume <session-name> to continue a specific prior conversation.
  • fork_session for creating independent branches from a shared analysis baseline to explore divergent approaches.
  • Why starting a new session with a structured summary is more reliable than resuming with stale tool results.

Skills in

  • Using --resume with session names to continue named investigation sessions across work sessions.
  • Using fork_session to create parallel exploration branches.
  • Choosing between session resumption (when prior context is mostly valid) and starting fresh with injected summaries (when prior tool results are stale).
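
As a rough illustration of the first two options (the session name and summary text are placeholders):

```shell
# Resume a named session when its prior context is still valid:
claude --resume auth-incident

# Prior tool results are stale: start fresh and inject a structured summary instead:
claude "Summary of the previous investigation: <key findings>. Continue from there."
```

fork_session, by contrast, is set when spawning branches programmatically; the exact invocation depends on the SDK surface in use.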

Exam Tip: If tool results may be stale, start fresh with a summary—don't blindly --resume.

Domain 2: Tool Design & MCP Integration (18% of exam)

Task Statement 2.1: Design effective tool interfaces with clear descriptions and boundaries

Knowledge of

  • Tool descriptions as the primary mechanism LLMs use for tool selection; minimal descriptions lead to unreliable selection among similar tools.
  • The importance of including input formats, example queries, edge cases, and boundary explanations in tool descriptions.
  • How ambiguous or overlapping tool descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions).

Skills in

  • Writing tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it versus similar alternatives.
  • Renaming tools and updating descriptions to eliminate functional overlap.
  • Splitting generic tools into purpose-specific tools with defined input/output contracts.

Exam Tip: When the exam shows a tool misrouting problem, the first fix is always improving tool descriptions—not adding few-shot examples or routing layers.

Task Statement 2.2: Implement structured error responses for MCP tools

Knowledge of

  • The MCP isError flag pattern for communicating tool failures back to the agent.
  • The distinction between transient errors (timeouts), validation errors (invalid input), business errors (policy violations), and permission errors.
  • Why uniform error responses (generic "Operation failed") prevent the agent from making appropriate recovery decisions.

Skills in

  • Returning structured error metadata including errorCategory (transient/validation/permission), isRetryable boolean, and human-readable descriptions.
  • Including isRetryable: false flags for business rule violations so the agent communicates the refusal appropriately instead of retrying.
  • Distinguishing between access failures (needing retry decisions) and valid empty results (representing successful queries with no matches).
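
The bullets above can be made concrete with a small payload builder. The extra metadata fields (errorCategory, isRetryable) follow this guide's terminology; the core MCP result shape only mandates isError and content.

```python
# Sketch of structured MCP tool error responses vs. valid empty results.

def make_error_result(category, message, retryable):
    """Structured failure: gives the agent enough context to choose a recovery path."""
    return {
        "isError": True,
        "content": [{"type": "text", "text": message}],
        "errorCategory": category,   # e.g. "transient" | "validation" | "permission" | "business"
        "isRetryable": retryable,
    }

def make_empty_result():
    # An empty search is a SUCCESSFUL query with no matches, not an error.
    return {"isError": False,
            "content": [{"type": "text", "text": "0 matches found"}]}
```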

Exam Tip: Empty results ≠ errors. The exam tests this distinction—a search with no matches is success, not failure.

Task Statement 2.3: Distribute tools appropriately across agents and configure tool choice

Knowledge of

  • The principle that giving an agent access to too many tools (e.g., 18 instead of 4-5) degrades tool selection reliability.
  • Why agents with tools outside their specialization tend to misuse them.
  • Scoped tool access: giving agents only the tools needed for their role, with limited cross-role tools for specific high-frequency needs.
  • tool_choice configuration options: "auto", "any", and forced tool selection ({"type": "tool", "name": "..."}).

Skills in

  • Restricting each subagent's tool set to those relevant to its role.
  • Using tool_choice forced selection to ensure a specific tool is called first (e.g., forcing extract_metadata before enrichment tools).
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist.
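
The three tool_choice modes can be written out as request fragments. The model name and tool definition below are placeholders.

```python
# The three tool_choice configurations, as Messages API request fragments.

auto_choice   = {"type": "auto"}   # model decides; may respond with plain text
any_choice    = {"type": "any"}    # model must call some tool, its choice which
forced_choice = {"type": "tool", "name": "extract_metadata"}  # must call this tool

request = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "tools": [{
        "name": "extract_metadata",
        "description": "Extract title, author, and date from a document.",
        "input_schema": {"type": "object"},
    }],
    "tool_choice": forced_choice,  # guarantees extract_metadata runs first
    "messages": [{"role": "user", "content": "Extract metadata from this document..."}],
}
```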

Exam Tip: tool_choice: "auto" = model decides. "any" = must use a tool. {"type":"tool","name":"X"} = must use tool X.

Task Statement 2.4: Integrate MCP servers into Claude Code and agent workflows

Knowledge of

  • MCP server scoping: project-level (.mcp.json) for shared team tooling vs user-level (~/.claude.json) for personal/experimental servers.
  • Environment variable expansion in .mcp.json (e.g., ${GITHUB_TOKEN}) for credential management without committing secrets.
  • That tools from all configured MCP servers are discovered at connection time and available simultaneously to the agent.
  • MCP resources as a mechanism for exposing content catalogs (e.g., issue summaries, documentation hierarchies, database schemas).

Skills in

  • Configuring shared MCP servers in project-scoped .mcp.json with environment variable expansion for authentication tokens.
  • Enhancing MCP tool descriptions to explain capabilities and outputs in detail, preventing the agent from preferring built-in tools over more capable MCP tools.
  • Exposing content catalogs as MCP resources to give agents visibility into available data without requiring exploratory tool calls.
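
A project-scoped .mcp.json using environment-variable expansion might look like this sketch; the server package and variable names are illustrative.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
```

The file is committed and shared with the team, while each developer supplies GITHUB_TOKEN locally, so no secret ever lands in version control.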

Exam Tip: .mcp.json = shared (version controlled). ~/.claude.json = personal. Use env vars for secrets in .mcp.json.

Task Statement 2.5: Select and apply built-in tools (Read, Write, Edit, Bash, Grep, Glob) effectively

Knowledge of

  • Grep for content search (searching file contents for patterns like function names, error messages, or import statements).
  • Glob for file path pattern matching (finding files by name or extension patterns).
  • Read/Write for full file operations; Edit for targeted modifications using unique text matching.
  • When Edit fails due to non-unique text matches, using Read + Write as a fallback.

Skills in

  • Selecting Grep for searching code content across a codebase, Glob for finding files matching naming patterns.
  • Building codebase understanding incrementally: starting with Grep to find entry points, then using Read to follow imports and trace flows.
  • Tracing function usage across wrapper modules by first identifying all exported names, then searching for each name across the codebase.

Exam Tip: Grep = search inside files. Glob = find files by name. The exam tests you on picking the right built-in tool for the task.

Domain 3: Claude Code Configuration & Workflows (20% of exam)

Task Statement 3.1: Configure CLAUDE.md files with appropriate hierarchy, scoping, and modular organization

Knowledge of

  • The CLAUDE.md configuration hierarchy: user-level (~/.claude/CLAUDE.md), project-level (.claude/CLAUDE.md or root CLAUDE.md), and directory-level (subdirectory CLAUDE.md files).
  • User-level settings apply only to that user—instructions in ~/.claude/CLAUDE.md are not shared with teammates via version control.
  • The @import syntax for referencing external files to keep CLAUDE.md modular.
  • .claude/rules/ directory for organizing topic-specific rule files as an alternative to a monolithic CLAUDE.md.

Skills in

  • Diagnosing configuration hierarchy issues (e.g., a new team member not receiving instructions because they're in user-level rather than project-level configuration).
  • Using @import to selectively include relevant standards files in each package's CLAUDE.md.
  • Splitting large CLAUDE.md files into focused topic-specific files in .claude/rules/ (e.g., testing.md, api-conventions.md, deployment.md).
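
For example, a package-level CLAUDE.md that pulls in only the relevant shared standards (the paths are illustrative):

```markdown
# packages/api/CLAUDE.md

@../../.claude/rules/api-conventions.md
@../../.claude/rules/testing.md

## Package notes
This package follows the shared monorepo standards imported above.
```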

Exam Tip: Project-level CLAUDE.md = shared via version control. User-level = personal only. This distinction is heavily tested.

Task Statement 3.2: Create and configure custom slash commands and skills

Knowledge of

  • Project-scoped commands in .claude/commands/ (shared via version control) vs user-scoped commands in ~/.claude/commands/ (personal).
  • Skills in .claude/skills/ with SKILL.md files that support frontmatter configuration including context: fork, allowed-tools, and argument-hint.
  • The context: fork frontmatter option for running skills in an isolated sub-agent context, preventing skill outputs from polluting the main conversation.

Skills in

  • Creating project-scoped slash commands in .claude/commands/ for team-wide availability via version control.
  • Using context: fork to isolate skills that produce verbose output (e.g., codebase analysis) or exploratory context (e.g., brainstorming alternatives).
  • Configuring allowed-tools in skill frontmatter to restrict tool access during skill execution.
  • Using argument-hint frontmatter to prompt developers for required parameters when they invoke the skill without arguments.
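
A SKILL.md combining the three frontmatter options might look like the sketch below; the skill name, body, and exact key syntax should be checked against current Claude Code documentation.

```markdown
---
name: analyze-codebase
description: Produce a structural summary of a package
context: fork            # isolated sub-agent; verbose output stays out of the main conversation
allowed-tools: Read, Grep, Glob
argument-hint: <package-path>
---

Analyze the package at $ARGUMENTS and report its module structure,
entry points, and external dependencies as a concise summary.
```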

Exam Tip: context: fork = isolated execution. allowed-tools = restricted tool access. argument-hint = parameter prompt.

Task Statement 3.3: Apply path-specific rules for conditional convention loading

Knowledge of

  • .claude/rules/ files with YAML frontmatter paths fields containing glob patterns for conditional rule activation.
  • How path-scoped rules load only when editing matching files, reducing irrelevant context and token usage.
  • The advantage of glob-pattern rules over directory-level CLAUDE.md files for conventions that span multiple directories.

Skills in

  • Creating .claude/rules/ files with YAML frontmatter path scoping (e.g., paths: ["src/api/**/*"] for API conventions, paths: ["**/*.test.*"] for testing conventions).
  • Choosing path-specific rules over subdirectory CLAUDE.md files when conventions must apply to files spread across many directories.
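
A path-scoped rules file per the bullets above (glob patterns and content are illustrative):

```markdown
---
paths:
  - "src/api/**/*"
  - "packages/*/api/**/*"
---

# API conventions
Validate all request bodies with the shared schema helpers before any handler logic runs.
```

Because the globs match API files wherever they live, one rules file covers directories that a single subdirectory CLAUDE.md could not.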

Exam Tip: Path-specific rules with glob patterns are ideal for test files (spread everywhere) and API conventions (nested in multiple dirs).

Task Statement 3.4: Determine when to use plan mode vs direct execution

Knowledge of

  • Plan mode is designed for complex tasks involving large-scale changes, multiple valid approaches, and multi-file modifications.
  • Direct execution is appropriate for simple, well-scoped changes (e.g., adding a single validation check to one function).
  • Plan mode enables safe codebase exploration and design before committing to changes, preventing costly rework.
  • The Explore subagent for isolating verbose discovery output and returning summaries to preserve main conversation context.

Skills in

  • Selecting plan mode for tasks with architectural implications (e.g., microservice restructuring, library migrations affecting 45+ files).
  • Selecting direct execution for well-understood changes with clear scope.
  • Combining plan mode for investigation with direct execution for implementation.

Exam Tip: Plan mode = exploration + design before committing. Direct = you already know exactly what to change.

Task Statement 3.5: Apply iterative refinement techniques for progressive improvement

Knowledge of

  • Concrete input/output examples as the most effective way to communicate expected transformations when prose descriptions are interpreted inconsistently.
  • Test-driven iteration: writing test suites first, then iterating by sharing test failures to guide progressive improvement.
  • The interview pattern: having Claude ask questions to surface considerations the developer may not have anticipated.

Skills in

  • Providing 2-3 concrete input/output examples to clarify transformation requirements.
  • Writing test suites covering expected behavior, edge cases, and performance requirements before implementation, then iterating by sharing test failures.
  • Providing specific test cases with example input and expected output to fix edge case handling.

Exam Tip: When prose instructions produce inconsistent results, switch to concrete input/output examples—the exam expects this answer.

Task Statement 3.6: Integrate Claude Code into CI/CD pipelines

Knowledge of

  • The -p (or --print) flag for running Claude Code in non-interactive mode in automated pipelines.
  • --output-format json and --json-schema CLI flags for enforcing structured output in CI contexts.
  • CLAUDE.md as the mechanism for providing project context (testing standards, fixture conventions, review criteria) to CI-invoked Claude Code.
  • Session context isolation: why the same Claude session that generated code is less effective at reviewing its own changes compared to an independent review instance.

Skills in

  • Running Claude Code in CI with the -p flag to prevent interactive input hangs.
  • Using --output-format json with --json-schema to produce machine-parseable structured findings for automated posting as inline PR comments.
  • Including prior review findings in context when re-running reviews after new commits, instructing Claude to report only new or still-unaddressed issues.
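
A CI review step combining these flags might look like the following; the prompt, schema file, and the exact argument form of --json-schema are assumptions to verify against current CLI documentation.

```shell
claude -p "Review this PR diff against the standards in CLAUDE.md" \
  --output-format json \
  --json-schema review-findings.schema.json > findings.json
```

The -p flag keeps the run non-interactive, and the JSON output can be parsed to post inline PR comments.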

Exam Tip: -p = non-interactive (for CI). The exam will present a "pipeline hangs" scenario—answer is always -p.

Domain 4: Prompt Engineering & Structured Output (20% of exam)

Task Statement 4.1: Design prompts with explicit criteria to improve precision and reduce false positives

Knowledge of

  • The importance of explicit criteria over vague instructions (e.g., "flag comments only when claimed behavior contradicts actual code behavior" vs "check that comments are accurate").
  • How general instructions like "be conservative" fail to improve precision compared to specific categorical criteria.
  • The impact of false positive rates on developer trust: high false positive categories undermine confidence in accurate categories.

Skills in

  • Writing specific review criteria that define which issues to report versus skip (bugs, security) rather than relying on confidence-based filtering.
  • Temporarily disabling high false-positive categories to restore developer trust while improving prompts for those categories.
  • Defining explicit severity criteria with concrete code examples for each severity level to achieve consistent classification.

Exam Tip: The exam penalizes vague prompts. "Be conservative" is always wrong—replace with explicit, categorical criteria.

Task Statement 4.2: Apply few-shot prompting to improve output consistency and quality

Knowledge of

  • Few-shot examples as the most effective technique for achieving consistently formatted, actionable output when detailed instructions alone produce inconsistent results.
  • The role of few-shot examples in demonstrating ambiguous-case handling (e.g., tool selection for ambiguous requests, branch-level test coverage gaps).
  • How few-shot examples enable the model to generalize judgment to novel patterns rather than matching only pre-specified cases.

Skills in

  • Creating 2-4 targeted few-shot examples for ambiguous scenarios that show reasoning for why one action was chosen over plausible alternatives.
  • Including few-shot examples that demonstrate specific desired output format (location, issue, severity, suggested fix) to achieve consistency.
  • Providing few-shot examples distinguishing acceptable code patterns from genuine issues to reduce false positives while enabling generalization.

Exam Tip: Few-shot examples should show reasoning ("why this choice"), not just the final answer. Include 1-2 ambiguous cases.

Task Statement 4.3: Enforce structured output using tool use and JSON schemas

Knowledge of

  • Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors.
  • The distinction between tool_choice: "auto" (model may return text), "any" (model must call a tool but can choose which), and forced tool selection.
  • That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total).
  • Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories.

Skills in

  • Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response.
  • Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown.
  • Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps.
  • Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values.
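
An extraction tool of this kind can be sketched as follows. The invoice fields are illustrative, and the response parsing assumes the standard tool_use content-block shape.

```python
# Schema-enforced extraction via tool use: the schema guarantees valid JSON,
# and the nullable field prevents fabrication when data is absent.

extract_invoice = {
    "name": "extract_invoice",
    "description": "Record structured fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": "string"},
            "total": {"type": "number"},
            # Nullable: absent in the source -> null, never a fabricated value.
            "po_number": {"type": ["string", "null"]},
        },
        "required": ["invoice_number", "total", "po_number"],
    },
}

def parse_extraction(response_content):
    """Pull the structured arguments out of the tool_use content block."""
    for block in response_content:
        if block["type"] == "tool_use" and block["name"] == "extract_invoice":
            return block["input"]
    raise ValueError("no tool_use block found")
```

Pair this with tool_choice {"type": "tool", "name": "extract_invoice"} to force the extraction, and validate semantics (totals, cross-field consistency) in a separate step, since the schema alone cannot catch semantic errors.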

Exam Tip: tool_use + JSON schema = guaranteed valid JSON. But validate semantic correctness separately (e.g., totals, dates, cross-field consistency).

Task Statement 4.4: Implement validation, retry, and feedback loops for extraction quality

Knowledge of

  • Retry-with-error-feedback: appending specific validation errors to the prompt on retry to guide the model toward correction.
  • The limits of retry: retries are ineffective when the required information is simply absent from the source document.
  • Feedback loop design: tracking which code constructs trigger findings (detected_pattern field) to enable systematic analysis of dismissal patterns.

Skills in

  • Implementing follow-up requests that include the original document, the failed extraction, and specific validation errors for model self-correction.
  • Identifying when retries will be ineffective (information exists only in an external document not provided) versus when they will succeed (format mismatches, structural output errors).
  • Adding detected_pattern fields to structured findings to enable analysis of false positive patterns when developers dismiss findings.
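
A retry-with-error-feedback loop can be sketched independently of any particular API; here `call_model` and `validate` are stand-ins supplied by the caller.

```python
# Retry-with-error-feedback sketch: on validation failure, the failed output
# and the specific errors are appended to the prompt for the next attempt.

def extract_with_retry(call_model, validate, document, max_attempts=3):
    prompt = f"Extract the fields from this document:\n{document}"
    last = None
    for _ in range(max_attempts):
        last = call_model(prompt)
        errors = validate(last)
        if not errors:
            return last
        # Feed the concrete errors back so the model can self-correct.
        prompt = (f"Extract the fields from this document:\n{document}\n"
                  f"Your previous attempt was:\n{last}\n"
                  f"It failed validation: {errors}\nCorrect these issues.")
    return last  # still invalid: the data is likely absent from the source
```

Note the limit the bullets describe: this loop fixes format and structural errors, but no number of retries will recover information that the source document never contained.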

Exam Tip: Retry works for format errors, not missing information. If the data isn't in the source document, retrying won't help.

Task Statement 4.5: Design efficient batch processing strategies

Knowledge of

  • The Message Batches API: 50% cost savings, up to 24-hour processing window, no guaranteed latency SLA.
  • Batch processing is appropriate for non-blocking, latency-tolerant workloads (overnight reports, weekly audits) and inappropriate for blocking workflows (pre-merge checks).
  • The batch API does not support multi-turn tool calling within a single request.
  • custom_id fields for correlating batch request/response pairs.

Skills in

  • Matching API approach to workflow latency requirements: synchronous API for blocking pre-merge checks, batch API for overnight/weekly analysis.
  • Calculating batch submission frequency based on SLA constraints (e.g., 4-hour windows to guarantee 30-hour SLA with 24-hour batch processing).
  • Handling batch failures: resubmitting only failed documents (identified by custom_id) with appropriate modifications.
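
Building batch requests with custom_id correlation might look like the sketch below. The model name is a placeholder; actual submission would go through the Batches endpoint (e.g., client.messages.batches.create(requests=...)).

```python
# Batch request construction: custom_id ties each result back to its source
# document, so failed items can be identified and resubmitted individually.

def build_batch_requests(documents):
    return [
        {
            "custom_id": doc_id,
            "params": {
                "model": "claude-sonnet-4-5",  # placeholder model name
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": f"Summarize:\n{text}"}],
            },
        }
        for doc_id, text in documents.items()
    ]

def failed_ids(results):
    """Collect custom_ids of non-succeeded items for targeted resubmission."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]
```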

Exam Tip: Batch API = 50% cheaper but up to 24h. Never use batch for blocking workflows (pre-merge checks). Always use for overnight jobs.

Task Statement 4.6: Design multi-instance and multi-pass review architectures

Knowledge of

  • Self-review limitations: a model retains reasoning context from generation, making it less likely to question its own decisions in the same session.
  • Independent review instances (without prior reasoning context) are more effective at catching subtle issues than self-review instructions or extended thinking.
  • Multi-pass review: splitting large reviews into per-file local analysis passes plus cross-file integration passes to avoid attention dilution.

Skills in

  • Using a second independent Claude instance to review generated code without the generator's reasoning context.
  • Splitting large multi-file reviews into focused per-file passes for local issues plus separate integration passes for cross-file data flow analysis.
  • Running verification passes where the model self-reports confidence alongside each finding to enable calibrated review routing.

Exam Tip: Self-review is unreliable—always use a separate instance. Split large reviews into per-file + cross-file integration passes.

Domain 5: Context Management & Reliability (15% of exam)

Task Statement 5.1: Manage conversation context to preserve critical information across long interactions

Knowledge of

  • Progressive summarization risks: condensing numerical values, percentages, dates, and customer-stated expectations into vague summaries.
  • The "lost in the middle" effect: models reliably process information at the beginning and end of long inputs but may omit findings from middle sections.
  • How tool results accumulate in context and consume tokens disproportionately to their relevance.
  • The importance of passing complete conversation history in subsequent API requests to maintain conversational coherence.

Skills in

  • Extracting transactional facts (amounts, dates, order numbers, statuses) into a persistent "case facts" block included in each prompt, outside summarized history.
  • Trimming verbose tool outputs to only relevant fields before they accumulate in context.
  • Placing key findings summaries at the beginning of aggregated inputs and organizing detailed results with explicit section headers to mitigate position effects.
  • Requiring subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis.

Exam Tip: Extract key facts into a structured block at the top of the prompt. Don't rely on the model remembering facts from the middle of a long history.

Task Statement 5.2: Design effective escalation and ambiguity resolution patterns

Knowledge of

  • Appropriate escalation triggers: customer requests for a human, policy exceptions/gaps, and inability to make meaningful progress.
  • The distinction between escalating immediately when a customer explicitly demands it versus offering to resolve when the issue is straightforward.
  • Why sentiment-based escalation and self-reported confidence scores are unreliable proxies for actual case complexity.
  • How multiple customer matches require clarification (requesting additional identifiers) rather than heuristic selection.

Skills in

  • Adding explicit escalation criteria with few-shot examples demonstrating when to escalate versus resolve autonomously.
  • Honoring explicit customer requests for human agents immediately without first attempting investigation.
  • Escalating when policy is ambiguous or silent on the customer's specific request.

Exam Tip: Sentiment analysis and confidence scores are always wrong answers for escalation calibration. Use explicit criteria with examples.

Task Statement 5.3: Implement error propagation strategies across multi-agent systems

Knowledge of

  • Structured error context (failure type, attempted query, partial results, alternative approaches) as enabling intelligent coordinator recovery decisions.
  • The distinction between access failures (timeouts needing retry decisions) and valid empty results (successful queries with no matches).
  • Why generic error statuses ("search unavailable") hide valuable context from the coordinator.
  • Why silently suppressing errors (returning empty results as success) or terminating entire workflows on single failures are both anti-patterns.

Skills in

  • Returning structured error context including failure type, what was attempted, partial results, and potential alternatives to enable coordinator recovery.
  • Having subagents implement local recovery for transient failures and only propagate errors they cannot resolve locally along with partial results.
  • Structuring synthesis output with coverage annotations indicating which findings are well-supported versus which topic areas have gaps due to unavailable sources.

Exam Tip: Errors must include structured context—failure type + what was tried + partial results. Generic "failed" messages are always wrong.

Task Statement 5.4: Manage context effectively in large codebase exploration

Knowledge of

  • Context degradation in extended sessions: models start giving inconsistent answers and referencing "typical patterns" rather than specific classes discovered earlier.
  • The role of scratchpad files for persisting key findings across context boundaries.
  • Subagent delegation for isolating verbose exploration output while the main agent coordinates high-level understanding.
  • Structured state persistence for crash recovery: each agent exports state to a known location, and the coordinator loads a manifest on resume.

Skills in

  • Spawning subagents to investigate specific questions while the main agent preserves high-level coordination.
  • Having agents maintain scratchpad files recording key findings, referencing them for subsequent questions to counteract context degradation.
  • Summarizing key findings from one exploration phase before spawning sub-agents for the next phase, injecting summaries into initial context.
  • Using /compact to reduce context usage during extended exploration sessions when context fills with verbose discovery output.

Exam Tip: When context degrades, use scratchpad files + subagent delegation. /compact helps but loses detail—prefer structured persistence.

Task Statement 5.5: Design human review workflows and confidence calibration

Knowledge of

  • The risk that aggregate accuracy metrics (e.g., 97% overall) may mask poor performance on specific document types or fields.
  • Stratified random sampling for measuring error rates in high-confidence extractions and detecting novel error patterns.
  • Field-level confidence scores calibrated using labeled validation sets for routing review attention.

Skills in

  • Implementing stratified random sampling of high-confidence extractions for ongoing error rate measurement and novel pattern detection.
  • Analyzing accuracy by document type and field to verify consistent performance across all segments before reducing human review.
  • Routing extractions with low model confidence or ambiguous/contradictory source documents to human review, prioritizing limited reviewer capacity.

Exam Tip: High aggregate accuracy can hide failures in specific document types. Always stratify by document type and field when evaluating performance.

Task Statement 5.6: Preserve information provenance and handle uncertainty in multi-source synthesis

Knowledge of

  • How source attribution is lost during summarization steps when findings are compressed without preserving claim-source mappings.
  • The importance of structured claim-source mappings that the synthesis agent must preserve and merge when combining findings.
  • How to handle conflicting statistics from credible sources: annotating conflicts with source attribution rather than arbitrarily selecting one value.
  • Temporal data: requiring publication/collection dates in structured outputs to prevent temporal differences from being misinterpreted as contradictions.

Skills in

  • Requiring subagents to output structured claim-source mappings (source URLs, document names, relevant excerpts) that downstream agents preserve through synthesis.
  • Structuring reports with explicit sections distinguishing well-established findings from contested ones, preserving original source characterizations.
  • Completing document analysis with conflicting values included and explicitly annotated, letting the coordinator decide how to reconcile before passing to synthesis.
  • Requiring subagents to include publication or data collection dates in structured outputs to enable correct temporal interpretation.
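
A claim-source mapping that survives merging might be sketched like this; the record shape is an assumption for illustration.

```python
# Merge findings from multiple subagents while preserving every claim's
# sources (URLs and dates), so attribution and temporal context are never
# lost during synthesis. Conflicting sources are kept side by side.

def merge_claims(per_agent_findings):
    merged = {}
    for findings in per_agent_findings:
        for claim in findings:
            entry = merged.setdefault(claim["claim"],
                                      {"claim": claim["claim"], "sources": []})
            entry["sources"].extend(claim["sources"])
    return list(merged.values())
```

Downstream, a synthesis agent can annotate conflicts with their attributed sources and dates instead of silently picking one value.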

Exam Tip: Claims must always carry source attribution through the entire pipeline. If synthesis loses sources, the architecture is broken.