InterviewSkill

AI Agents Interview Questions

Tool use, planning, workflows, memory, and guardrails for agentic AI systems.

33 questions
AI Agents

What is an AI agent?medium

Type
conceptual
Topic
ai-agent
Frequency
common
Tags
ai, agent
Answer

An AI agent uses a model to decide actions toward a goal.

Explanation

Agents often combine LLM reasoning with tools, memory, planning, and feedback loops to complete tasks beyond a single response.

Follow-upHow is an agent different from a chatbot?

What is tool calling?medium

Type
conceptual
Topic
tool-calling
Frequency
common
Tags
tool, calling
Answer

Tool calling lets a model request external functions or APIs.

Explanation

Tools can retrieve data, run calculations, search, write files, or trigger workflows. The system decides which tool calls are allowed.

Follow-upHow do you validate tool arguments?

What is agent memory?hard

Type
conceptual
Topic
agent-memory
Frequency
common
Tags
agent, memory
Answer

Memory stores useful state or history for future decisions.

Explanation

Short-term memory may live in context, while long-term memory may use databases or vector stores. Memory must be curated to avoid noise.

Follow-upWhat privacy risks does memory create?

What are guardrails?medium

Type
conceptual
Topic
guardrails
Frequency
common
Tags
guardrails
Answer

Guardrails constrain behavior to keep agents safe and reliable.

Explanation

They include tool permissions, input validation, output checks, human approval, policy filters, and execution limits.

Follow-upWhen should an agent ask for human approval?

How do you evaluate an AI agent?medium

Type
conceptual
Topic
evaluate-ai-agent
Frequency
common
Tags
evaluate, ai, agent
Answer

Measure task success, safety, cost, latency, tool accuracy, and recovery from errors.

Explanation

Agent evaluation often needs multi-step test cases because failures can happen in planning, tool selection, execution, or final response.

Follow-upWhy are single-turn evals not enough for agents?

What is a planning loop in an AI agent?medium

Type
conceptual
Topic
planning-loop
Frequency
common
Tags
planning, agent-loop, tools
Answer

It is a cycle where the agent decides a next step, acts, observes the result, and updates its plan.

Explanation

Planning loops let agents solve multi-step tasks, but they need limits, state tracking, tool validation, and stopping conditions to avoid wasted work or unsafe actions.

Follow-upWhat signals should stop an agent loop?

When should an AI agent ask for human approval?hard

Type
scenario
Topic
human-in-the-loop
Frequency
common
Tags
human-approval, safety, workflow
Answer

It should ask before high-impact, irreversible, expensive, or low-confidence actions.

Explanation

Human approval is useful for payments, deleting data, sending external messages, changing permissions, production deployments, or actions with legal or safety implications.

Follow-upHow do you design approval without making the agent unusable?

How do you design tool permissions for an AI agent?hard

Type
scenario
Topic
tool-permissions
Frequency
common
Tags
tools, permissions, security
Answer

Give the agent the minimum tools and scopes needed for the task.

Explanation

Use allowlists, scoped credentials, argument validation, audit logs, dry-run modes, and separate read-only from write-capable tools to reduce blast radius.

Follow-upWhy is a read-only tool safer than a write-capable tool?

Explain function calling / tool use. How does it differ from RAG?medium

Type
conceptual
Topic
function-calling-tool-use-how-does-it-differ-from-rag
Frequency
common
Tags
ai-agents, explain, function, calling, tool, use
Answer

Tool use: model dynamically decides to call an external function mid-reasoning, gets the result, continues.

Explanation

Tool use: model dynamically decides to call an external function mid-reasoning, gets the result, continues. RAG: retrieval is a preprocessing step — fetch relevant docs, inject into context, then generate. Tool use is dynamic and chainable; RAG is a single retrieval step. In a fund document processing system, agents use tool use to call financial data APIs and Step Functions for orchestration.

Follow-upWhen would you choose one approach over the other?

What is structured output / JSON mode? How do you enforce schema compliance?medium

Type
conceptual
Topic
is-structured-output-json-mode-how-do-you-enforce-schema-c
Frequency
common
Tags
ai-agents, what, structured, output, json, mode
Answer

Structured output forces the model to return valid JSON matching a schema.

Explanation

Structured output forces the model to return valid JSON matching a schema. Approaches: (1) Prompt instruction + few-shot. (2) JSON mode in API (guarantees valid JSON, not schema). (3) Tool use / function calling — model must produce arguments matching the tool's JSON schema. (4) Pydantic parsing with retry loop: if parse fails, send error back with correction instruction. Layer all three in production.

Follow-upCan you give a production example?

How do you measure latency, token cost, and throughput in a multi-agent pipeline?hard

Type
conceptual
Topic
do-you-measure-latency-token-cost-and-throughput-in-a-mult
Frequency
common
Tags
ai-agents, how, you, measure, latency, token
Answer

Instrument with OpenTelemetry (as in an agent platform): trace spans per agent call, LLM invocation, tool call.

Explanation

Instrument with OpenTelemetry (as in an agent platform): trace spans per agent call, LLM invocation, tool call. Metrics: end-to-end latency, per-step latency, token count (input/output), cost per run. CloudWatch for Step Functions execution time. Set token budget per agent, log overruns. Use batching and Bedrock prompt caching to reduce cost on repeated document patterns.

Follow-upCan you give a production example?

What is OpenTelemetry in an agent platform?medium

Type
conceptual
Topic
what-is-opentelemetry-in-an-agent-platform
Frequency
common
Tags
ai-agents, what, opentelemetry, and, how, are
Answer

OpenTelemetry (OTel) is an open-source observability framework for distributed tracing, metrics, and logs.

Explanation

OpenTelemetry (OTel) is an open-source observability framework for distributed tracing, metrics, and logs. In an agent platform: instrument each agent run as a trace with spans for LLM calls, tool executions, and memory operations. Capture attributes: model, token count, latency, tool name, success/failure. Export to a backend (Jaeger, Grafana Tempo) for full visibility into agent execution.

Follow-upCan you give a production example?

How do you build an eval framework for a multi-agent system?hard

Type
scenario
Topic
do-you-build-an-eval-framework-for-a-multi-agent-system
Frequency
common
Tags
ai-agents, how, you, build, eval, framework
Answer

Unit tests per agent with mocked tool responses and deterministic LLM outputs.

Explanation

Unit tests per agent with mocked tool responses and deterministic LLM outputs. Integration tests: full pipeline on golden test cases, compare final output to expected. Per-agent metrics: task completion rate, tool call accuracy, hallucination rate. System-level: end-to-end latency, cost per run, human escalation rate. Use OTel traces to replay failed runs for debugging. Gate deploys on regression test pass rates.

Follow-upCan you give a production example?

What is the ReAct pattern? How does it work in Strands?medium

Type
conceptual
Topic
is-the-react-pattern-how-does-it-work-in-strands
Frequency
common
Tags
ai-agents, what, the, react, pattern, how
Answer

ReAct (Reasoning + Acting) interleaves thought and action: model reasons (Thought), calls a tool (Action), observes the result (Observation), repeats until done.

Explanation

ReAct (Reasoning + Acting) interleaves thought and action: model reasons (Thought), calls a tool (Action), observes the result (Observation), repeats until done. In Strands: the agent loop manages this cycle — model generates a response, if it includes a tool call, Strands executes it and feeds the result back, until the model outputs a final response with no tool call.

Follow-upCan you give a production example?

How do you design multi-agent orchestration for document processing?hard

Type
scenario
Topic
how-do-you-design-multi-agent-orchestration-for-document-p
Frequency
common
Tags
ai-agents, how, did, you, design, the
Answer

Orchestrator agent delegates to: parsing agent (extract raw text from PDF), classification agent (identify document type/section), metadata enrichment agent (extract structured financial fields), and validation agent (sc

Explanation

Orchestrator agent delegates to: parsing agent (extract raw text from PDF), classification agent (identify document type/section), metadata enrichment agent (extract structured financial fields), and validation agent (schema compliance). Step Functions orchestrates the state machine — each agent is a Lambda function. EventBridge triggers pipeline on S3 uploads. Results written to DynamoDB.

Follow-upWhat tradeoffs did you consider in that implementation?

What is HITL in agentic workflows? How did you implement it?medium

Type
scenario
Topic
is-hitl-in-agentic-workflows-how-did-you-implement-it
Frequency
common
Tags
ai-agents, what, hitl, agentic, workflows, how
Answer

HITL pauses the agent at a decision point for human review before proceeding — used for high-stakes actions.

Explanation

HITL pauses the agent at a decision point for human review before proceeding — used for high-stakes actions. In an enterprise AI platform: Step Functions Wait for Callback pattern — agent sends a task token to a review queue (SQS/SNS), human approves via UI, UI calls SendTaskSuccess/Failure with token, agent resumes. In a resume screening system: low-confidence candidates route to HITL before final scoring.

Follow-upWhat tradeoffs did you consider in that implementation?

What agent memory types should a production agent support?medium

Type
conceptual
Topic
what-agent-memory-types-should-a-production-agent-support
Frequency
common
Tags
ai-agents, explain, agent, memory, types, which
Answer

In-context: current session scratch pad. Episodic: records of past interactions.

Explanation

In-context: current session scratch pad. Episodic: records of past interactions. Semantic: long-term factual knowledge. Procedural: learned skills/tools. an agent platform focuses on explicit memory management: in-context (current run state), episodic (stored in DB, retrieved on demand), and tool memory (registered tools with descriptions). Unlike LangChain's implicit memory — everything is explicit and inspectable.

Follow-upCan you give a production example?

Difference between a tool call and a subagent call?medium

Type
conceptual
Topic
between-a-tool-call-and-a-subagent-call
Frequency
common
Tags
ai-agents, difference, between, tool, call, and
Answer

Tool call: agent invokes a deterministic function (API, DB query, calculator) — takes inputs, returns outputs, no reasoning.

Explanation

Tool call: agent invokes a deterministic function (API, DB query, calculator) — takes inputs, returns outputs, no reasoning. Subagent call: agent delegates to another agent with its own LLM, system prompt, memory, and tools. Subagent can reason and make multi-step decisions. Use tools for simple deterministic actions; subagents for complex stateful subtasks that require reasoning.

Follow-upWhen would you choose one approach over the other?

How do you handle agent failures and retries in Step Functions?medium

Type
scenario
Topic
do-you-handle-agent-failures-and-retries-in-step-functions
Frequency
common
Tags
ai-agents, how, you, handle, agent, failures
Answer

Step Functions has built-in retry/catch: configure attempts, backoff rate, and interval per state.

Explanation

Step Functions has built-in retry/catch: configure attempts, backoff rate, and interval per state. Catch specific exceptions (LLM timeout, schema failure), route to error handler. Retry transient failures (API rate limits) with exponential backoff. For logical failures: route to HITL or fallback agent. Dead-letter queue for unrecoverable failures. All transitions logged in CloudWatch.

Follow-upCan you give a production example?

Difference between event-driven and ad-hoc agent execution?hard

Type
conceptual
Topic
difference-between-event-driven-and-ad-hoc-agent-execution
Frequency
common
Tags
ai-agents, difference, between, event, driven, and
Answer

Event-driven: triggered by an external event (S3 upload → EventBridge → Step Functions) — fully automated.

Explanation

Event-driven: triggered by an external event (S3 upload → EventBridge → Step Functions) — fully automated. Used in a fund document processing system (new filing → auto-process). Ad-hoc: triggered on demand by a user or API call (user submits a contract in a document extraction pipeline). Same agent logic, different trigger mechanisms routed through API Gateway or EventBridge rules.

Follow-upWhen would you choose one approach over the other?

How do you prevent infinite loops or runaway tool calls in an autonomous agent?medium

Type
scenario
Topic
do-you-prevent-infinite-loops-or-runaway-tool-calls-in-an
Frequency
common
Tags
ai-agents, how, you, prevent, infinite, loops
Answer

(1) Max iterations / max tool calls limit per run — hard stop.

Explanation

(1) Max iterations / max tool calls limit per run — hard stop. (2) Step budget: track tokens + calls remaining, instruct model to wrap up when low. (3) Loop detection: if the same tool is called with the same args twice, break. (4) Step Functions execution timeout at state machine level. (5) Tool call validator: reject calls not matching expected schema. an agent platform: RunConfig exposes max_steps and max_tokens as explicit constraints.

Follow-upCan you give a production example?

What is Pydantic-based schema enforcement in an agentic pipeline?medium

Type
conceptual
Topic
is-pydantic-based-schema-enforcement-in-an-agentic-pipelin
Frequency
common
Tags
ai-agents, what, pydantic, based, schema, enforcement
Answer

Define expected output structure as a Pydantic model. After each LLM call, parse the response — Pydantic validates types, required fields, and constraints automatically.

Explanation

Define expected output structure as a Pydantic model. After each LLM call, parse the response — Pydantic validates types, required fields, and constraints automatically. On ValidationError: catch it, format it clearly, send back to the model with a correction instruction (self-healing loop). In a document extraction pipeline: caught 100% of structural errors before they hit downstream systems, eliminating silent data corruption.

Follow-upCan you give a production example?

How does LangGraph's state graph differ from LangChain's sequential chain?hard

Type
conceptual
Topic
does-langgraph-s-state-graph-differ-from-langchain-s-seque
Frequency
common
Tags
ai-agents, how, does, langgraph, state, graph
Answer

LangChain chains are linear: A → B → C. Can't loop or branch.

Explanation

LangChain chains are linear: A → B → C. Can't loop or branch. LangGraph models workflows as a directed graph with explicit state: nodes are functions/agents, edges define transitions (conditional or fixed). Supports cycles (loop back to previous steps), branching (route based on state), and state persistence for long-running tasks. Used in an enterprise AI platform for complex multi-step workflows with HITL branching.

Follow-upWhen would you choose one approach over the other?

How do you manage shared state between agents in a multi-agent workflow?hard

Type
conceptual
Topic
do-you-manage-shared-state-between-agents-in-a-multi-agent
Frequency
common
Tags
ai-agents, how, you, manage, shared, state
Answer

(1) Pass state explicitly — orchestrator collects outputs and injects relevant parts into the next agent's prompt.

Explanation

(1) Pass state explicitly — orchestrator collects outputs and injects relevant parts into the next agent's prompt. (2) Shared store — agents read/write to a central state object (LangGraph StateGraph, DynamoDB, or in-memory dict). (3) Message bus — agents publish events, others subscribe. In a fund document processing system: Step Functions passes execution state between Lambda agents; DynamoDB stores intermediate results.

Follow-upCan you give a production example?

What is the role of EventBridge in an agentic platform?medium

Type
conceptual
Topic
what-is-the-role-of-eventbridge-in-an-agentic-platform
Frequency
common
Tags
ai-agents, what, the, role, eventbridge, your
Answer

EventBridge is a serverless event bus. S3 file uploads emit events → EventBridge rule matches on object type → triggers Step Functions or Lambda.

Explanation

EventBridge is a serverless event bus. S3 file uploads emit events → EventBridge rule matches on object type → triggers Step Functions or Lambda. Decouples producers (data sources) from consumers (agents). Supports event filtering, scheduling (nightly batch jobs), and cross-account routing. Adding a new agent doesn't require changing the data source — just add a new EventBridge rule.

Follow-upCan you give a production example?

How do you do audit logging for LLM agent decisions in a regulated environment?hard

Type
conceptual
Topic
do-you-do-audit-logging-for-llm-agent-decisions-in-a-regul
Frequency
common
Tags
ai-agents, how, you, audit, logging, for
Answer

Every LLM call logs: input (system prompt + messages), output, model ID, timestamp, token count, latency, run ID.

Explanation

Every LLM call logs: input (system prompt + messages), output, model ID, timestamp, token count, latency, run ID. Stored immutably in S3 with object lock. Structured as JSON for queryability via Athena. Agent-level: log each tool call (name, args, result) and reasoning steps. Correlation ID traces a request across all agents. Also log which human approved any HITL decision.

Follow-upCan you give a production example?

What is MCP and how can agents use it for tool integrations?hard

Type
conceptual
Topic
what-is-mcp-and-how-can-agents-use-it-for-tool-integration
Frequency
common
Tags
ai-agents, what, mcp, model, context, protocol
Answer

MCP is an open protocol by Anthropic standardizing how LLM applications connect to external tools and data sources.

Explanation

MCP is an open protocol by Anthropic standardizing how LLM applications connect to external tools and data sources. Defines a client-server model: the LLM client discovers and calls tools exposed by an MCP server with consistent schemas. an agent platform uses MCP for external messaging and tool integrations — agents connect to MCP servers (Telegram, Slack, APIs) without custom integration code per tool.

Follow-upCan you give a production example?

How do you configure and score candidate evaluations?medium

Type
scenario
Topic
how-do-you-configure-and-score-candidate-evaluations
Frequency
common
Tags
ai-agents, how, you, configure, and, score
Answer

Job templates define required skills, experience levels, and custom screening questions with configurable weights (e.g., Python: 30%, LLM experience: 40%, communication: 30%).

Explanation

Job templates define required skills, experience levels, and custom screening questions with configurable weights (e.g., Python: 30%, LLM experience: 40%, communication: 30%). The agentic pipeline extracts structured candidate data, scores each dimension using an LLM evaluator against the rubric, computes weighted total. Configurable thresholds route candidates to auto-pass, HITL review, or auto-reject. Integrates with Workday for status updates.

Follow-upCan you give a production example?

What is the difference between system prompt and runtime instruction?medium

Type
conceptual
Topic
is-the-difference-between-system-prompt-vs-runtime-instruc
Frequency
common
Tags
ai-agents, what, the, difference, between, system
Answer

System prompt: static, set at agent initialization — defines persona, capabilities, constraints, output format.

Explanation

System prompt: static, set at agent initialization — defines persona, capabilities, constraints, output format. Doesn't change per run. Runtime instruction: dynamic, passed per invocation — the specific task for this run. Separating them allows: (1) Reuse the same agent for multiple tasks. (2) Cache the system prompt token cost. (3) Cleaner API — callers only pass the task, not re-specify the agent's full context.

Follow-upWhen would you choose one approach over the other?

How do you handle token budget management across a long multi-agent conversation?hard

Type
scenario
Topic
do-you-handle-token-budget-management-across-a-long-multi
Frequency
common
Tags
ai-agents, how, you, handle, token, budget
Answer

Track token usage cumulatively. When approaching limit: (1) Summarize older turns, replace with summary (rolling context compression).

Explanation

Track token usage cumulatively. When approaching limit: (1) Summarize older turns, replace with summary (rolling context compression). (2) Evict least-relevant messages by importance scoring. (3) Move completed context to external memory (DynamoDB), fetch back when needed. (4) Use Bedrock prompt caching to avoid re-processing stable system prompt on every turn. Hard limits per agent via RunConfig.

Follow-upCan you give a production example?

What is the difference between a planner, executor, and critic agent?medium

Type
conceptual
Topic
is-the-difference-between-a-planner-executor-and-critic-ag
Frequency
common
Tags
ai-agents, what, the, difference, between, planner
Answer

Planner: decomposes a high-level goal into subtasks, creates an execution plan.

Explanation

Planner: decomposes a high-level goal into subtasks, creates an execution plan. Doesn't execute. Executor: carries out individual subtasks — calls tools, writes output. No high-level planning. Critic: reviews executor output against the goal — identifies errors or missing steps, feeds back to planner or executor for correction. This pattern is used in AutoGen and similar frameworks for higher-quality autonomous task completion.

Follow-upWhen would you choose one approach over the other?

How do you implement explicit memory management differently from LangChain?medium

Type
scenario
Topic
how-do-you-implement-explicit-memory-management-differentl
Frequency
common
Tags
ai-agents, nxagent, how, you, implement, explicit
Answer

LangChain's memory is implicit — ConversationBufferMemory automatically appends everything.

Explanation

LangChain's memory is implicit — ConversationBufferMemory automatically appends everything. an agent platform makes memory explicit: you define what gets stored (agent_result.memory_write), what gets retrieved (memory.fetch(query)), and when memory is cleared. Memory is typed (episodic vs semantic), stored in a pluggable backend (in-memory for tests, DynamoDB for production), and retrieved via semantic search.

Follow-upWhen would you choose one approach over the other?

How do you test a multi-agent pipeline end-to-end?hard

Type
conceptual
Topic
do-you-test-a-multi-agent-pipeline-end-to-end
Frequency
common
Tags
ai-agents, how, you, test, multi, agent
Answer

(1) Unit tests: each agent in isolation with mocked tool responses and deterministic LLM outputs.

Explanation

(1) Unit tests: each agent in isolation with mocked tool responses and deterministic LLM outputs. (2) Integration tests: full pipeline with real LLM calls on golden dataset. (3) Contract tests: verify each agent's input/output schema stays stable (Pydantic). (4) Load tests: N parallel executions, verify Step Functions handles concurrency. (5) Chaos tests: inject failures (tool timeout, LLM error), verify retry/fallback logic. Gate deploys on all passing.

Follow-upCan you give a production example?