InterviewSkill

System Design AI Interview Questions

Architecture tradeoffs for production AI, ML, LLM, RAG, and agent systems.

8 questions
System Design AI

How do you design an AI question-answering system? (medium)

Type: scenario
Topic: design-ai-question-answering-system
Frequency: common
Tags: design, ai, question, answering, system
Answer

Combine ingestion, retrieval, generation, evaluation, and monitoring.

Explanation

A typical design has an ingestion path (document parsing, chunking, embedding, and indexing in a vector store) and a query path (retrieval, reranking, prompt assembly, and LLM generation with citations), plus caching and feedback loops.
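
To make these components concrete, here is a minimal in-memory sketch of the ingestion and query paths. It assumes a toy bag-of-words "embedding" in place of a real embedding model and a plain list in place of a vector store. All names (`chunk_document`, `InMemoryIndex`, `build_prompt`) are hypothetical. Reranking, caching, and the actual LLM call are left out, and the output is a prompt that asks the model to cite numbered sources.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    text: str


def chunk_document(doc_id: str, text: str, max_words: int = 50) -> list[Chunk]:
    """Ingestion: split a document into fixed-size word windows."""
    words = text.split()
    return [Chunk(doc_id, " ".join(words[i:i + max_words]))
            for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for a vector store: keeps (chunk, vector) pairs, ranks by cosine similarity."""

    def __init__(self) -> None:
        self.items: list[tuple[Chunk, Counter]] = []

    def add(self, chunks: list[Chunk]) -> None:
        self.items.extend((c, embed(c.text)) for c in chunks)

    def search(self, query: str, k: int = 3) -> list[Chunk]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Prompt assembly: numbered sources let the model cite [1], [2], ..."""
    sources = "\n".join(f"[{i}] ({c.doc_id}) {c.text}" for i, c in enumerate(chunks, 1))
    return ("Answer using only the sources below and cite them by number.\n"
            f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:")


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add(chunk_document("refunds.md", "Refunds are issued within 14 days of a return."))
    index.add(chunk_document("shipping.md", "Standard shipping takes 3 to 5 business days."))
    top = index.search("How long do refunds take?", k=1)
    print(build_prompt("How long do refunds take?", top))  # the LLM call would go here
```

Each stage is a separate function so it can be timed, cached, or swapped independently. That separation is also where per-stage observability hooks would naturally attach.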

Follow-up: Where would you add observability?

How do you reduce AI system latency? (medium)

Type: conceptual
Topic: reduce-ai-system-latency
Frequency: common
Tags: reduce, ai, system, latency
Answer

Optimize retrieval, prompt size, model choice, caching, and streaming.

Explanation

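Latency comes from network calls, retrieval, reranking, model generation, and tool use. Smaller models, shorter prompts, and cached context reduce total time. Streaming reduces perceived latency by showing the first tokens sooner, even when total generation time is unchanged.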
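
One inexpensive way to reduce latency is to send fewer tokens. The sketch below packs ranked chunks into a fixed context budget. It assumes a rough ratio of about 4 tokens per 3 words instead of the model's real tokenizer, and `pack_context` and `approx_tokens` are hypothetical names.

```python
import math


def approx_tokens(text: str) -> int:
    """Rough estimate of ~4 tokens per 3 words; an assumed heuristic, not a real tokenizer."""
    return math.ceil(len(text.split()) * 4 / 3)


def pack_context(ranked_chunks: list[str], budget_tokens: int) -> list[str]:
    """Keep the highest-ranked chunks that fit in the budget, preserving rank order."""
    selected: list[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget_tokens:
            continue  # skip this chunk; a smaller, lower-ranked one may still fit
        selected.append(chunk)
        used += cost
    return selected


if __name__ == "__main__":
    ranked = ["a b c d", "e f g h i j k l m n", "o p"]
    print(pack_context(ranked, budget_tokens=10))  # ['a b c d', 'o p']
```

A tighter budget cuts both prefill time and cost, but it may drop relevant evidence. The budget is therefore worth tuning against an evaluation set rather than fixing arbitrarily.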

Follow-up: What tradeoff does a smaller model introduce?

When do you need a vector database? (hard)

Type: conceptual
Topic: need-vector-database
Frequency: common
Tags: need, vector, database
Answer

Use one when semantic search over many embeddings must be fast and scalable.

Explanation

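Vector databases support approximate nearest neighbor (ANN) search, metadata filtering, indexing, and operational management for retrieval systems. For small corpora, exact in-memory search is often fast enough. A dedicated vector database pays off mainly at scale or when you need filtering and index management.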
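
A vector database replaces the exact search below with an approximate index. This brute-force baseline shows what is being accelerated: every query scores every stored vector, which costs O(N * d) and stops scaling as N grows. Names and data are hypothetical, vectors are plain Python lists, and the `where` callback illustrates metadata pre-filtering.

```python
import heapq
import math
from typing import Callable

Vector = list[float]
Record = tuple[str, Vector, dict]  # (id, embedding, metadata)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query: Vector, records: list[Record], k: int,
                 where: Callable[[dict], bool] = lambda meta: True) -> list[tuple[str, float]]:
    """Exact k-NN: scores every record that passes the metadata filter (pre-filtering)."""
    candidates = ((rid, cosine(query, vec)) for rid, vec, meta in records if where(meta))
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    records = [
        ("a", [1.0, 0.0], {"lang": "en"}),
        ("b", [0.9, 0.1], {"lang": "de"}),
        ("c", [0.0, 1.0], {"lang": "en"}),
    ]
    print(exact_search([1.0, 0.0], records, k=2))  # a first, then b
    print(exact_search([1.0, 0.0], records, k=2, where=lambda m: m["lang"] == "en"))  # a, then c
```

Filtering before scoring guarantees up to k matching results. An ANN index that retrieves k neighbors first and filters afterward can return fewer than k results when the filter is selective.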

Follow-up: How do metadata filters affect retrieval?

How do you control AI system cost? (medium)

Type: conceptual
Topic: control-ai-system-cost
Frequency: common
Tags: control, ai, system, cost
Answer

Track token usage, and control it through model choice, caching, batching, and routing.

Explanation

Cost-aware systems use cheaper models for simple tasks, reserve stronger models for hard cases, cache repeated outputs, and limit unnecessary context.
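
Model routing can be sketched as a small policy function. Everything here is an assumption for illustration: the tier names, the per-1k-token prices, and the keyword heuristic. A learned classifier is a common alternative to keyword rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_input: float   # hypothetical prices, not real vendor pricing
    usd_per_1k_output: float


CHEAP = ModelTier("small-model", 0.0005, 0.0015)   # hypothetical tier names
STRONG = ModelTier("large-model", 0.01, 0.03)

# Assumed keyword heuristic for "hard" requests.
HARD_HINTS = {"why", "compare", "analyze", "prove", "design", "debug"}


def route(query: str, max_simple_words: int = 30) -> ModelTier:
    """Send short, simple queries to the cheap tier; escalate long or reasoning-heavy ones."""
    words = re.findall(r"[a-z]+", query.lower())
    if len(words) > max_simple_words or HARD_HINTS.intersection(words):
        return STRONG
    return CHEAP


def estimate_cost(tier: ModelTier, input_tokens: int, output_tokens: int) -> float:
    """Per-request cost from token counts, the figure a cost dashboard would track."""
    return (input_tokens / 1000) * tier.usd_per_1k_input + (output_tokens / 1000) * tier.usd_per_1k_output
```

Logging the chosen tier and estimated cost per request shows whether the router actually shifts traffic to the cheap model without hurting quality on the evaluation set.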

Follow-up: What is model routing?

How do you evaluate a production AI system? (medium)

Type: conceptual
Topic: evaluate-production-ai-system
Frequency: common
Tags: evaluate, production, ai, system
Answer

Use offline tests, online monitoring, user feedback, and safety checks.

Explanation

Evaluation should measure answer quality, grounding, retrieval quality, latency, cost, safety, and task completion across realistic examples.
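
A golden-dataset harness can compute several of these metrics in one pass. This sketch assumes each golden example lists its relevant document IDs and one key phrase the answer should contain. The dataclasses, metric names, and grounding rule (every cited source must have been retrieved) are hypothetical choices, not a standard.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class GoldenExample:
    question: str
    relevant_doc_ids: set[str]   # docs a correct answer should draw on
    expected_phrase: str         # a key fact the answer should contain


@dataclass
class RunResult:
    retrieved_doc_ids: list[str]
    answer: str
    cited_doc_ids: set[str]
    latency_ms: float


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant docs that appear in the top-k retrieved results."""
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(examples: list[GoldenExample], results: list[RunResult], k: int = 5) -> dict[str, float]:
    """Aggregate retrieval, answer, grounding, and latency metrics over a golden set."""
    if len(examples) != len(results):
        raise ValueError("each golden example needs exactly one run result")
    pairs = list(zip(examples, results))
    return {
        "recall_at_k": mean(recall_at_k(r.retrieved_doc_ids, e.relevant_doc_ids, k) for e, r in pairs),
        # Crude string check; real setups often add human review or a judge rubric.
        "answer_match": mean(float(e.expected_phrase.lower() in r.answer.lower()) for e, r in pairs),
        # Grounded = cites at least one source, and every cited source was retrieved.
        "grounded": mean(float(bool(r.cited_doc_ids) and r.cited_doc_ids <= set(r.retrieved_doc_ids))
                         for _, r in pairs),
        "p50_latency_ms": median(r.latency_ms for _, r in pairs),
    }
```

Running the same harness before and after every change turns the golden set into a regression suite. Cost and safety checks can be added as further entries in the returned dictionary.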

Follow-up: What should be in a golden dataset?

How would you design a multi-tenant RAG system? (hard)

Type: scenario
Topic: multi-tenant-rag
Frequency: common
Tags: rag, multi-tenant, security
Answer

Isolate tenant data in ingestion, storage, retrieval filters, permissions, logging, and evaluation.

Explanation

The main risks are cross-tenant data leakage and permission mistakes. Use tenant-scoped indexes or strict metadata filters, access checks, audit logs, and regression tests.
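
One way to make tenant isolation hard to get wrong is to enforce it inside the storage layer, so callers cannot forget the filter. In this sketch, all names are hypothetical and keyword overlap stands in for vector similarity. Each tenant's documents live in a separate partition, and `leak_check` is the kind of regression test the follow-up asks about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    tenant_id: str
    text: str


class TenantScopedStore:
    """Every read requires a tenant_id, and the scoping happens inside the store."""

    def __init__(self) -> None:
        self._partitions: dict[str, list[Doc]] = {}  # one partition per tenant

    def add(self, doc: Doc) -> None:
        self._partitions.setdefault(doc.tenant_id, []).append(doc)

    def search(self, tenant_id: str, query: str, k: int = 3) -> list[Doc]:
        if not tenant_id:
            raise PermissionError("tenant_id is required for every search")
        terms = set(query.lower().split())
        candidates = self._partitions.get(tenant_id, [])  # never touches other partitions
        ranked = sorted(candidates, key=lambda d: len(terms & set(d.text.lower().split())),
                        reverse=True)
        return ranked[:k]


def leak_check(store: TenantScopedStore, tenants: list[str], probes: list[str]) -> list[tuple[str, str]]:
    """Regression test: (requesting tenant, leaked doc_id) pairs. Expected to be empty."""
    leaks = []
    for tenant in tenants:
        for probe in probes:
            for doc in store.search(tenant, probe, k=10):
                if doc.tenant_id != tenant:
                    leaks.append((tenant, doc.doc_id))
    return leaks
```

Against this toy store, the check passes by construction. In practice, you would point it at the real retrieval endpoint and seed each tenant with near-duplicate documents, so that a missing or misapplied filter would surface immediately.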

Follow-up: How would you test for cross-tenant retrieval leaks?

How do you handle rate limits in an AI system? (medium)

Type: scenario
Topic: ai-rate-limits
Frequency: common
Tags: rate-limits, reliability, fallbacks
Answer

Use queues, backoff, caching, request shaping, fallback models, and clear user-facing degradation.

Explanation

AI systems often depend on external model APIs with quota and latency limits. Good design avoids cascading failures and keeps critical paths responsive.
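
Retries with exponential backoff and jitter, plus a fallback path, can be sketched as follows. `RateLimitError` is a hypothetical stand-in for whatever your model SDK raises on HTTP 429, and the fallback callable might be a smaller model or a cached answer. The `sleep` parameter is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's rate-limit (HTTP 429) exception."""


def call_with_backoff(call: Callable[[], str], max_retries: int = 4, base_delay: float = 0.5,
                      max_delay: float = 8.0, sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry on rate limits using exponential backoff with full jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retry budget exhausted; let the caller degrade gracefully
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter keeps many clients from retrying in sync
    raise RuntimeError("unreachable")


def answer_with_fallback(primary: Callable[[], str], fallback: Callable[[], str],
                         **retry_options) -> tuple[str, str]:
    """Try the primary model; if it stays rate-limited, use the fallback and report which served."""
    try:
        return "primary", call_with_backoff(primary, **retry_options)
    except RateLimitError:
        return "fallback", call_with_backoff(fallback, **retry_options)
```

Returning which path served the request lets the UI signal degraded mode clearly. The retry budget should stay small on interactive paths so that a slow quota does not cascade into user-facing timeouts.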

Follow-up: What requests are good candidates for caching?

Why do production AI systems need prompt versioning? (medium)

Type: scenario
Topic: prompt-versioning
Frequency: common
Tags: prompting, versioning, evals
Answer

Prompt changes can alter behavior, so versions make releases traceable, testable, and reversible.

Explanation

Store prompt templates, model settings, retrieval settings, and evaluation results together. Rollouts should compare quality, safety, latency, and cost.
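
A prompt version can be represented as an immutable record whose ID is a hash of every setting that changes behavior. This is a minimal sketch with hypothetical field names. Evaluation results are stored alongside the version but excluded from the hash, so attaching new eval scores does not create a new version.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    template: str
    model: str                 # hypothetical model identifier
    temperature: float
    retrieval_top_k: int
    eval_results: dict = field(default_factory=dict, compare=False)  # not part of the identity

    @property
    def version_id(self) -> str:
        """Short content hash of every behavior-affecting setting."""
        payload = json.dumps({"template": self.template, "model": self.model,
                              "temperature": self.temperature,
                              "retrieval_top_k": self.retrieval_top_k}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

    def render(self, **variables: str) -> str:
        return self.template.format(**variables)


def log_entry(pv: PromptVersion, request_id: str, latency_ms: float, cost_usd: float) -> str:
    """One structured log line per request, so outputs can be traced to a prompt version."""
    return json.dumps({"request_id": request_id, "prompt_version": pv.version_id,
                       "model": pv.model, "latency_ms": latency_ms, "cost_usd": cost_usd})
```

Because the ID is derived from content, any edit to the template or settings automatically yields a new version. Rolling back means redeploying a previous record.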

Follow-up: What would you log for each prompt version?