InterviewRole

Generative AI Engineer Interview Questions

A path for LLMs, prompts, RAG, agents, evaluations, safety, and production AI systems.

Questions

LLMs, RAG, AI Agents, Evaluation, MLOps

What is an LLM? (medium)

Answer

A large language model (LLM) is a neural network trained on large text corpora to predict the next token, which lets it generate text.

Explanation

LLMs learn patterns from large corpora and can be adapted through prompting, retrieval, fine-tuning, and tool use.

Follow-up: Why can LLMs hallucinate?

How do you improve LLM output quality? (medium)

Answer

Improve instructions, context, examples, retrieval, evaluation, and constraints.

Explanation

Quality work is iterative: define failure cases, create test sets, measure outputs, and reduce ambiguity.

Follow-up: When does fine-tuning help?

What is context window management? (medium)

Answer

Deciding which information goes into the model's limited input (its context window), and in what form.

Explanation

Good systems prioritize relevant context, compress history, remove noise, and preserve source-grounded facts.
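
A minimal sketch of one such policy: keep the system instructions, then add the most recent turns until a token budget is reached. The `estimate_tokens` word count is an assumption standing in for a real tokenizer, and the `role`/`content` message shape is illustrative.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    # Assumption: a crude stand-in for a real tokenizer (counts words).
    return len(text.split())


def fit_history(messages: List[Dict[str, str]], budget: int) -> List[Dict[str, str]]:
    """Keep system messages, then the newest turns that fit in the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```

Dropping the oldest turns is the simplest option; summarizing them instead would preserve more history at the cost of an extra model call.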

Follow-up: How would you handle long conversations?

When should you use RAG? (medium)

Answer

Use it when answers need private, current, or source-grounded information.

Explanation

RAG separates knowledge retrieval from generation, but quality depends on chunking, retrieval, ranking, and citations.
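
The split between retrieval and generation can be sketched as two small functions. This is a toy illustration, assuming keyword overlap as the retriever (a real system would more likely use embeddings or BM25) and hypothetical names `retrieve` and `build_prompt`; the model call itself is left out.

```python
from typing import Dict, List, Tuple


def retrieve(query: str, docs: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Rank documents by how many query terms they share (toy retriever)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, docs[doc_id]) for _, doc_id in scored[:k]]


def build_prompt(query: str, passages: List[Tuple[str, str]]) -> str:
    """Lay out retrieved passages with IDs so the answer can cite them."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their IDs.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )
```

Keeping the two steps separate makes it possible to test retrieval without calling a model at all.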

Follow-up: When is fine-tuning a better fit?

How do you debug a bad RAG answer? (medium)

Answer

Check retrieval first, then ranking, prompt context, and generation behavior.

Explanation

If the right document is not retrieved, fix indexing or search. If it is retrieved but ignored, fix prompting or context layout.
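
Checking retrieval first is easier with a number attached. A minimal sketch using recall@k, assuming a small labeled set that maps each query to the document IDs that should be retrieved (the function names and data shapes are illustrative):

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int
) -> float:
    """Average recall@k over every labeled query."""
    scores = [recall_at_k(runs.get(q, []), labels[q], k) for q in labels]
    return sum(scores) / len(scores) if scores else 0.0
```

Low recall points at indexing or search; high recall with a bad answer points at prompting or context layout.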

Follow-up: How would you measure retrieval quality?

What makes a good chunk? (medium)

Answer

It is self-contained, focused, and sized for retrieval and context limits.

Explanation

Chunking should preserve meaning, headings, metadata, and enough surrounding context to answer accurately.
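
As a rough illustration of sizing and surrounding context, here is a fixed-size word chunker with overlap. It is a simplification: it ignores headings and metadata, which a production chunker would also carry along, and `chunk_words` is a hypothetical name.

```python
from typing import List


def chunk_words(text: str, size: int, overlap: int) -> List[str]:
    """Split text into chunks of `size` words; neighbors share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With `size=4` and `overlap=1`, the text `"a b c d e f g"` yields `["a b c d", "d e f g"]`: the shared word means a sentence that straddles a boundary is less likely to be cut off from its context.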

Follow-up: How can overlap help?

When should you use an agent? (medium)

Answer

Use an agent when the task needs planning, tools, or multi-step decisions.

Explanation

Agents add power but also cost, latency, and reliability risk. Simple workflows should stay deterministic.

Follow-up: When is a fixed workflow better?

How do you make agents reliable? (medium)

Answer

Constrain tools, validate outputs, add state checks, and evaluate task completion.

Explanation

Reliability improves with small action spaces, clear tool contracts, retries, human escalation, and logs.
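
A small action space and a clear tool contract can be sketched as an allowlist plus argument validation. Everything here is hypothetical (the `get_weather` tool, the `TOOLS` registry, the retry count); it shows the shape of the checks, not any particular agent framework.

```python
from typing import Any, Callable, Dict, Tuple


def get_weather(city: str) -> str:
    # Hypothetical tool used only for illustration.
    return f"weather for {city}"


# Allowlist: tool name -> (function, expected argument types).
TOOLS: Dict[str, Tuple[Callable[..., Any], Dict[str, type]]] = {
    "get_weather": (get_weather, {"city": str}),
}


def call_tool(name: str, args: Dict[str, Any], max_retries: int = 2) -> Any:
    """Run an allowlisted tool after validating the model-proposed arguments."""
    if name not in TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    func, schema = TOOLS[name]
    if set(args) != set(schema):
        raise ValueError(f"expected arguments {sorted(schema)}")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return func(**args)
        except RuntimeError as exc:  # treat as transient and retry
            last_error = exc
    # Retries exhausted: surface the failure so a human can take over.
    raise RuntimeError(f"{name} failed; escalate to a human") from last_error
```

Rejecting unknown tools and malformed arguments before execution is also one answer to preventing tool misuse: the model can propose anything, but only validated calls run.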

Follow-up: How do you prevent tool misuse?

What should agent memory store? (medium)

Answer

Only useful, consented, and durable context.

Explanation

Memory needs privacy boundaries, update rules, deletion paths, and safeguards against stale or sensitive information.
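
These rules can be sketched as a tiny in-memory store with a consent gate, an expiry time, and an explicit deletion path. The class and method names are assumptions for illustration; a real system would also need persistence and access control.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryItem:
    value: str
    expires_at: float  # timestamp after which the item counts as stale


class MemoryStore:
    def __init__(self) -> None:
        self._items: Dict[str, MemoryItem] = {}

    def remember(self, key: str, value: str, now: float,
                 ttl_seconds: float, consented: bool) -> bool:
        """Store a value only with consent; writing again updates it."""
        if not consented:
            return False
        self._items[key] = MemoryItem(value, now + ttl_seconds)
        return True

    def recall(self, key: str, now: float) -> Optional[str]:
        """Return the value unless it is missing or has expired."""
        item = self._items.get(key)
        if item is None:
            return None
        if now >= item.expires_at:
            del self._items[key]  # drop stale entries on read
            return None
        return item.value

    def forget(self, key: str) -> None:
        """Deletion path: remove a key whether or not it exists."""
        self._items.pop(key, None)
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test.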

Follow-up: What should never be stored?

How do you evaluate an AI product? (medium)

Answer

Create task rubrics, golden sets, user metrics, and safety checks.

Explanation

Good evaluation combines automated scoring, human judgment, regression tests, and production telemetry.

Follow-up: How do you handle subjective quality?

What is an eval dataset? (medium)

Answer

A representative set of inputs and expected behaviors.

Explanation

It should include common cases, edge cases, adversarial cases, and historical failures so quality can be tracked.
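
One possible shape for such a dataset, sketched with hypothetical names: each case carries a category tag and a check function that encodes the expected behavior, and a small runner reports which cases fail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    case_id: str
    category: str  # e.g. "common", "edge", "adversarial", "regression"
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, object]:
    """Run every case through the model and collect the failing case IDs."""
    failed = [c.case_id for c in cases if not c.check(model(c.prompt))]
    return {
        "total": len(cases),
        "passed": len(cases) - len(failed),
        "failed_ids": failed,
    }
```

Here `model` is any callable from prompt to output, so the same runner works against a stub in unit tests and a real model in CI.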

Follow-up: How often should evals run?

How do you compare two prompts? (medium)

Answer

Run both on the same eval set and review quality, cost, and latency.

Explanation

Prompt changes can improve one slice and hurt another, so compare across categories and failure modes.
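
A per-category comparison might look like the sketch below. It assumes each eval result has already been reduced to a `(category, passed)` pair; the names are illustrative, and cost and latency would be compared the same way.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Result = Tuple[str, bool]  # (category, passed)


def pass_rate_by_category(results: List[Result]) -> Dict[str, float]:
    """Pass rate for each category in a list of eval results."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    return {c: passes[c] / totals[c] for c in totals}


def compare_prompts(results_a: List[Result], results_b: List[Result]) -> Dict[str, float]:
    """Per-category change in pass rate going from prompt A to prompt B."""
    rate_a = pass_rate_by_category(results_a)
    rate_b = pass_rate_by_category(results_b)
    categories = sorted(set(rate_a) | set(rate_b))
    return {c: rate_b.get(c, 0.0) - rate_a.get(c, 0.0) for c in categories}
```

A positive delta in one category next to a negative delta in another is exactly the kind of regression an overall average would hide.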

Follow-up: What if humans disagree?

What should an ML pipeline version? (medium)

Answer

Data, code, features, model artifacts, parameters, and evaluation results.

Explanation

Versioning lets teams reproduce a model, compare experiments, roll back safely, and audit production behavior.
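
One lightweight way to tie these pieces together is a manifest keyed by content hashes. This is a sketch under assumptions: the field names and the 12-character `run_id` are illustrative, and dedicated tools for data and experiment versioning do far more.

```python
import hashlib
import json
from typing import Dict


def fingerprint(payload: bytes) -> str:
    """Content hash: the same bytes always give the same version ID."""
    return hashlib.sha256(payload).hexdigest()


def build_manifest(data: bytes, code_version: str,
                   params: Dict[str, float], metrics: Dict[str, float]) -> Dict[str, object]:
    """Record everything needed to reproduce and audit one training run."""
    manifest: Dict[str, object] = {
        "data_sha256": fingerprint(data),
        "code_version": code_version,  # e.g. a commit hash
        "params": params,
        "metrics": metrics,
    }
    # Derive the run ID from the canonical JSON form of the manifest.
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    manifest["run_id"] = fingerprint(canonical)[:12]
    return manifest
```

Because the run ID is derived from the data hash, code version, parameters, and metrics, changing any of them produces a different ID.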

Follow-up: How would you version training data?

How do you monitor a model? (medium)

Answer

Track service health, data quality, drift, prediction quality, and business impact.

Explanation

Production ML needs both software metrics and model-specific signals, especially when labels arrive late.
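
Input drift is one model-specific signal that needs no labels. The sketch below uses the Population Stability Index (PSI), one common drift score among several, to compare a feature's training distribution with recent production values. The bin edges and the `eps` floor are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Share of values in each bin; `edges` are ascending bin boundaries."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)  # number of edges at or below v
        counts[index] += 1
    return [c / len(values) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a recent sample."""
    p = [max(x, eps) for x in histogram(expected, edges)]  # avoid log(0)
    q = [max(x, eps) for x in histogram(actual, edges)]
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical distributions score 0, and the score grows as production inputs move away from the training baseline; the alert threshold is a judgment call that depends on the feature.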

Follow-up: What do you monitor before labels arrive?

What is training-serving skew? (medium)

Answer

A mismatch between training features and production features.

Explanation

Skew often comes from duplicated feature logic, different timestamps, missing values, or online/offline transformation drift.
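
A simple skew check is to log the features the online service computed for a request, recompute them with the offline pipeline, and diff the two. A minimal sketch with hypothetical feature names and an assumed numeric tolerance:

```python
from typing import Dict, List


def find_skew(offline: Dict[str, float], online: Dict[str, float],
              tolerance: float = 1e-6) -> List[str]:
    """Names of features that are missing on one side or differ in value."""
    mismatched: List[str] = []
    for name in sorted(set(offline) | set(online)):
        if name not in offline or name not in online:
            mismatched.append(name)  # feature exists in only one pipeline
        elif abs(offline[name] - online[name]) > tolerance:
            mismatched.append(name)  # same feature, different value
    return mismatched
```

For example, comparing `{"age": 30.0, "spend_7d": 12.5}` offline against `{"age": 30.0, "spend_7d": 10.0, "clicks": 3.0}` online flags `clicks` and `spend_7d`, which would prompt a look at window boundaries or duplicated feature logic.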

Follow-up: How does a feature store help?
