InterviewCompany

OpenAI Interview Questions

LLM systems, evaluation, safety, RAG, and product-quality AI engineering questions.

4 questions
OpenAI

How would you evaluate a customer-support LLM assistant?hard

Type
scenario
Topic
how-would-you-evaluate-a-customer-support-llm-assistant
Frequency
common
Answer

Use task success, factuality, refusal quality, latency, cost, and human review.

Explanation

LLM evaluation should combine golden datasets, rubric-based judging, production feedback, safety checks, and regression tests for known failure modes.

Follow-upHow do you detect hallucinations?

How would you design a RAG system for internal docs?hard

Type
scenario
Topic
how-would-you-design-a-rag-system-for-internal-docs
Frequency
common
Answer

Ingest docs, chunk, embed, retrieve, rerank, generate with citations, and evaluate.

Explanation

Mention permissions, freshness, chunk strategy, hybrid search, reranking, context packing, answer grounding, and feedback loops.

Follow-upHow do you handle stale documents?

What is the difference between fine-tuning and prompting?medium

Type
scenario
Topic
what-is-the-difference-between-fine-tuning-and-prompting
Frequency
common
Answer

Prompting changes instructions at runtime; fine-tuning changes model behavior through training examples.

Explanation

Prompting is faster and flexible. Fine-tuning helps style, format consistency, and repeated task behavior, but needs data quality and evaluation.

Follow-upWhen would you avoid fine-tuning?

How do you reduce latency in an LLM product?medium

Type
scenario
Topic
how-do-you-reduce-latency-in-an-llm-product
Frequency
common
Answer

Optimize model choice, prompt length, retrieval, streaming, caching, and parallel work.

Explanation

Latency work includes measuring time to first token, token generation rate, retrieval overhead, network cost, and fallback paths.

Follow-upWhat tradeoff exists between quality and latency?
Back to Interview