Python

The everyday language for data work, backend APIs, automation, and AI — built on a small set of ideas worth understanding deeply.

mutability · scope & closures · decorators · generators · collections · OOP · __slots__ · async/await · context managers · type hints
Topic overview
Values and references, scope and closures, decorators, generators, the right collection type, Python's object protocol, memory optimisation with __slots__, the async event loop model, context managers for resource safety, modern type annotations, and graceful error handling.
Core concepts
Mutable vs immutable, LEGB, closures, functools.wraps, comprehensions, yield, Counter/defaultdict/deque, dunder methods, __slots__, GIL, coroutines and tasks, async context managers, __enter__/__exit__, Optional/Protocol, and try/except/finally.
Why it matters
Strong Python fundamentals let you move quickly without introducing subtle bugs — reference aliasing, mutable defaults, late-binding closures, blocking event loops, and GIL misconceptions trip up engineers who learned Python by copying examples rather than understanding the model.
Interview relevance
Interviews test whether you can reason through tricky reference and scope questions, write clean transformations, know the right collection type, explain the async event loop, and design Pythonic resource-safe APIs. The "why" behind each feature answers follow-up questions naturally.

Types & mutability

The most common source of subtle bugs in Python code

Why — motivation

Most Python bugs in production come from unexpected mutation — a function silently modifies a list that was passed in, a mutable default argument accumulates state across calls, or two variables thought to be independent are actually pointing at the same object. Understanding how Python handles values and references prevents an entire category of bugs.

Interviewers use mutability questions as a proxy for whether you've truly internalised the language rather than just used it — they're quick to ask and reveal a lot.

Intuition — the mental model

Python variables are labels, not boxes. When you write x = [1, 2, 3], you're sticking a label x on a list object. If you then write y = x, you've put a second label on the same object. Changing the object through either label affects both.

Immutable objects (int, str, tuple) can't change in place — Python creates a new object instead. Mutable objects (list, dict, set) change in place, so all labels pointing at them see the change.
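The label model can be checked directly with `is`. The sketch below is a minimal illustration; the variable names are arbitrary.

```python
# Two labels on one mutable object
x = [1, 2, 3]
y = x                 # no copy is made; y is a second label on the same list
y.append(4)           # mutation through y is visible through x
same_object = x is y  # True

# Immutable objects cannot change in place
n = 10
m = n
m += 1                # builds a new int and rebinds m; n is untouched
```

After this runs, `x` is `[1, 2, 3, 4]`, while `n` is still `10` and `m` is `11`.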

Explanation
Mutable vs immutable types
Immutable
int, float, bool, str, tuple, frozenset, bytes. Cannot be changed after creation. Hashable, so usable as dict keys (a tuple only if all of its elements are hashable). An operation such as x += 1 builds a new object and rebinds the name; the caller's reference is unaffected.
Mutable
list, dict, set, bytearray. Modified in place. Unhashable — cannot be dict keys. Passing a mutable to a function lets the function silently mutate the caller's data.
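A short sketch of both consequences: silent mutation of the caller's data, and unhashability. The function names `add_total_in_place` and `add_total_copy` are hypothetical, chosen only for this illustration.

```python
def add_total_in_place(row):
    row.append(sum(row))      # mutates the list the caller passed in
    return row

def add_total_copy(row):
    return [*row, sum(row)]   # builds a new list; the caller's list is untouched

data = [1, 2, 3]
add_total_in_place(data)      # data itself has grown

fresh = [1, 2, 3]
result = add_total_copy(fresh)

# Mutable containers are unhashable, so they cannot be dict keys
try:
    {[1, 2]: "value"}
except TypeError as exc:
    key_error = str(exc)

lookup = {(1, 2): "value"}    # a tuple of hashable items works
```

Returning a new object instead of mutating the argument is the usual way to keep a function free of surprises for its callers.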
The mutable default argument trap
# BUG: the default list is created ONCE, at function definition
def append_to(val, target=[]):
    target.append(val)
    return target

append_to(1)  # [1]
append_to(2)  # [1, 2]  ← NOT what you expected

# FIX: use None as a sentinel
def append_to(val, target=None):
    if target is None:
        target = []
    target.append(val)
    return target

Default arguments are evaluated once when the function is defined. A mutable default persists across every call. Use None as a sentinel and create the mutable inside the function body.

is vs == and shallow vs deep copy
import copy

a = [1, 2, 3]
b = a                   # same object: (b is a) is True
c = a.copy()            # shallow copy: new list, same inner objects
d = copy.deepcopy(a)    # deep copy: fully independent

# Shallow copy trap with nested structures:
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
shallow[0].append(99)
print(nested)  # [[1, 2, 99], [3, 4]]  ← original mutated!

is checks identity (same object in memory). == checks value equality. For nested structures, a shallow copy only copies the outer container — inner objects are still shared references.

Interview Q & A
Q: What's the output of this code, and why?

def f(x, lst=[]):
    lst.append(x)
    return lst

print(f(1))
print(f(2))
print(f(3))
A: Output is [1], then [1, 2], then [1, 2, 3]. The default list is created once at function definition time and persists across calls. Fix: use lst=None and initialise inside the function. This reveals whether a candidate understands that default values are part of the function object, not re-evaluated per call.

Functions & scope

How Python resolves names and how functions capture their environment

Why — motivation

Functions are first-class citizens in Python — they can be passed as arguments, returned from other functions, and stored in variables. This makes closures and higher-order functions natural, but also introduces scope bugs that are hard to debug if you don't know the rules.

The late-binding closure trap is a classic interview question precisely because it looks wrong to anyone who doesn't understand how Python looks up free variables at call time rather than definition time.

Intuition — the mental model

When Python sees a variable name, it searches four scopes in order: Local → Enclosing → Global → Built-in (LEGB). It finds the first match and stops. This lookup happens at runtime, not when the function is defined.

A closure remembers the enclosing scope it was created in — but it remembers the variable (the label), not the value at creation time. If the enclosing variable later changes, the closure sees the new value.
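A minimal sketch of "remembers the variable, not the value". The names `make_reporter` and `report` are illustrative only.

```python
def make_reporter():
    status = "starting"
    def report():
        return status        # status is looked up when report() runs
    status = "ready"         # rebinding AFTER report was defined
    return report

report = make_reporter()
current = report()           # sees the latest binding, not "starting"

# The captured variable lives in a cell attached to the function object
cell_value = report.__closure__[0].cell_contents
```

Here `current` is `"ready"`, because the closure holds a reference to the variable's cell rather than a snapshot of its value.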

Explanation
LEGB scope rule
x = 'global'

def outer():
    x = 'enclosing'
    def inner():
        x = 'local'
        print(x)   # local: found in L
    inner()
    print(x)       # enclosing: found in E

outer()
print(x)           # global: found in G

# nonlocal rebinds a variable in the enclosing scope
def counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment
*args and **kwargs
def log(level, *args, **kwargs):
    # args   → tuple of extra positional arguments
    # kwargs → dict of extra keyword arguments
    print(f"[{level}]", *args, **kwargs)

log("INFO", "starting", "pipeline", sep=" | ")

# Unpacking at the call site
def process(x, y, z=0):      # placeholder function for illustration
    return x + y + z

vals = [1, 2, 3]
opts = {"z": 3}
process(*vals[:2], **opts)   # same as process(1, 2, z=3)

*args collects extra positional arguments as a tuple; **kwargs collects keyword arguments as a dict. Use *iterable and **mapping at the call site to unpack them back out.

The late-binding closure trap
# BUG: all closures reference the same variable i
fns = [lambda: i for i in range(3)]
print([f() for f in fns])  # [2, 2, 2], NOT [0, 1, 2]
# At call time, i == 2. All lambdas share the same binding.

# FIX: capture the current value as a default argument
fns = [lambda i=i: i for i in range(3)]
print([f() for f in fns])  # [0, 1, 2]

Closures capture the variable, not its value. The default-argument trick captures the current value at definition time because default arguments are evaluated immediately.

Interview Q & A
Q: What does this print, and how would you fix it?

fns = []
for i in range(3):
    fns.append(lambda: i)
print([f() for f in fns])
A: Prints [2, 2, 2]. Each lambda closes over the variable i, not its value. When called, Python looks up i in the scope where the lambda was defined (here, module level); by then the loop is done and i equals 2. Fix: lambda i=i: i captures the current value at lambda creation time via the default argument mechanism.

Decorators

Functions that wrap functions — the pattern behind routes, caching, and retries

Why — motivation

Decorators appear everywhere in real Python: Flask/FastAPI routes (@app.get), pytest fixtures (@pytest.fixture), caching (@lru_cache), retry logic, access control, and timing. You need to know how to write one, not just recognise the @ symbol.

Understanding decorators also unlocks a mental model: functions are objects, and returning a function from a function is completely normal. This is the same thinking needed for closures and higher-order design patterns.

Intuition — the mental model

A decorator is a function that takes a function and returns a (usually enhanced) function. The @decorator syntax is pure sugar — @log_calls above def process(...) is identical to writing process = log_calls(process) immediately after the function definition.

The inner wrapper function is what gets called instead of the original. It can run code before and after the original, modify arguments or return values, or skip the original entirely.

Explanation
Writing a decorator
import functools

def log_calls(func):
    @functools.wraps(func)   # preserves __name__, __doc__, signature
    def wrapper(*args, **kwargs):
        print(f"→ calling {func.__name__}")
        result = func(*args, **kwargs)
        print(f"← done {func.__name__}")
        return result
    return wrapper

@log_calls
def process(data):
    return data.upper()

# Equivalent to: process = log_calls(process)
# process.__name__ → "process" (not "wrapper", thanks to @wraps)

Always use @functools.wraps(func) on the wrapper. Without it, the decorated function loses its __name__, __doc__, and signature — breaking introspection and debugging tools.
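The effect is easy to see by comparing two otherwise identical decorators. This is a small sketch; `plain`, `wrapped`, `load_a`, and `load_b` are made-up names.

```python
import functools

def plain(func):
    def wrapper(*args, **kwargs):      # no functools.wraps
        return func(*args, **kwargs)
    return wrapper

def wrapped(func):
    @functools.wraps(func)             # copies metadata onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def load_a():
    """Load dataset A."""

@wrapped
def load_b():
    """Load dataset B."""

names = (load_a.__name__, load_b.__name__)
docs = (load_a.__doc__, load_b.__doc__)
```

`names` comes out as `("wrapper", "load_b")` and `docs` as `(None, "Load dataset B.")`. `functools.wraps` also sets `__wrapped__`, which is how `inspect.signature` recovers the original signature.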

Decorators with arguments
import functools

def retry(times=3, exceptions=(Exception,)):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:   # last attempt: give up and re-raise
                        raise
        return wrapper
    return decorator

@retry(times=5, exceptions=(TimeoutError, ConnectionError))
def call_api(url): ...

A decorator with arguments needs one extra nesting level. Three layers: outer function accepts config → returns the decorator → decorator wraps the function.
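To see the three layers in action, the sketch below applies the same retry pattern to a function that fails twice before succeeding. `flaky_fetch` and the `calls` counter are hypothetical stand-ins for a real network call.

```python
import functools

def retry(times=3, exceptions=(Exception,)):      # layer 1: accepts config
    def decorator(func):                          # layer 2: receives the function
        @functools.wraps(func)
        def wrapper(*args, **kwargs):             # layer 3: runs on every call
            for attempt in range(times):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times - 1:
                        raise
        return wrapper
    return decorator

calls = {"count": 0}

@retry(times=3, exceptions=(ConnectionError,))
def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "payload"

result = flaky_fetch()   # two failures are swallowed, third attempt succeeds
```

A production version would normally sleep between attempts (often with exponential backoff); that is omitted here to keep the structure visible.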

@cache, @lru_cache and stacking
from functools import cache

@cache   # Python 3.9+: unbounded memoisation, same as lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Each fib(k) is computed once; repeat calls are an O(1) cache lookup

# Stacking: applied bottom-up
@A
@B
@C
def f(): ...

# Equivalent to: f = A(B(C(f)))
# C wraps first, B wraps that, A wraps that

@cache memoizes pure functions — identical arguments return the cached result. Requires hashable arguments. When stacking, the decorator closest to the function applies first.
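Both claims can be observed directly. The sketch uses `lru_cache(maxsize=None)` so that it also runs on Python versions before 3.9; `tag` is a hypothetical decorator factory written only to make the stacking order visible.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

value = fib(30)
info = fib.cache_info()          # named tuple: hits, misses, maxsize, currsize

def tag(name):
    def decorator(func):
        def wrapper():
            return f"{name}({func()})"
        return wrapper
    return decorator

@tag("A")
@tag("B")                        # closest to the function, so applied first
def f():
    return "f"

order = f()
```

`info.misses` equals the number of distinct arguments seen (31 here, for 0 through 30), and `order` is `"A(B(f))"`, matching `f = A(B(f))`.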

Interview Q & A
Q: Write a decorator that measures and prints the execution time of any function it wraps.
A:
import time, functools

def timeit(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {time.perf_counter() - start:.4f}s")
        return result
    return wrapper

Key points: use perf_counter for precision, always return the original result, use @functools.wraps to preserve the function's identity, and accept *args/**kwargs so it works on any function signature.

Comprehensions & generators

Idiomatic iteration — from concise syntax to memory-efficient pipelines

Why — motivation

Comprehensions are the most visible marker of Python fluency — any code review in a Python shop will flag a for loop building a list when a comprehension would do. Generators go further: they're the idiomatic way to process large datasets without loading everything into memory, which matters enormously in production data pipelines.

Understanding the memory difference between a list comprehension and a generator expression is a proxy for whether you think about memory at all — a signal interviewers actively look for in data-heavy roles.

Intuition — the mental model

A list comprehension builds the entire result upfront and holds it in memory. A generator produces one item at a time, pausing between — it uses O(1) memory regardless of the number of items. The syntax is nearly identical: square brackets vs parentheses.

Think of a generator as a recipe, not a meal. The recipe doesn't cook all the food at once — it gives instructions you follow one step at a time. Only when you ask for the next item does it compute it.

Explanation
Comprehension forms
# List comprehension: builds the full list in memory
squares = [x**2 for x in range(10) if x % 2 == 0]

# Dict comprehension
word_len = {w: len(w) for w in ["cat", "elephant"]}

# Set comprehension: deduplicates
unique_lengths = {len(w) for w in ["cat", "dog", "elephant"]}

# Nested: flatten a 2D list (outer loop first)
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [x for row in matrix for x in row]  # [1, 2, 3, 4, 5, 6]
Generator expressions & yield
# Generator expression: parentheses, lazy
gen = (x**2 for x in range(10_000_000))  # nothing computed yet
next(gen)  # 0
next(gen)  # 1

# Generator function: yield pauses and resumes
def read_chunks(filepath, size=1024):
    with open(filepath, "rb") as f:
        while chunk := f.read(size):   # walrus operator, Python 3.8+
            yield chunk                # pauses here, resumes on next()

# Memory comparison
import sys
lst = [x for x in range(1_000_000)]
gen = (x for x in range(1_000_000))
sys.getsizeof(lst)  # ~8 MB for the list's pointer array alone
sys.getsizeof(gen)  # a couple of hundred bytes at most, whatever the range size

yield turns a function into a generator. Each next() call runs up to the next yield, pauses, and returns the yielded value. Local state is preserved between yields.
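The pause-and-resume behaviour can be traced by recording when each part of the body runs. A minimal sketch; `countdown` and the `events` list are illustrative.

```python
events = []

def countdown(n):
    events.append("started")
    while n > 0:
        yield n
        events.append(f"resumed after {n}")   # runs on the NEXT next() call
        n -= 1

gen = countdown(2)          # creating the generator runs none of the body
before = list(events)       # still empty

first = next(gen)           # runs up to the first yield
second = next(gen)          # resumes after the yield, loops, yields again

finished = False
try:
    next(gen)               # body runs to the end: StopIteration
except StopIteration:
    finished = True
```

`before` is empty because the body only starts on the first `next()`; local state such as `n` survives between calls.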

When to use which
List comprehension
You need to index into the result, iterate multiple times, check length, or pass to something requiring a list. Data fits comfortably in memory.
Generator
Data is large or infinite. Single pass is enough. Feeding a pipeline — sum, max, any, all accept generators natively. Reading files, streaming data from APIs.
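A sketch of a lazy pipeline feeding `sum`, together with the single-pass caveat. The helper names `parse` and `evens` are hypothetical.

```python
def parse(lines):
    for line in lines:
        yield int(line)                 # one item at a time

def evens(numbers):
    return (n for n in numbers if n % 2 == 0)

raw = ["1", "2", "3", "4", "5", "6"]

# Each value flows through parse → evens → sum; no intermediate list is built
total = sum(n * n for n in evens(parse(raw)))

# A generator is exhausted after one pass
one_shot = evens(parse(raw))
first_pass = list(one_shot)
second_pass = list(one_shot)
```

`second_pass` is empty, which is the practical reason to choose a list when the result has to be iterated more than once.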
Interview Q & A
Q: What's the difference between [x*2 for x in data] and (x*2 for x in data)? When would you choose each?
A: The list comprehension evaluates immediately and stores all results in memory: O(n) space. The generator expression is lazy: it produces values one at a time using O(1) space. Choose a list when you need to iterate multiple times, index into the result, or when data is small. Choose a generator when processing large data in a single pass; sum(x*2 for x in data) computes the sum without materialising the list. For large files or data pipelines, generators are usually the better default.

Built-ins & collections

The right tool for common patterns — knowing these separates fluent Python from homework Python

Why — motivation

Knowing Counter vs building a frequency dict by hand, defaultdict vs a guarded dict.get, deque vs a list used as a queue — these choices show whether you know the language's stdlib. They also matter for performance: list.insert(0, x) is O(n); deque.appendleft(x) is O(1).

The built-in functions (enumerate, zip, any, all) remove boilerplate loops and make intent immediate. Interviewers notice when candidates reach for these vs writing manual counters and index tracking.

Intuition — the mental model

Python's collections module gives you containers optimised for specific access patterns. Counter is a dict that counts, defaultdict is a dict that creates a default value instead of raising KeyError on a missing key, deque is a double-ended queue with O(1) appends and pops at both ends, and namedtuple is a tuple with named fields.

Explanation
Essential built-in functions
# names, scores, players, values, data and pred are placeholders for your own data

# enumerate: index + value, no manual counter
for i, item in enumerate(["a", "b", "c"], start=1):
    print(i, item)

# zip: parallel iteration, stops at the shortest input
for name, score in zip(names, scores):
    ...

# sorted with key
ranked = sorted(players, key=lambda p: p.score, reverse=True)

# any / all: short-circuit, accept generators
valid = all(x > 0 for x in values)   # stops at first False
found = any(pred(x) for x in data)   # stops at first True
Counter, defaultdict, deque, namedtuple
from collections import Counter, defaultdict, deque, namedtuple

# Counter: frequency map, missing keys return 0
freq = Counter(["apple", "banana", "apple"])
freq.most_common(2)   # [("apple", 2), ("banana", 1)]
freq["missing"]       # 0, no KeyError

# defaultdict: the callable produces a default for missing keys
graph = defaultdict(list)
graph["A"].append("B")   # no KeyError on first access

text = "the cat and the hat"   # sample input
word_count = defaultdict(int)
for word in text.split():
    word_count[word] += 1

# deque: O(1) at both ends (list.insert(0, x) is O(n))
q = deque(maxlen=100)   # when full, adding at one end evicts from the other
q.appendleft("job")     # O(1) prepend
q.popleft()             # O(1) pop from front

# namedtuple: immutable record with named fields
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
p.x, p.y   # named access, clearer than p[0], p[1]
Interview Q & A
Q: Given a list of words, return the top 3 most frequent words and their counts.
A: Counter(words).most_common(3) — one line. Counter builds the frequency map directly from the iterable, and most_common(n) returns the n highest-count pairs using a heap, so it's O(n log k) not O(n log n). The manual approach — build a dict, sort by value, slice — works but is three times more code for no benefit. Knowing Counter is the expected answer for any frequency/ranking question in Python.
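For comparison, here are the one-liner and the manual approach side by side on a small sample sentence (the sample data is made up for illustration).

```python
from collections import Counter

words = "the cat and the hat and the bat".split()

# One line with Counter
top = Counter(words).most_common(2)

# Manual equivalent: build a dict, sort by count, slice
freq = {}
for w in words:
    freq[w] = freq.get(w, 0) + 1
manual = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)[:2]
```

Both produce `[("the", 3), ("and", 2)]`. Ties are returned in first-encountered order in both versions, since `sorted` is stable and `Counter` preserves insertion order.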

OOP & dunder methods

Python's object model — how syntax maps to method calls

Why — motivation

Every Python operation is a method call in disguise. len(x) calls x.__len__(). a + b calls a.__add__(b). for item in x calls x.__iter__(). Understanding this unlocks the ability to write Pythonic APIs — classes that slot naturally into built-in syntax like len, in, with, and iteration.

OOP design questions appear in system design rounds. The composition-vs-inheritance distinction matters when designing extensible pipelines — a common topic in ML engineering interviews.

Intuition — the mental model

Python's object model is protocol-based. To make your class iterable, implement __iter__ so that it returns an iterator; an iterator is an object that implements both __iter__ and __next__. To make it comparable, implement __eq__. Python checks for these methods and uses them; it doesn't care about the class hierarchy. This is duck typing: structural rather than nominal.
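A sketch of the iteration protocol, separating the iterable (which only needs `__iter__`) from the iterator (which needs `__iter__` and `__next__`). The class names are hypothetical.

```python
class Countdown:
    """Iterable: hands out a fresh iterator on every __iter__ call."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:
    """Iterator: keeps the position and produces values via __next__."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self              # iterators return themselves

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals the end of iteration
        value = self.current
        self.current -= 1
        return value

c = Countdown(3)
first = list(c)
second = list(c)                 # works again: a new iterator each time
```

Neither class inherits from anything special; `for`, `list()`, and unpacking work purely because the methods exist. In practice `__iter__` is often written as a generator function, which removes the need for the separate iterator class.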

Prefer composition over inheritance: give a class a reference to another object rather than inheriting from it. Inheritance creates tight coupling ("is-a"); composition creates flexibility ("has-a") and makes testing easier.

Explanation
Essential dunder methods
class Vector:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # unambiguous: for devs, ideally eval()-safe
        return f"Vector({self.x}, {self.y})"

    def __str__(self):
        # human-readable: for print()
        return f"({self.x}, {self.y})"

    def __add__(self, other):   # enables v1 + v2
        return Vector(self.x + other.x, self.y + other.y)

    def __len__(self):          # enables len(v)
        return 2

    def __eq__(self, other):    # enables v1 == v2
        if not isinstance(other, Vector):
            return NotImplemented   # let Python try the other operand
        return self.x == other.x and self.y == other.y

    def __hash__(self):         # required alongside __eq__
        return hash((self.x, self.y))

If you define __eq__ without __hash__, Python sets __hash__ = None — instances become unhashable. Always define both together.
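The rule can be confirmed in a few lines. In this sketch `Point` defines only `__eq__`, and the hypothetical subclass `HashablePoint` adds a matching `__hash__`.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

class HashablePoint(Point):
    def __hash__(self):
        return hash((self.x, self.y))   # same fields as __eq__

hash_is_none = Point.__hash__ is None   # set automatically by Python

try:
    {Point(1, 2)}
    unhashable = False
except TypeError:
    unhashable = True

unique = {HashablePoint(1, 2), HashablePoint(1, 2)}   # equal objects collapse
```

Because the hash is derived from `x` and `y`, instances used as keys should be treated as immutable; changing a field afterwards would leave the object in the wrong hash bucket.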

@property, @classmethod, @staticmethod
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius          # goes through the setter, so validation applies

    @property
    def radius(self):                 # accessed as c.radius, no ()
        return self._radius

    @radius.setter
    def radius(self, value):          # c.radius = 5 calls this
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @classmethod
    def from_diameter(cls, d):        # alternative constructor
        return cls(d / 2)

    @staticmethod
    def unit_area():                  # no access to instance or class
        return math.pi                # area of a circle with radius 1
Composition vs inheritance
Inheritance — "is-a"
Use when a subclass truly specialises the parent. Avoid deep chains — fragile coupling. Python's MRO (C3 linearisation) handles multiple inheritance but becomes complex quickly.
Composition — "has-a"
Give a class a reference to another object. More flexible — swap implementations at runtime. A Pipeline holding a list of Transformer objects is far easier to extend than a deep hierarchy.
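A minimal sketch of the Pipeline idea, assuming each step exposes a `transform` method. The step classes here are invented for illustration.

```python
class Lowercase:
    def transform(self, text):
        return text.lower()

class StripWhitespace:
    def transform(self, text):
        return text.strip()

class Pipeline:
    """Has-a list of steps; any object with a transform() method fits."""
    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, text):
        for step in self.steps:
            text = step.transform(text)
        return text

pipeline = Pipeline([StripWhitespace(), Lowercase()])
cleaned = pipeline.run("  Hello World  ")

# Swapping or extending behaviour is a data change, not a new subclass
pipeline.steps.append(type("Exclaim", (), {"transform": lambda self, t: t + "!"})())
extended = pipeline.run("  Hello World  ")
```

Each step can be unit-tested in isolation, and the pipeline can be tested with stub steps, which is much harder to do with behaviour baked into a deep inheritance chain.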
Interview Q & A
Q: You define __eq__ on a class. A colleague tries to use instances as dict keys and gets a TypeError. Why, and how do you fix it?
A: When you define __eq__, Python automatically sets __hash__ = None, making instances unhashable. This is intentional — if two objects are equal, they must have the same hash. Without a custom __hash__, that invariant can't be guaranteed. Fix: define __hash__ using the same fields that determine equality, e.g. return hash((self.x, self.y)). If the object is mutable, it generally shouldn't be hashable at all — mutating it would break dictionary lookup.

__slots__

Trading flexibility for memory — when instances number in the millions

Why — motivation

By default, every Python instance carries a __dict__: a hash map just for that object's attributes. For small programs this is fine. For classes instantiated millions of times (graph nodes, feature vectors, event records in a stream processor), the per-instance dict overhead is significant: often 100 bytes or more per object, depending on the CPython version, spent on flexibility you never use.

__slots__ is one of the few Python optimisations worth knowing by name. It comes up in interviews about memory-efficient data structures and high-throughput systems.

Intuition — the mental model

__slots__ tells Python: "this class will only ever have these exact attributes — no dynamic addition allowed." Python replaces the per-instance __dict__ with a fixed array of slots — like a C struct. No hash map, no rehashing, no wasted capacity.

The tradeoff: you gain memory and access speed, but lose the ability to add arbitrary attributes at runtime. It's a deliberate choice to lock down the class contract in exchange for efficiency.

Explanation
Defining and using __slots__
class Point:
    __slots__ = ("x", "y")   # tuple or list of attribute names

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)
p.x        # 3, normal access
p.z = 5    # AttributeError: no __dict__, can't add new attributes

# Memory comparison
import sys

class WithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class WithSlots:
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

d = WithDict(1, 2)
s = WithSlots(1, 2)
sys.getsizeof(d) + sys.getsizeof(d.__dict__)   # roughly 150-300 bytes, varies by CPython version
sys.getsizeof(s)                               # roughly 50 bytes

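The savings compound at scale: even at 100 bytes saved per object, 1 million instances of WithSlots save roughly 100 MB vs WithDict. Exact sizes depend on the CPython version, so measure on your own interpreter. Attribute access is also slightly faster since Python uses fixed offsets rather than a hash lookup.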

Slots with inheritance
class Base:
    __slots__ = ("x",)

class Child(Base):
    __slots__ = ("y",)   # only NEW slots; don't repeat "x"

# Child instances have x (from Base) and y (from Child)
# and no __dict__

class ChildWithDict(Base):
    pass   # no __slots__: instances of this class have a __dict__ again!

# Any class in the MRO without __slots__ brings back __dict__
# for that class and everything that inherits from it

For __slots__ to work through an inheritance chain, every class in the hierarchy must define it. A class that omits __slots__ re-introduces a __dict__ on its own instances and on those of every subclass.
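The inheritance behaviour can be verified with `hasattr`. This sketch reuses the class names from the example above.

```python
class Base:
    __slots__ = ("x",)

class Child(Base):
    __slots__ = ("y",)      # adds y; x is inherited from Base

class ChildWithDict(Base):
    pass                    # no __slots__ declared

c = Child()
c.x, c.y = 1, 2
child_has_dict = hasattr(c, "__dict__")

try:
    c.z = 3                 # not a declared slot
    rejected = False
except AttributeError:
    rejected = True

d = ChildWithDict()
d.x = 1                     # still stored in Base's slot
d.anything = 42             # lands in the re-introduced __dict__
dict_contents = dict(d.__dict__)
```

Note that `x` on the `ChildWithDict` instance still lives in the slot inherited from `Base`; only the undeclared attribute ends up in `__dict__`.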

Gotchas and limits
  • Cannot add attributes not listed in __slots__ — breaks code that uses instance.__dict__ directly.
  • Mixins and multiple inheritance get complicated when slots collide — prefer composition in those cases.
  • Weak references require "__weakref__" in the slots tuple explicitly.
  • dataclasses work with __slots__ via @dataclass(slots=True) (Python 3.10+) — the cleanest modern approach.
from dataclasses import dataclass

@dataclass(slots=True)   # Python 3.10+
class Coordinate:
    x: float
    y: float
    z: float = 0.0

# Gets __init__, __repr__, __eq__ AND __slots__ for free
Interview Q & A
Q: When would you use __slots__, and what do you give up by using it?
A: Use __slots__ when you have a class that will be instantiated in very large numbers (millions of objects) and you know the set of attributes upfront. It replaces the per-instance __dict__ with a compact fixed array, saving on the order of 100 to 300 bytes per object depending on the CPython version. For a million instances, that's roughly 100 to 300 MB saved. The tradeoffs: you cannot add arbitrary attributes at runtime, it complicates multiple inheritance (every class in the MRO must define slots), and you need to explicitly include "__weakref__" if you need weak references. For data-heavy roles, I'd also mention @dataclass(slots=True) in Python 3.10+ as the cleanest way to get this benefit alongside auto-generated __init__, __repr__, and __eq__.

Concurrency model

The GIL, threading, asyncio, and multiprocessing — picking the right tool

Why — motivation

The GIL explains why multithreading doesn't speed up CPU-bound Python work — a fact that surprises many developers. Picking the wrong concurrency model can make performance worse, not better, and this comes up in system design rounds when discussing parallel data fetching, serving ML models under load, or writing efficient ETL pipelines.

Intuition — the mental model

The GIL (Global Interpreter Lock) is a mutex inside CPython that allows only one thread to execute Python bytecode at a time. For IO-bound work (waiting on network/disk), threads release the GIL during the wait — multiple threads genuinely help. For CPU-bound work, threads fight over the GIL and don't help at all.

Explanation
Decision matrix
                   IO-bound             CPU-bound
                   (network, disk)      (compute, parsing)
threading          ✓ works              ✗ GIL blocks
asyncio            ✓ best choice        ✗ single-threaded
multiprocessing    ✓ overkill           ✓ bypasses GIL

Rule:
  IO-bound  → asyncio (if async libs exist) or threading
  CPU-bound → multiprocessing (ProcessPoolExecutor)
  Mixed     → asyncio for IO + run_in_executor with a process pool for CPU tasks

Note: NumPy, pandas (for many operations), and many other C extensions release the GIL during heavy computation, so threading can help with these even for "CPU" work.
ThreadPoolExecutor and ProcessPoolExecutor
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# IO-bound: threads, GIL released during network wait
# (fetch and urls are placeholders for your own function and inputs)
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))

# CPU-bound: processes, each gets its own interpreter and GIL
def cpu_heavy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":   # needed where worker processes are spawned (Windows, macOS)
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_heavy, [10**6] * 4))

Both executors expose the same map / submit API, so switching from threads to processes is often just changing one class name, provided the function and its arguments can be pickled.
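The `submit` side of that API pairs naturally with `as_completed`, which yields futures as they finish rather than in submission order. A sketch with a simulated IO call; `fake_fetch` and the example.com URLs are placeholders, not a real client.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_fetch(url):
    time.sleep(0.05)                 # stands in for network latency
    return url, len(url)

urls = [f"https://example.com/{i}" for i in range(8)]

results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    # submit returns a Future immediately; map each future back to its input
    futures = {pool.submit(fake_fetch, u): u for u in urls}
    for future in as_completed(futures):
        url, size = future.result()  # re-raises here if the call failed
        results[url] = size
```

With eight workers the eight simulated requests overlap, so the whole batch takes roughly one sleep interval instead of eight.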

Interview Q & A
Q: You need to fetch data from 500 external APIs as fast as possible. Would you use threading, asyncio, or multiprocessing?
A: Asyncio — this is IO-bound work. With asyncio.gather and an async HTTP client like aiohttp, 500 requests can be in-flight concurrently on a single thread with minimal overhead. Threading would also work (the GIL is released during network IO) but has higher memory overhead per thread and involves OS context switching. Multiprocessing is wrong here — it's for CPU-bound work, adds process spawn overhead, and the bottleneck is network latency, not CPU.

Async Python

Coroutines, the event loop, tasks, and writing correct async code

Why — motivation

Async Python is now mainstream — FastAPI, SQLAlchemy async, Redis async clients, and most modern ML serving frameworks are async by default. Understanding how the event loop, coroutines, and tasks actually work is essential for writing correct async code. Getting it wrong produces bugs that are extremely hard to debug: blocking the event loop freezes the entire server, not just one request.

Interviews for backend or ML engineering roles increasingly expect you to know the difference between a coroutine and a task, what await actually does, and how to run CPU-bound work without blocking the loop.

Intuition — the mental model

A coroutine is a function that can pause and resume. await pauses the current coroutine and gives control back to the event loop, which runs other ready coroutines. When the awaited operation completes, the event loop resumes the original coroutine. One thread, many coroutines, no OS context switching.

Think of the event loop as a restaurant manager: when a waiter (coroutine) is waiting for the kitchen (IO), the manager assigns them to another table (another coroutine). No waiter just stands idle — they all cooperate to keep things moving.

Explanation
Coroutines and the event loop
import asyncio

async def greet(name, delay):
    await asyncio.sleep(delay)    # yields control to the event loop
    print(f"Hello, {name}")       # resumes here after delay

# asyncio.run creates and tears down the event loop
asyncio.run(greet("Alice", 1))

# Calling an async def returns a coroutine OBJECT, not the result
coro = greet("Bob", 1)   # nothing runs yet
await coro               # NOW it runs (only valid inside an async function)

# Two coroutines run concurrently: total time ~1s, not ~2s
async def main():
    await asyncio.gather(
        greet("Alice", 1),
        greet("Bob", 1),
    )

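Calling an async function without await returns a coroutine object; the body does not run. This is a common mistake: forgetting await means the operation never executes, and no exception is raised. The only hint is a RuntimeWarning ("coroutine ... was never awaited") emitted when the coroutine object is garbage-collected.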
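A small sketch of the difference between creating a coroutine object and awaiting it. `save` is a hypothetical coroutine; the un-awaited object is closed explicitly here only to keep the demo free of the "never awaited" warning.

```python
import asyncio

async def save(log, item):
    await asyncio.sleep(0)
    log.append(item)

async def main():
    log = []

    coro = save(log, "never runs")     # creates a coroutine object only
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()                       # discard it without running the body

    await save(log, "runs")            # awaiting actually executes the body
    return is_coroutine, log

is_coroutine, log = asyncio.run(main())
```

Only `"runs"` ends up in `log`. Linters and `python -X dev` both help surface forgotten awaits earlier.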

gather vs create_task
# asyncio.gather: run multiple coroutines concurrently, collect results in order
results = await asyncio.gather(
    fetch(session, url1),
    fetch(session, url2),
    fetch(session, url3),
    return_exceptions=True,   # exceptions are returned as results instead of raised
)

# asyncio.create_task: schedule now, await later
# Useful when you want to start tasks and do other work first
task1 = asyncio.create_task(process_chunk(chunk1))
task2 = asyncio.create_task(process_chunk(chunk2))
# ... do something else ...
result1 = await task1
result2 = await task2

# Key difference:
# gather wraps each coroutine in a task and returns one awaitable for all results
# create_task returns the Task immediately so you can await or cancel it later
# Either way, the work only starts running once the current coroutine
# yields to the event loop (at its next await)
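A runnable sketch of both calls, using a stand-in `fetch` coroutine that sleeps instead of doing network IO (the name, delays, and `fail` flag are assumptions for the demo).

```python
import asyncio

async def fetch(name, delay, fail=False):
    await asyncio.sleep(delay)          # simulated IO wait
    if fail:
        raise ValueError(f"{name} failed")
    return name

async def main():
    # Results come back in argument order, regardless of completion order
    results = await asyncio.gather(
        fetch("a", 0.02),
        fetch("b", 0.01, fail=True),
        fetch("c", 0.0),
        return_exceptions=True,
    )

    task = asyncio.create_task(fetch("background", 0.01))
    done_immediately = task.done()      # scheduled, but has not run yet
    value = await task
    return results, done_immediately, value

results, done_immediately, value = asyncio.run(main())
```

With `return_exceptions=True` the failure of `"b"` appears as a `ValueError` instance in the results list rather than aborting the other two calls. When using `create_task`, keep a reference to the task until it finishes, since the event loop itself only holds a weak reference.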
Timeouts and cancellation
# Timeout: raises TimeoutError if the coroutine takes too long
try:
    result = await asyncio.wait_for(fetch(url), timeout=5.0)
except asyncio.TimeoutError:
    result = fallback_value

# Cancelling a task
task = asyncio.create_task(long_running())
await asyncio.sleep(1)
task.cancel()
try:
    await task
except asyncio.CancelledError:
    pass   # handle cleanup if needed

# asyncio.timeout (Python 3.11+): cleaner context manager form
async with asyncio.timeout(5.0):
    result = await fetch(url)
Async context managers and iterators
# Async context manager: __aenter__ / __aexit__
async with aiohttp.ClientSession() as session:
    async with session.get(url) as resp:
        data = await resp.json()

# Async iterator: __aiter__ / __anext__
# (db is a placeholder for an async database client)
async for record in db.execute("SELECT * FROM events"):
    process(record)

# Writing your own async context manager
from contextlib import asynccontextmanager

@asynccontextmanager
async def managed_connection(dsn):
    conn = await asyncpg.connect(dsn)
    try:
        yield conn
    finally:
        await conn.close()
Running CPU-bound work without blocking the loop
# WRONG: a blocking call inside async freezes the entire event loop
async def bad():
    result = cpu_heavy(arg)   # blocks all other coroutines while running

# Correct: run in a thread/process pool and await the result
import asyncio

async def good():
    loop = asyncio.get_running_loop()
    # executor=None uses the loop's default ThreadPoolExecutor;
    # pass a ProcessPoolExecutor instead for truly CPU-bound work
    result = await loop.run_in_executor(None, cpu_heavy, arg)

# Or use asyncio.to_thread (Python 3.9+): cleaner, always a thread
async def better():
    result = await asyncio.to_thread(cpu_heavy, arg)

Any synchronous blocking call inside an async function (file reads, time.sleep, CPU-bound computation, sync database drivers) freezes the entire event loop for its duration. Every other request stalls. Use asyncio.to_thread or run_in_executor to move it off the loop.
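The sketch below moves a blocking call off the loop with `asyncio.to_thread` (Python 3.9+) while a second coroutine keeps ticking. `blocking_io` and `heartbeat` are hypothetical names, and `time.sleep` stands in for a sync driver or file read.

```python
import asyncio
import time

def blocking_io():
    time.sleep(0.1)                     # a synchronous, blocking call
    return "done"

async def heartbeat(ticks):
    for _ in range(5):
        await asyncio.sleep(0.01)
        ticks.append(time.perf_counter())

async def main():
    ticks = []
    # The blocking call runs in a worker thread; the loop stays free for heartbeat
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        heartbeat(ticks),
    )
    return result, len(ticks)

result, tick_count = asyncio.run(main())
```

If `blocking_io()` were called directly inside a coroutine instead, the heartbeat could not run until the sleep finished.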

Interview Q & A
Q: What happens if you call a regular blocking function (like time.sleep(5)) inside an async function? How do you fix it?
A: The call blocks the entire event loop for 5 seconds — no other coroutine can run during that time. Every in-flight request stalls. This is the critical mistake in async Python: the event loop is single-threaded; blocking it blocks everything. Fix: replace with the async equivalent — await asyncio.sleep(5) for sleeping, await asyncio.to_thread(blocking_fn, arg) for any blocking function that can't be replaced. The to_thread approach runs the blocking call in a thread pool while the event loop continues serving other coroutines. For CPU-bound work, use ProcessPoolExecutor via run_in_executor instead, since threads still share the GIL.

Context managers

Guaranteed setup and teardown — the Pythonic way to manage any resource

Why — motivation

Context managers are how Python guarantees resource cleanup even when exceptions occur. Open files, database connections, locks, GPU memory, network sessions, temp directories — all should be managed with with. Without it, a single exception leaves resources leaked.

Writing your own context managers unlocks patterns that appear everywhere in production code: database transaction rollback, profiling blocks, suppressing specific exceptions, and async resource management. It's also a go-to interview question because the protocol is small but the design implications are large.

Intuition — the mental model

The with statement is a guaranteed try/finally with a cleaner name and a reusable protocol. It calls __enter__ on entry (setup, returns the resource) and __exit__ on exit (cleanup) — regardless of whether the block raised an exception.

Think of it as a contract: "I promise to clean up after myself, even if something goes wrong inside." The context manager holds that promise so every call site doesn't have to remember to.
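As a small illustration of that contract, the sketch below records the order in which things happen when the body of a with block raises. The class name Tracked and the events list are invented for this example.

```python
events = []


class Tracked:
    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None on a clean exit, or the exception class on failure.
        name = exc_type.__name__ if exc_type else None
        events.append(f"exit({name})")
        return False  # do not suppress: the exception keeps propagating


try:
    with Tracked():
        events.append("body")
        raise ValueError("boom")
except ValueError:
    events.append("caught")

print(events)
```

The cleanup step runs before the exception reaches the surrounding except clause, which is the same ordering a hand-written try/finally would give.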

Explanation
The protocol — __enter__ and __exit__
# acquire, release, and use are placeholders for your own resource functions

class ManagedResource:
    def __enter__(self):
        # setup: acquire the resource
        self.resource = acquire()
        return self.resource  # value bound to the 'as' variable

    def __exit__(self, exc_type, exc_val, exc_tb):
        # cleanup: always called, even on exception
        release(self.resource)
        # return True: suppress the exception
        # return False/None: let it propagate (almost always this)
        return False

with ManagedResource() as r:
    use(r)

# roughly equivalent to (ignoring the case where __exit__ returns True):
# mgr = ManagedResource()
# r = mgr.__enter__()
# try:
#     use(r)
# finally:
#     mgr.__exit__(*sys.exc_info())

@contextmanager — the simple way
import time
from contextlib import contextmanager

import pandas as pd

@contextmanager
def managed_transaction(conn):
    try:
        yield conn.cursor()  # code inside the with block runs here
        conn.commit()        # only reached if no exception
    except Exception:
        conn.rollback()      # on any exception, roll back
        raise                # re-raise so caller knows it failed

with managed_transaction(conn) as cursor:
    cursor.execute("INSERT INTO ...")

# Timing blocks
@contextmanager
def timer(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.3f}s")

with timer("data load"):
    df = pd.read_parquet("data.parquet")

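Code before yield plays the role of __enter__. Code after yield, including any except or finally clause around it, plays the role of __exit__. This is much less boilerplate than writing a full class. Wrap the yield in try/finally: without it, an exception raised in the block skips every line after yield and the cleanup never runs.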
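Here is a self-contained sketch of the same pattern that can be run without a database. It temporarily overrides an environment variable and restores it afterwards; temporary_env and DEMO_MODE are hypothetical names chosen for the example.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name: str, value: str):
    # Setup (the __enter__ half): remember the old value, then override it.
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield value  # the body of the with block runs while we are paused here
    finally:
        # Teardown (the __exit__ half): runs on success and on exception.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


os.environ.pop("DEMO_MODE", None)  # start from a known state
seen_inside = None
try:
    with temporary_env("DEMO_MODE", "on"):
        seen_inside = os.environ["DEMO_MODE"]
        raise RuntimeError("simulated failure inside the block")
except RuntimeError:
    pass

restored = "DEMO_MODE" not in os.environ
print(seen_inside, restored)
```

Even though the block raised, the variable is back to its original state afterwards, because the finally clause ran as the teardown step.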

contextlib utilities
import os
from contextlib import suppress, nullcontext, ExitStack

# suppress: swallow specific exceptions cleanly
with suppress(FileNotFoundError):
    os.remove("temp.txt")  # no error if file doesn't exist

# nullcontext: placeholder when context may or may not exist
def process(data, lock=None):
    with lock if lock is not None else nullcontext():
        do_work(data)

# ExitStack: dynamic number of context managers
with ExitStack() as stack:
    files = [stack.enter_context(open(f)) for f in file_list]
    # every file opened so far is closed on exit, even if a later open() fails

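The ExitStack behaviour above can be observed without touching the filesystem. In this sketch, resource is a made-up stand-in for open() that logs what it does and can be told to fail; the third resource fails on purpose.

```python
from contextlib import ExitStack, contextmanager

log = []


@contextmanager
def resource(name: str, fail: bool = False):
    # Hypothetical stand-in for open(): fails before acquiring anything.
    if fail:
        raise OSError(f"cannot open {name}")
    log.append(f"open {name}")
    try:
        yield name
    finally:
        log.append(f"close {name}")


try:
    with ExitStack() as stack:
        opened = [
            stack.enter_context(resource(n, fail=(n == "c")))
            for n in ["a", "b", "c"]
        ]
except OSError:
    log.append("third open failed")

print(log)
```

The two resources that were entered successfully are closed in reverse order before the error reaches the caller, so nothing is left open.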
Async context managers
from contextlib import asynccontextmanager

import asyncpg

@asynccontextmanager
async def db_session(dsn):
    conn = await asyncpg.connect(dsn)
    try:
        yield conn
    finally:
        await conn.close()  # guaranteed even on exception

async with db_session(dsn) as conn:  # must appear inside an async function
    await conn.execute("SELECT 1")

# Class-based async context manager (connect is a placeholder coroutine)
class AsyncSession:
    async def __aenter__(self):
        self.conn = await connect()
        return self.conn

    async def __aexit__(self, *args):
        await self.conn.close()

Async context managers use async with and implement __aenter__ / __aexit__ (both async). The @asynccontextmanager decorator works the same as its sync equivalent but with async def and await inside.
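A runnable sketch of the same shape, with no database driver required. FakeConnection and fake_session are invented for this example and only simulate awaiting on I/O.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeConnection:
    # Hypothetical stand-in for a real async database connection.
    def __init__(self) -> None:
        self.closed = False

    async def execute(self, query: str) -> str:
        await asyncio.sleep(0)  # pretend to wait on the network
        return f"ran: {query}"

    async def close(self) -> None:
        await asyncio.sleep(0)
        self.closed = True


@asynccontextmanager
async def fake_session():
    conn = FakeConnection()
    try:
        yield conn
    finally:
        await conn.close()  # awaited teardown, runs on success and on error


async def main():
    async with fake_session() as conn:
        result = await conn.execute("SELECT 1")
    return result, conn.closed


result, closed = asyncio.run(main())
print(result, closed)
```

The only differences from the synchronous version are async def, async with, and the ability to await inside setup and teardown.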

Interview Q & A
Q: When would you return True from __exit__? Write a context manager that suppresses KeyError silently.
A: Returning True from __exit__ suppresses the exception — the with block exits cleanly as if nothing happened. Use it sparingly: only when swallowing the exception is genuinely correct behaviour, not as a lazy error-silencer. Example:

class SuppressKeyError:
    def __enter__(self): return self
    def __exit__(self, exc_type, exc_val, exc_tb):
        # True only when an exception occurred and it is a KeyError (or a subclass)
        return exc_type is not None and issubclass(exc_type, KeyError)

Or with contextlib: with suppress(KeyError): ... which is the idiomatic way. contextlib.suppress works the same way: its __exit__ returns True when exc_type is a subclass of one of the listed exception types.

Type hints

Annotations for documentation, tooling, and catching bugs before runtime

Why — motivation

Modern Python codebases — especially at FAANG and AI companies — use type hints extensively. They power IDE autocompletion, enable static analysis (mypy, pyright), and make function signatures self-documenting. They catch type errors at development time that would otherwise surface as cryptic runtime bugs.

Interviewers in Python-heavy roles increasingly expect annotated code and may ask about Optional, Protocol, or generics. Writing annotated code signals production hygiene even when a codebase doesn't strictly enforce types.

Intuition — the mental model

Type hints are annotations: Python records them (in __annotations__) but never checks or enforces them at runtime. They exist for humans and for tools. Think of them as a machine-readable docstring that tools can verify. Protocol is Python's structural typing ("any class that has these methods"), making duck typing explicit and tool-checkable.
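A two-line experiment makes the point. The function double is a made-up example; a static checker such as mypy or pyright would be expected to flag the second call, yet the interpreter runs it without complaint.

```python
def double(x: int) -> int:
    return x * 2


# The annotations are stored on the function object for tools to read.
hints = double.__annotations__

# Passing a str violates the hint, but nothing checks it at runtime:
# str * 2 is valid Python, so the call simply succeeds.
runtime_result = double("ab")

print(hints)
print(runtime_result)
```

The hint describes intent; it is the type checker in your editor or CI, not the interpreter, that turns a mismatch into an error.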

Explanation
Basic annotations and Optional / Union
from typing import Optional

def greet(name: str, repeat: int = 1) -> str:
    return name * repeat

# Optional[X] == X | None: value or None
def find(items: list[str], target: str) -> Optional[int]:
    try:
        return items.index(target)
    except ValueError:
        return None

# Python 3.10+ union shorthand
def process(value: int | str | None) -> str:
    return str(value) if value is not None else ""

# Built-in generics (Python 3.9+, no import needed)
def summarise(data: list[float]) -> dict[str, float]:
    return {"mean": sum(data) / len(data)}

TypeVar and Protocol
from typing import TypeVar, Protocol
from collections.abc import Sequence

T = TypeVar("T")

def first(items: Sequence[T]) -> T:  # works for any sequence type
    return items[0]

# Protocol: structural subtyping (duck typing, made explicit)
class Drawable(Protocol):
    def draw(self) -> None: ...

class Circle:
    def draw(self) -> None:  # no inheritance needed
        print("circle")

def render_all(shapes: list[Drawable]) -> None:
    for s in shapes:
        s.draw()

render_all([Circle()])  # Circle satisfies Drawable structurally

Protocol is duck typing made tool-checkable. Any class with the required methods satisfies the protocol — no inheritance, no registration. This is how Python's own built-in protocols work: anything with __len__ is a Sized.
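Structural checks can also be made at runtime with typing.runtime_checkable, which is how the Sized example behaves. In this sketch Closeable, Socketish, and Bag are hypothetical names; note that a runtime isinstance check against a protocol only verifies that the methods exist, not that their signatures match.

```python
from collections.abc import Sized
from typing import Protocol, runtime_checkable


@runtime_checkable
class Closeable(Protocol):
    def close(self) -> None: ...


class Socketish:
    # No inheritance from Closeable: having close() is enough.
    def close(self) -> None:
        pass


class Bag:
    # No inheritance from Sized either: having __len__ is enough.
    def __len__(self) -> int:
        return 3


is_closeable = isinstance(Socketish(), Closeable)
is_sized = isinstance(Bag(), Sized)
bag_is_closeable = isinstance(Bag(), Closeable)

print(is_closeable, is_sized, bag_is_closeable)
```

Without the decorator, isinstance against a Protocol class raises TypeError, so static checking remains the main use and the runtime check is an opt-in extra.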

Callable, TypedDict, Final
from collections.abc import Callable
from typing import TypedDict, Final

def apply(fn: Callable[[int], int], value: int) -> int:
    return fn(value)

# TypedDict: structured dict with known keys and types
class UserRecord(TypedDict):
    id: int
    name: str
    email: str

# Final: this variable should never be reassigned
MAX_RETRIES: Final = 3

Interview Q & A
Q: Are Python type hints enforced at runtime? When would you use Protocol vs an ABC?
A: Type hints are not enforced at runtime. The interpreter stores them in __annotations__ but never checks them; they are metadata for tools (mypy, pyright) and humans, although some libraries such as dataclasses and pydantic read them at runtime. Use Protocol when you want structural typing ("any class with a draw() method") without requiring callers to inherit from a specific base class. Use an ABC (Abstract Base Class) when you own all implementations, want to enforce a method contract, and want to provide shared default behaviour via concrete methods. Protocol is more flexible and better for library APIs where you can't control caller code; ABCs are better when you're designing an internal framework and want missing method implementations caught early. Python has no compile step that does this: instead, instantiating a subclass that has not overridden every abstract method raises TypeError.
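The ABC side of that trade-off can be sketched as follows. Exporter, CsvExporter, and BrokenExporter are invented names; the point is that the base class supplies shared behaviour and refuses to instantiate incomplete subclasses.

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    @abstractmethod
    def serialise(self, record: dict) -> str:
        """Subclasses must implement this."""

    def export(self, records: list[dict]) -> list[str]:
        # Shared concrete behaviour that every subclass inherits.
        return [self.serialise(r) for r in records]


class CsvExporter(Exporter):
    def serialise(self, record: dict) -> str:
        return ",".join(str(v) for v in record.values())


class BrokenExporter(Exporter):
    pass  # forgot to implement serialise


lines = CsvExporter().export([{"id": 1, "name": "Ada"}])

try:
    BrokenExporter()
    error = None
except TypeError as exc:
    error = type(exc).__name__

print(lines, error)
```

The failure happens when the object is created, not when the missing method is eventually called, which is the enforcement a Protocol alone does not give you at runtime.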

Error handling

Failing safely — exceptions, exception chaining, and custom exception hierarchies

Why — motivation

Production code fails — network requests time out, files are missing, APIs return unexpected shapes. The difference between hobby code and production code is how failures are handled. Catching the wrong exception silently swallows bugs; catching too broadly hides problems; not chaining exceptions loses the root cause.

Custom exception hierarchies are how you build APIs where callers can handle errors at the right level of specificity — catching a module's base exception type without needing to know all its internal subtypes.

Intuition — the mental model

try/except separates the happy path from the failure path. The else block runs only when no exception occurred — it's for code that should run on success but shouldn't be inside try where its exceptions might be caught by the wrong handler. The finally block always runs — use it for guaranteed cleanup.

Explanation
try / except / else / finally
import requests

# fetch_data, retry, DataError, cache, and metrics are placeholders

try:
    result = fetch_data(url)                 # operation that might fail
except requests.Timeout:                     # most specific first
    result = retry(url)
except (ValueError, KeyError) as e:          # multiple types in a tuple
    raise DataError("bad response") from e   # chain: preserves cause
else:
    cache.store(result)                      # ONLY if no exception raised
finally:
    metrics.increment("requests")            # ALWAYS runs, even on re-raise

# Exception hierarchy (simplified):
# BaseException
#     SystemExit, KeyboardInterrupt   ← don't catch these broadly
#     Exception
#         ValueError, TypeError, KeyError, AttributeError...
#         OSError → FileNotFoundError, PermissionError...

# Never: except Exception: pass (silently swallows everything)

Catch the most specific exception type you can handle. Use raise ... from e to chain exceptions so the original traceback isn't lost. The else block prevents a handler from masking errors from unrelated code.
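The following sketch exercises all four clauses and shows what chaining preserves. ConfigError and parse_port are hypothetical names used only for this illustration.

```python
class ConfigError(Exception):
    """Hypothetical domain error for this example."""


steps = []


def parse_port(raw: dict) -> int:
    try:
        port = int(raw["port"])
    except KeyError as e:
        steps.append("except")
        # 'from e' stores the KeyError on the new exception as __cause__.
        raise ConfigError("port is missing") from e
    else:
        steps.append("else")  # only reached when the try body succeeded
        return port
    finally:
        steps.append("finally")  # runs on both paths, even after return/raise


ok = parse_port({"port": "8080"})

try:
    parse_port({})
except ConfigError as err:
    cause = err.__cause__

print(ok, steps, repr(cause))
```

The caller sees a ConfigError it can reason about, while the original KeyError stays attached for anyone reading the traceback.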

Custom exception hierarchies
class PipelineError(Exception):
    """Base for all pipeline failures; callers catch this."""

class DataValidationError(PipelineError):
    def __init__(self, column, reason):
        self.column = column
        self.reason = reason
        super().__init__(f"Column '{column}': {reason}")

class SchemaError(PipelineError): ...
class IngestionError(PipelineError): ...

# Raise with context: preserves original traceback
try:
    validate_schema(df)
except KeyError as e:
    raise DataValidationError("user_id", "missing") from e

# Callers catch at the right level of specificity:
try:
    run_pipeline()               # placeholder for the caller's entry point
except DataValidationError:      # this specific error
    ...
except PipelineError:            # any pipeline failure
    ...
except Exception:                # nuclear: only at top level
    ...

Define a base exception per module. This lets you change internal exception types without breaking callers who catch the base. Always include enough context in the message to debug without a full stacktrace.
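A compact, runnable version of the idea, using an invented storage module. StorageError, QuotaExceeded, ObjectMissing, and handle are all hypothetical; the handler knows about one specific subtype and falls back to the base for everything else.

```python
class StorageError(Exception):
    """Base for every error this hypothetical storage module raises."""


class QuotaExceeded(StorageError):
    def __init__(self, used: int, limit: int):
        self.used = used
        self.limit = limit
        super().__init__(f"quota exceeded: {used} of {limit} bytes used")


class ObjectMissing(StorageError):
    pass


def handle(exc: Exception) -> str:
    try:
        raise exc
    except QuotaExceeded as e:  # most specific handler first
        return f"free up {e.used - e.limit} bytes"
    except StorageError as e:   # any other failure from the module
        return f"storage failure: {type(e).__name__}"


specific = handle(QuotaExceeded(used=120, limit=100))
generic = handle(ObjectMissing("report.csv"))

print(specific)
print(generic)
```

If the module later adds a new StorageError subtype, this handler keeps working unchanged, because the base-class clause already covers it.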

Interview Q & A
Q: When would you use the else clause in a try/except block? What's the difference from putting that code at the end of the try block?
A: The else block runs only when the try block completed without raising an exception. Putting that code inside try means any exception it raises could be caught by the except clause — which is usually wrong. For example: fetch data in try, cache the result in else. A cache write failure propagates normally rather than being silently caught by the network exception handler. It also makes intent clearer: try is the risky operation, else is the success path, finally is cleanup that always runs.
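The difference is easiest to see side by side. In this sketch fetch and store are stand-ins, and both the network and the cache are assumed to raise OSError, which is what makes the all-in-try version misdiagnose the failure.

```python
def fetch() -> str:
    return "payload"  # the network call succeeds in this scenario


def store(value: str) -> None:
    raise OSError("cache disk full")  # the cache write is what fails


def everything_in_try() -> str:
    try:
        data = fetch()
        store(data)  # a failure here lands in the network handler below
    except OSError:
        return "retrying fetch"  # wrong diagnosis: the fetch was fine
    return "ok"


def with_else() -> str:
    try:
        data = fetch()
    except OSError:
        return "retrying fetch"
    else:
        store(data)  # outside the try body, so its errors propagate
    return "ok"


outcome_in_try = everything_in_try()

try:
    with_else()
    outcome_with_else = "ok"
except OSError as exc:
    outcome_with_else = f"propagated: {exc}"

print(outcome_in_try)
print(outcome_with_else)
```

With else, the cache failure surfaces as itself instead of triggering a pointless retry of a request that already succeeded.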