IA Operating Systems — Exam Cram

Everything you need for tomorrow, drawn from Prof. Mortier's 2024/25 slides and Tripos questions 2020–2025.

Processes & Scheduling Paging & Virtual Memory UNIX Protection I/O & File Systems CFS Buffering

START HERE

Topic Map & Exam Pattern

What Q3 and Q4 always test (2020–2025)

Paper 2 has two OS questions. Looking at past papers, the examiners cycle reliably through these clusters:

Year	Q3 Topic	Q4 Topic
2025	Access matrix / UNIX permissions	5-level page table, TLB EAT, trie comparison
2024	Context switch + CFS scheduling	Address binding, OPT/LRU/FIFO, Bélády's anomaly
2023	Interrupts, effective access time, page faults	Context switch, RR vs SJF turnaround, burst prediction
2022	Segmentation vs paging, 2-level page tables	File descriptors / capabilities, idle process defrag
2021	Paging calculations, i-nodes, disk accesses	FCFS/SJF/SRTF scheduling, UNIX permissions
2020	Buffering (single/double/circular), context switching

🎯 Highest-frequency topics — must know cold

Scheduling algorithms (FCFS, SJF, SRTF, RR, CFS) — numerical calculation
Paging arithmetic (page bits, page table sizes, TLB EAT)
Page replacement (OPT, LRU, FIFO with Bélády's anomaly)
UNIX permissions (rwx, setuid, groups, access matrix)
Context switching — what's saved, what's flushed
Buffering (single/double/circular) — capacity calculations

CPU

Processes & Context Switching

Process vs Program

Program: Static artifact on disk (text + data).
Process: Program in execution — dynamic. Has text, data, heap, and one or more threads. It is the unit of protection and resource allocation.
Thread: An execution context within a process. Each has its own PC, stack, and registers. In Linux, both processes and threads are "tasks".

Process States (Linux: 5 states)

new ──► Ready (R) ──► Running (R) ──► Terminated
               ▲    ◄── preempt ──┤
               │                  │
               │ wakeup     block on I/O
               │                  ▼
        Interruptible Sleep (S) / Uninterruptible Sleep (D)
               │
        SIGSTOP► Stopped (T) ◄─ SIGCONT

Process Control Block (PCB)

The OS data structure for a process. Contains:

PID — process identifier
State — running, ready, waiting…
CPU registers — PC, stack pointer, general registers (the "scheduling context")
Memory-management info — page table base register (PTBR), memory limits
Scheduling info — priority, nice value, vruntime (CFS)
I/O status — open file table pointer, allocated devices
Accounting — CPU time used

Context Switch

What must be saved (always)

All CPU registers: program counter (PC), stack pointer (SP), general-purpose registers, status/flags register. Saved to the PCB / Thread Control Block.

What must be flushed from hardware before the new process runs

TLB (Translation Lookaside Buffer) — must be flushed so the new process doesn't use stale translations from the old one. (Unless the hardware supports ASIDs — Address Space Identifiers — in which case only entries without the new process's ASID need invalidating.)

Why I/O buffers are NOT part of the saved context

I/O buffers live in main memory (the kernel address space), not in CPU registers. They persist independently of which process is running. Saving them with the context would make context switches prohibitively expensive and is unnecessary — the kernel manages them separately.

Tripos 2023 Q4(a) — 4 marks

What is a process's context and what are the main steps involved in context switching?

Model Answer

Context = the complete machine state needed to resume a process: CPU registers (PC, SP, general registers, flags), memory-management state (PTBR), and scheduling info.

Steps: (1) Triggered by interrupt/system call/timer expiry. (2) Save current process's registers to its PCB. (3) Run the scheduler to pick next process. (4) Flush TLB (or tag with ASIDs). (5) Load new process's registers from its PCB, including setting PTBR to its page table. (6) Return to user mode at the new PC.

fork / exec in Unix

fork() creates a child as an almost-exact copy of the parent (copy-on-write pages). Returns 0 in child, child's PID in parent.

exec() replaces the current process's address space with a new program. After exec, the process runs the new binary.

A zombie is a process that has exited but whose parent hasn't called wait(). An orphan is a process whose parent exited first.

CPU

Scheduling Algorithms

Key Metrics

Turnaround time: Completion time − Arrival time
Waiting time: Turnaround time − CPU burst time (time spent in Ready queue)
Response time: First run time − Arrival time

Algorithm Comparison

Algorithm	Pre-emptive?	Key property	Weakness
FCFS	No	Simple queue order	Convoy effect — short jobs stuck behind long ones
SJF	No	Optimal average waiting time (for given set)	Need to know burst length; starvation of long jobs
SRTF	Yes (preempt when shorter arrives)	Optimal with dynamic arrivals (if switch cost = 0)	Context switch overhead; still needs burst length
Round Robin	Yes (timer)	Fair: each process gets 1/n CPU in ≤ (n−1)q wait	Higher average turnaround than SJF; quantum choice critical
Priority	Optional	High-priority always first	Starvation — fix with aging
CFS	Yes	Proportional fairness via vruntime	Complex; needs target latency and min granularity tuning

Worked: FCFS / SJF / SRTF

Processes (from 2021 Q4): P1 arrives t=0, burst=8; P2 arrives t=3, burst=3; P3 arrives t=5, burst=4; P4 arrives t=6, burst=6.

Order by arrival: P1(0–8), P2(8–11), P3(11–15), P4(15–21)

P1 (0–8)

P4 (15–21)

Wait: P1=0, P2=5, P3=6, P4=9 → Avg wait = 5

At t=0 only P1 available → run P1. At t=8: P2(3), P3(4), P4(6) ready → pick P2 first (shortest). Order: P1(0–8), P2(8–11), P3(11–15), P4(15–21).

SJF is only better if shorter jobs arrive before long ones start.

Wait: P1=0, P2=5, P3=6, P4=9 → Avg wait = 5 (same here since P1 monopolises t=0–8)

Preempt current process if new arrival has shorter remaining burst.

t=0: P1(8) starts. t=3: P2(3) arrives, 3 < P1's remaining 5 → preempt. t=6: P2 finishes, P3(4) arrives, P4(6) arrives. Pick P3. t=7: P1 remains 5, P3 remains 3 — no preempt. t=10: P3 done. Pick P1(5), P4(6). P1 shorter. etc.

SRTF gives: P1(0–3), P2(3–6), P3(6–10), P1(10–15), P4(15–21)

Wait: P1=(3-0)+(10-3)... this gets complex — track remaining time carefully.

Wait_i = (Finish_i - Arrival_i) - Burst_i

Round Robin — key facts

    Each process gets at most q time units before being preempted and appended to the rear of the ready queue
With n processes and quantum q: max wait before first run = (n−1)×q
q too large → degenerates to FCFS
q too small → context switch overhead dominates
Typical q = 10–100 ms; context switch ≈ a few μs
To give a process more CPU without changing the scheduler: run multiple instances of it, or give it multiple tickets (lottery)

  

SJF Burst Prediction — Exponential Averaging

τ_n+1 = α × t_n + (1 − α) × τ_n

where t_n = actual nth burst, τ_n+1 = predicted next burst, α ∈ [0,1].

α ≈ 1: prediction ≈ last burst (history ignored)
α ≈ 0: prediction ≈ old estimate (recent ignored)
Good predictor when variance is small and doesn't change quickly

Tripos 2023 Q4(b) — RR vs SJF turnaround with context switch overhead

P1=10, P2=20, P3=30. All arrive simultaneously. Context switch = 1 unit. RR quantum = 2. When does SJF turnaround exceed RR?

Key Insight

RR: Each context switch costs 1 unit extra. With 3 processes and q=2, there are many switches — RR total turnaround is high but each process gets a share early.

SJF: Minimal context switches (n−1 = 2). But if burst lengths are unknown, the estimation overhead (10 units each here) can inflate total turnaround dramatically.

SJF exceeds RR when the context switch overhead in RR is outweighed by the estimation overhead in SJF — i.e., when quantum is small (many RR switches) or when burst estimation is cheap compared to long SJF estimation delays.

The most significant parameter is the time quantum q: smaller q → more RR context switches → higher RR turnaround → SJF looks better.

Linux

Completely Fair Scheduler (CFS)

Core idea

No fixed time slices. Each of N tasks should ideally get 1/N of the CPU. Weighted by nice value (−20 to +19; lower = higher priority = higher weight).

Time given to task j: t_j ∝ w_j / Σ w_i × target_latency

Target latency: The interval in which every runnable task should run at least once. E.g., 6ms with 3 equal-priority tasks → each gets 2ms.
Minimum granularity: Minimum time a process runs per scheduling turn, regardless of target latency. Prevents excessive switching when there are many tasks.
vruntime: Virtual run time: CFS picks the task with the smallest vruntime. For equal-priority tasks, vruntime = actual run time. Lower priority → vruntime grows faster (decays more).
Implementation: Red-black tree (O(log n) insert/remove). Leftmost (smallest vruntime) node is cached. This is the next task to run.

CFS Worked Example (2024 Q3)

5 CPU-bound processes arrive simultaneously: P1=10ms, P2=20ms, P3=30ms, P4=40ms, P5=50ms. All have default nice level (equal weight).

5 processes, target=60ms: each gets 60/5 = 12ms per round (≥ 2ms min granularity ✓).

Round 1 (t=0–60): P1 gets 12ms → P1 finishes at t=10. P2 gets 12ms → done at t=22 (since only needs 20ms, exits after 20ms). etc.

After each process finishes, remaining processes split the target latency.

This is preferable for batch workloads: larger quantum → fewer context switches → less overhead for CPU-bound work.

5 processes, target=20ms: 20/5 = 4ms per process. But min granularity = 5ms, so each gets 5ms per turn.

More frequent preemptions → better interactive response time but higher overhead.

Preferable for interactive workloads: shorter quantum → lower response latency.

Memory

Memory Management

Address Binding

Stage	What happens	Constraint
Compile time	Absolute addresses embedded in code	Must know load address in advance; recompile to move
Load time	Relocatable code patched by linker/loader with actual addresses	Fixed once loaded; can't move at runtime
Execution time	Hardware MMU translates logical→physical on every access	Process can move; requires MMU hardware support

Logical vs Physical Addresses

Logical (virtual) address: generated by the CPU. Physical address: seen by RAM. The MMU maps between them at runtime.

Segmentation

Logical address = <segment number, offset>. Each segment has a base and a limit. Supports sharing and variable-size logical units.

Problem: external fragmentation (variable-size segments leave gaps).

Fragmentation

External fragmentation: Total free memory is enough but it isn't contiguous. Fix: compaction (expensive — requires dynamic relocation).
Internal fragmentation: Allocated block is slightly larger than requested. Inherent with fixed-size pages/partitions. On average ½ frame per process.

Segmentation vs Paging

Implementing segments via paging wastes internal fragmentation: each segment is rounded up to a page boundary, leaving unused bytes at the end. The cost of alleviating this waste (reducing page size) is a larger page table.

Memory

Paging & Page Tables

Paging Basics

Physical memory divided into fixed-size frames. Logical memory divided into same-size pages. A page table maps page numbers to frame numbers.

page number (p)

offset (d)

← logical address

frame number (f)

offset (d)

← physical address

If logical address space = 2^m and page size = 2ⁿ: page number uses (m−n) bits, offset uses n bits.

Paging Arithmetic — How To Solve Any Question

Step 1: page offset bits = log₂(page size)

Step 2: frame number bits = log₂(physical memory / page size)

Step 3: page number bits = total virtual bits − offset bits

Step 4: number of PTEs = 2^{page-number-bits}

Step 5: PTE size = frame bits + protection bits (rounded up to byte boundary)

Step 6: total page table size = PTEs × PTE size

Worked: 2021 Q3(a)

Physical = 2³² bytes, Virtual = 2⁴⁸ bytes, Page = 2²⁰ bytes, 4 protection bits

(i) Frame number bits? Offset bits? (ii) Total page table size?

Solution

Offset bits = log₂(2²⁰) = 20 bits

Frame number bits = log₂(2³² / 2²⁰) = log₂(2¹²) = 12 bits

Page number bits = 48 − 20 = 28 bits → 2²⁸ PTEs

PTE size = 12 (frame) + 4 (protection) = 16 bits = 2 bytes

Page table total = 2²⁸ × 2 = 2²⁹ bytes = 512 MB

Multi-level Page Tables

Reason: a flat page table for a 32-bit system with 4KB pages needs 2²⁰ × 4 bytes = 4 MB contiguous in memory per process. Multi-level paging allows unused portions to be paged out.

2-level example (32-bit, 4KB pages): 20-bit page number split into 10-bit p1 + 10-bit p2.

PTBR → L1 page table (1024 entries × 4 bytes = 4KB) → L2 page table → frame.

Memory accesses: 1 (L1) + 1 (L2) + 1 (data) = 3 memory accesses per logical access (without TLB).

TLB and Effective Access Time (EAT)

EAT = hit_rate × (t_tlb + t_mem) + (1 − hit_rate) × (t_tlb + k × t_mem + t_mem)

where k = number of page table levels, t_tlb = TLB search time, t_mem = memory access time.

Simplified (1-level): EAT = h(t_tlb + t_mem) + (1−h)(t_tlb + 2×t_mem)

Worked: 2025 Q4(c)

t_mem = 40ns, TLB hit rate = 99%, t_tlb = 5ns

What is the EAT with a 5-level page table?

Solution

TLB hit: 5 + 40 = 45 ns

TLB miss (need 5 page table walks + data): 5 + 5×40 + 40 = 5 + 200 + 40 = 245 ns

EAT = 0.99 × 45 + 0.01 × 245 = 44.55 + 2.45 = 47 ns

2025 Q4: 5-level page table deep dive

57-bit virtual addressing, 4096-byte pages, 64-bit PTEs

Offset = log₂(4096) = 12 bits. Remaining = 57 − 12 = 45 bits → split across 5 levels = 9 bits per level.

Each page table level has 2⁹ = 512 entries × 8 bytes = 4096 bytes = exactly one page ← this is intentional!

Full page table (worst case, all 5 levels populated): 512⁴ × 512 entries... in practice each level-1 table is 4KB.

Total addressable memory = 2⁵⁷ bytes = 128 PB (pebibytes) = 128 × 2¹⁰ TB.

Binary trie comparison: A 57-level binary trie would have log₂(2⁵⁷) = 57 levels vs 5 for the 5-level page table. Each trie lookup touches 57 nodes vs 5 page table entries — much worse than the structured page table, even if O(log N).

Memory

Virtual Memory & Page Replacement

Demand Paging EAT Formula

EAT = (1 − p) × t_mem + p × t_fault

where p = page fault rate, t_mem = memory access time, t_fault = page fault service time.

Worked: 2023 Q3(b)

t_mem = 80ns, t_fault = 8ms (free frame available), target EAT ≤ 100ns

(i) Max permitted page-fault rate? (ii) If actual rate = 2× max, what's EAT? (iii) Half the faults occur with no free frame — new EAT?

Solution

(i) 100 = (1−p)×80 + p×8,000,000
100 = 80 − 80p + 8,000,000p → 20 = 7,999,920p → p ≤ 2.5 × 10⁻⁶

(ii) Actual p = 5 × 10⁻⁶:
EAT = (1 − 5×10⁻⁶)×80 + 5×10⁻⁶×8,000,000 ≈ 80 + 40 = 120 ns

(iii) Half faults need to evict a page — service time doubles for those:
EAT = 80 + 0.5p × 8,000,000 + 0.5p × 16,000,000 = 80 + p × 12,000,000
= 80 + 5×10⁻⁶ × 12,000,000 = 80 + 60 = 140 ns

Page Replacement Algorithms

Algorithm	Rule	Optimal?	Bélády's anomaly?
OPT	Replace page used farthest in future	Yes (oracle)	No (stack algorithm)
LRU	Replace least recently used	Good approximation	No (stack algorithm)
FIFO	Replace oldest loaded page	No	YES
Clock/2nd chance	FIFO but give referenced pages a second chance	Approx LRU	Possible

Bélády's Anomaly

With FIFO, adding more physical frames can sometimes increase the page fault rate. This happens because FIFO doesn't follow the "stack property" — the set of pages held with k frames is not always a subset of those held with k+1 frames.

OPT and LRU are stack algorithms — the k-frame set ⊆ (k+1)-frame set — so more frames never increases faults.

Worked: 2024 Q4(b) — OPT, LRU, FIFO on string 1 2 3 2 4 3 5 1 3 2 3 4

Reference string: 1 2 3 2 4 3 5 1 3 2 3 4 — wait, the paper uses 1 2 3 2 4 3 5 1 3 2 3 4. Let's work the paper's string: 1 2 3 2 4 3 5 1 3 2 3 4.

Actual paper: 1 2 3 2 4 3 5 1 3 2 3 4 with 3 frames. Track which frame holds which page, marking * = fault:

Ref:  1* 2* 3* 2  4* 3  5* 1* 3  2* 3  4*
OPT: [1] [1,2] [1,2,3] hit [2,3,4] hit [3,4,5] [1,4,5] hit [2,4,5] hit [2,4,3] → 8 faults
LRU: [1] [1,2] [1,2,3] hit [2,3,4] hit [3,4,5] [1,3,5] hit [1,2,3] hit [2,3,4] → 8 faults  
FIFO: [1] [1,2] [1,2,3] hit [2,3,4] hit [3,4,5] [1,4,5] hit [1,2,5] hit [2,3,5] [2,3,4] → ...
(Work through carefully, FIFO evicts the oldest loaded page each miss)

Always draw a table with one column per reference and show frame contents.

Emulating Reference & Dirty Bits Without Hardware Support

Reference bit emulation

Mark the page as no-access (invalid) in the PTE. When the process tries to access it, the MMU raises a trap. The OS trap handler: (1) sets the page to valid/readable again, (2) records that it was referenced. This is the "reference bit".

Dirty bit emulation

Mark the page as read-only even when it is writable. When the process tries to write, the MMU raises a protection fault. The OS trap handler: (1) marks the page writable, (2) records that it is dirty.

Thrashing & Working Set

Thrashing: a process doesn't have enough frames for its working set → constant page faults → CPU busy handling faults → OS detects low CPU utilisation → schedules more processes → even less memory per process → cascading failure.

Working set W(Δ): set of pages referenced in the last Δ memory references. Total demand D = Σ|WSS_i|. If D > available frames → suspend a process.

Protection

Protection, Security & UNIX Access Control

Dual-mode Operation

CPUs have at least two modes: user mode (restricted) and kernel mode (privileged). A mode bit tracks the current mode. Privileged instructions are only executable in kernel mode.

Mode switch to kernel: via a trap (system call, exception). The trap vector ensures control jumps to a fixed OS entry point — an application can't jump to arbitrary kernel code.

Access Matrix

Abstract model: rows = domains (subjects/principals), columns = objects, cells = permitted operations.

Representation	Description	Best for
Access Control Lists (ACLs)	Per-object: store list of (domain, rights) with each object. Column of the matrix.	File systems; "who can access this file?"
Capabilities	Per-domain: store list of (object, rights) with each subject. Row of the matrix. Unforgeable token.	Distributed systems; "what can this process do?"

For a modern personal laptop: use ACLs. The matrix is sparse (many files, few users). ACLs efficiently store only the non-empty entries per file, and are easy to check on file open. Capabilities require each process to carry all its access tokens, which is cumbersome for a large filesystem.

UNIX File Permissions

Every file/directory has: owner (user), group, and world ("other") — each with read/write/execute bits.

rw- r-- --- ← rw-r----- (owner=rw, group=r, world=none)

Permission check order: if owner → use owner bits; else if group member → use group bits; else use world bits.

Tripos 2021 Q4(d) — Classic UNIX permissions puzzle

group1=(user1,user2), group2=(user2,user3), group3=(user3,user1)
rw-rw---- user1 group1 file1
rw-r--r-- user2 group3 file2
rwxr----- user3 group2 file3

Analysis

user1 can read: file1 (owner, rw-), file2 (group3 member → r--) ✓, file3 (world=--- ✗, group2 member? No — group2={user2,user3}) → file1 and file2

user2 can write: file1 (group1 member, rw-) ✓, file2 (owner, rw-) ✓, file3 (group2 member, r-- only) ✗ → file1 and file2

Who can read file3: owner=user3 ✓ (rwx); group2={user2,user3}: user2 → r-- ✓; world --- ✗ → user3 and user2

For user2 to execute file3: user2 accesses via group2 (r--). Need x for group → rwxr-x--- (add x to group).

To execute as user3 (setuid): set the setuid bit: rwsr-x---. When user2 runs it, process runs with user3's UID.

UNIX File Descriptors as Capabilities

Ways FDs behave like capabilities

A process must possess an FD to use a resource (access check on open, not on each operation)
FDs can be passed to other processes (via sendmsg / Unix domain sockets), delegating access without sharing credentials
Each FD carries specific access rights (read-only, write-only, read-write)
They are opaque handles — the process can't forge one

Ways FDs differ from true capabilities

FDs are integers into a per-process table — not unforgeable in the classical sense (can be guessed by index)
The kernel manages them; processes can't create them at will without the OS
Revocation: closing an FD doesn't revoke other processes' copies of the same open file description
They don't survive process termination (unless explicitly preserved)

Redirecting stdout — kernel interactions (2022 Q4(a))

Normal: process inherits fd 1 = stdout (terminal). Redirect to file:

open("file", O_WRONLY|O_CREAT|O_TRUNC) — creates/truncates the file, returns fd N
dup2(N, 1) — closes fd 1 and makes it an alias for fd N (same open file description)
close(N) — clean up the extra fd

Pipe vs file redirect: a pipe is more efficient because data flows directly in kernel memory from producer to consumer without hitting disk. File redirect needs: write to disk, then read from disk — two disk I/O operations plus filesystem overhead.

I/O

I/O Subsystem & Buffering

I/O Methods

Method	How it works	When to use
Polling	CPU busy-loops checking device status register	Very fast devices (latency < few μs); otherwise wasteful
Interrupts	Device raises interrupt line; CPU saves state, calls ISR	Infrequently accessed devices; most I/O
DMA	Device controller moves data directly to/from RAM; one interrupt per block	High-speed bulk transfers (disk, network)

Interrupt Handling (2023 Q3(a))

How interrupts are handled in a modern UNIX-like OS:

Device raises interrupt-request line after each instruction boundary
CPU saves registers, switches to kernel mode, indexes interrupt vector
Top half (ISR): runs with interrupts disabled — minimal work (acknowledge device, copy data to kernel buffer)
Bottom half: runs later with interrupts enabled — heavier processing (protocol handling, waking waiting processes)
Scheduler may run after the bottom half; returns to interrupted process or a higher-priority one

Traps as software interrupts: a trap is generated by an instruction (system call, page fault, division by zero) rather than a hardware device, but the mechanism is identical — save state, index a vector, jump to a handler. It is "software-generated" but handled like an interrupt.

Buffering (Tripos 2020 Q4)

Strategy	Description	Throughput (burst ≤ 5MB/s, 2s)
Single buffer (1 MB)	One buffer, can't simultaneously read and write → application must wait while buffer is being processed	Application stalls during each burst; data loss likely during 5MB/s bursts
Double buffer (2 × 512 KB)	One buffer fills while other drains → overlap I/O and processing	Better, but 512KB × 2 = 1MB total: a 5MB/s × 2s = 10MB burst still overflows
Circular buffer (N × 512 KB)	Ring of buffers; producer and consumer advance pointers independently	Needs enough capacity to absorb burst: 5 MB/s × 2s − 1 MB/s × 2s = 8 MB extra → need ≥ 8MB + margin

Tripos 2020 Q4(d) — Circular buffer size calculation

How many 512 KB buffers needed to prevent data loss in a circular buffer?

Solution

During a burst: data arrives at 5 MB/s, consumed at 1 MB/s → net accumulation = 4 MB/s.

Burst lasts 2 seconds → excess data = 4 MB/s × 2s = 8 MB.

Number of 512 KB buffers = 8 MB / 0.5 MB = 16 buffers minimum.

(Add 1–2 extra in practice to avoid race conditions between reader and writer pointers.)

Simultaneously read/writable buffer (2020 Q4(e))

Why proposed: allow producer to write into a buffer while consumer reads from it, increasing concurrency.

Challenges: data consistency — consumer could read partially-written data; need locks or atomic operations; potential race conditions.

Simpler approach: use a circular buffer with enough capacity — avoids the concurrency complexity entirely while providing the same overlap of I/O and processing.

Storage

File Systems & Storage

UNIX i-nodes

An i-node (index node) is the FCB in UNIX. It stores metadata (owner, permissions, timestamps, size) and pointers to data blocks.

Standard layout: 12 direct pointers + 1 single-indirect + 1 double-indirect + 1 triple-indirect.

If block size = 4 KB and each disk address = 4 bytes → one indirect block = 1024 pointers.

Max file size = (12 + 1024 + 1024² + 1024³) × 4 KB ≈ 4 TB

Worked: Disk Accesses to Read a File (2021 Q3(b))

Read /home/user1/test/test1.html (4 disk blocks). Root inode in memory. Each inode and directory fits in 1 block.

(ii) How many disk accesses to read test1.html?

Path traversal: /home/user1/test/test1.html

Root inode in memory → read root directory block (1) → find "home" inode number → read home inode (2) → read home dir block (3) → find "user1" inode → read user1 inode (4) → read user1 dir block (5) → find "test" inode → read test inode (6) → read test dir block (7) → find "test1.html" inode → read test1.html inode (8) → read 4 data blocks (9, 10, 11, 12) = 12 disk accesses

(iii) test2.html immediately after: "test" directory inode and block are now cached → skip those 2. Start from test dir cached → find test2.html inode → read inode (1) → read 4 data blocks (2,3,4,5) = 5 disk accesses

Hard Links vs Symbolic Links

Hard link: Another directory entry pointing to the same inode. Reference-counted. Deleting either entry decrements count; file deleted when count = 0. Can't span filesystems (different inode number spaces). Can't link directories (prevents cycles).
Symbolic (soft) link: A special file whose data is a path. Can cross filesystems and point to directories. Dangling if target deleted.

Idle Process and Defragmentation (2022 Q4(c))

Three disadvantages of using idle process for defragmentation

Increased latency for real work: disk I/O during defragmentation competes with I/O needed for interactive/waiting processes, increasing their response time.
SSDs don't benefit / are harmed: modern storage (SSDs, NVMe) has no seek latency, so fragmentation is irrelevant. Unnecessarily writing wears out flash cells.
Interrupted defragmentation is dangerous: a crash mid-defragmentation can corrupt the filesystem if block moves are not journalled/atomic. It also makes meaningful progress only if idle periods are long enough to complete moves.

Practice

Worked Past-Paper Examples

2025 Q3 — Access Matrix Construction

root, alice, bob, chris — printer, backup disk, webcam+mic, speakers

Build the access matrix for read/write on peripherals per the given policy.

Access Matrix

Domain	Printer	Backup Disk	Webcam/Mic	Speakers
root	r, w	w (create)	r, w	r, w
alice	r, w	—	r, w	r, w
bob	r (status only)	r (recover)	r, w	r, w
chris	—	—	r (speakers only = play music)	r, w

Note: "status check" maps to read. "Fully use printer" = read+write. "Participate in video conference" = read+write on webcam/mic. "Play music" = write to speakers (output), or read from speakers — interpret as write-only access.

2024 Q3 — CFS Schedule

5 processes: 10, 20, 30, 40, 50ms. CFS target=60ms, min gran=2ms

Describe the schedule and count context switches.

Round 1 (t = 0–60 ms): 5 equal-weight tasks, each gets 60/5 = 12 ms

P1(10ms): runs 10ms then done at t=10. P2(20ms): runs 12ms → 8ms remaining. P3: runs 12ms → 18ms left. P4: runs 12ms → 28ms left. P5: runs 12ms → 38ms left.

After P1 finishes at t=10, the scheduler rebalances target latency across 4 remaining tasks = 15ms each for remaining time.

Work through carefully with vruntime: P1 finishes first (least CPU used), so next round P2 has lowest vruntime and runs.

Context switches: one per scheduling event. With target=60ms and 5 tasks all equal priority, roughly ~5 switches in round 1, fewer in subsequent rounds as tasks complete.

2024 Q4 — Page Replacement on 1 2 3 2 4 3 5 1 3 2 3 4

3 frames. Reference string: 1 2 3 2 4 3 5 1 3 2 3 4

Trace OPT, LRU, and FIFO.

FIFO Trace (track insertion order)

Ref:   1  2  3  2  4  3  5  1  3  2  3  4
       *  *  *  -  *  -  *  *  -  *  -  *
Frames after each step:
[1]  [1,2]  [1,2,3]  [1,2,3]  [2,3,4]  [2,3,4]
[3,4,5]  [4,5,1]  [4,5,1]  [5,1,2]  [5,1,2]  [1,2,4]
FIFO faults: 1,2,3,4,5,1,2,4 = 8 faults (marked *)

LRU Trace (track last use time)

Ref:   1  2  3  2  4  3  5  1  3  2  3  4
       *  *  *  -  *  -  *  *  -  *  -  *
At fault, evict page used longest ago:
t=4: frames=[1,2,3], ref=4 → evict 1 (LRU) → [2,3,4]
t=6: frames=[2,3,4], ref=5 → evict 2 → [3,4,5]  
t=7: frames=[3,4,5], ref=1 → evict 4 → [3,5,1]
t=9: frames=[3,5,1], ref=2 → evict 5 → [3,1,2]
t=11: frames=[3,1,2], ref=4 → evict 1 → [3,2,4]
LRU faults = 8

OPT Trace (evict page used farthest in future)

t=4: frames=[1,2,3], ref=4. Next uses: 1→t=7, 2→t=9, 3→t=5. Evict 2 (farthest). → [1,3,4]
t=6: miss on 5? No — check: frames after t=4: [1,3,4]. Ref 3: hit. Ref 5 at t=6: [1,3,4]→evict 4 → [1,3,5]
t=7: ref=1: hit. t=8: ref=3: hit. t=9: ref=2: miss → evict 5 → [1,3,2]. t=11: ref=4 → evict 1 → [3,2,4]
OPT faults: 1,2,3,4,5,2,4 = 7 faults

⚡ Reference

Quick Reference — Formulas & Facts

Key Formulas

EAT (no page fault) = h(t_tlb + t_mem) + (1−h)(t_tlb + k·t_mem + t_mem)
EAT (with page fault) = (1−p)·t_mem + p·t_fault
Circular buffer capacity = (burst_rate − consume_rate) × burst_duration
CFS slice = (w_j / Σ w_i) × target_latency  [≥ min_granularity]
Exponential avg: τ_{n+1} = α·t_n + (1−α)·τ_n
HDD avg latency = 30 / RPM (seconds)
Turnaround = completion − arrival;  Wait = turnaround − burst

UNIX Permission Patterns

Goal	Permission / Approach
Owner rw, group r, world none	`rw-r-----` = 640
All can read, owner can write	`rw-r--r--` = 644
Execute as file owner	setuid bit: `rwsr-xr-x`
Execute as file group	setgid bit: `rwxr-sr-x`
root-only rw, others none	`rw-------` = 600, owned by root

Process Creation in UNIX

pid = fork();          // clone process; COW pages
if (pid == 0) {        // child
    execve("/bin/prog", argv, envp);
} else {               // parent
    wait(&status);
}

Common Gotchas in Exams

Context switch: remember to flush TLB (unless ASIDs). IO buffers NOT saved in context.
RR turnaround: don't forget context switch overhead if stated (e.g., 2023 Q4 includes +1 per switch).
Bélády's: only FIFO can exhibit it; OPT and LRU cannot — they are stack algorithms.
Page table size: for a 32-bit system with 4-byte PTEs and 4KB pages = 4MB per process.
2-level page table benefit: most PTEs are unused (sparse virtual address space) — unused L1 entries don't need an L2 table → saves memory vs flat table.
SJF is not necessarily better than RR: with context switch overhead and unknown bursts, RR can win.
UNIX permissions: check order is owner → group → world; process uses the first matching category.
Setuid: process runs with the file owner's UID, not the executing user's UID.
Pipes vs file redirect: pipes avoid disk entirely; file redirect requires two disk I/Os.
Circular buffer for bursts: capacity must absorb net excess = (burst − consume) × burst_time.

Lecture Topics vs Exam Mapping

Lecture	Content	Recent Tripos coverage
01 Introduction	OS role, interrupts, storage hierarchy, buffering	2020 (buffering), 2023 (interrupts)
02 Protection	Dual-mode, system calls, ACLs, capabilities	2022, 2025 (access matrix)
03 Processes	PCB, context switch, fork/exec, IPC	2023, 2024 Q3
04–05 Scheduling	FCFS, SJF, SRTF, RR, priority, CFS	Every year
06 Memory Management	Address binding, segmentation, fragmentation	2022, 2024
07 Paging	Page tables, TLB, multi-level paging	2021, 2022, 2025
08 Virtual Memory	Demand paging, OPT/LRU/FIFO, thrashing	2023, 2024
09 I/O Systems	Polling, interrupts, DMA, buffering	2020, 2022, 2023
10 Storage & Files	i-nodes, directories, disk scheduling	2021, 2022
11–12 UNIX Case Study	CFS, file permissions, VFS, shell	2021, 2024, 2025

Last-Minute Checklist

I can calculate TLB EAT for any number of page table levels
I can trace OPT, LRU, FIFO page replacement on a reference string
I know why FIFO suffers Bélády's anomaly but OPT/LRU don't
I can calculate average waiting time for FCFS, SJF, SRTF, RR
I know what the context switch saves and what must be flushed (TLB)
I can build a UNIX access matrix from policy statements
I can work out UNIX read/write/execute for any user given rwx + groups
I understand setuid and what it does
I can calculate circular buffer capacity for burst scenarios
I can describe how demand paging EAT changes with page fault rate
I understand CFS: vruntime, target latency, minimum granularity
I know the structure of a UNIX i-node and how to count disk accesses for path traversal
I can explain why pipes are more efficient than file-based IPC
I understand the two representations of the access matrix and when to prefer each