Reservoir Sampling

1. Random Pick from Stream

Approach	Method	Probability	Space
Reservoir Sampling	Keep one item, replace with probability 1/i	Each item 1/n	`O(1)`
Store All (Naive)	Collect all items, pick random index	Uniform	`O(n)`
Algorithm	For i-th item, keep with probability 1/i	Proven uniform	Key insight
K Items	Maintain k items, replace with k/i probability	Extension	`O(k)`

Example: Single item reservoir sampling

class ReservoirSampler {
    constructor() {
        this.result = null;
        this.count = 0;
    }
    
    pick(value) {
        this.count++;
        
        // Pick current item with probability 1/count
        if (Math.random() < 1 / this.count) {
            this.result = value;
        }
        
        return this.result;
    }
}

// Example stream: 1, 2, 3, 4, 5
// Item 1: count=1, keep with prob 1/1 = 100%
// Item 2: count=2, keep with prob 1/2 = 50%
// Item 3: count=3, keep with prob 1/3 = 33.3%
// Item 4: count=4, keep with prob 1/4 = 25%
// Item 5: count=5, keep with prob 1/5 = 20%

// Each item has final probability 1/5 of being selected

// Proof for item 1:
// P(item 1 selected) = P(kept at 1) * P(not replaced at 2,3,4,5)
//                    = 1 * (1 - 1/2) * (1 - 1/3) * (1 - 1/4) * (1 - 1/5)
//                    = 1 * 1/2 * 2/3 * 3/4 * 4/5
//                    = 1/5 ✓

Example: Alternative with random index

class ReservoirSamplerAlt {
    constructor() {
        this.result = null;
        this.count = 0;
    }
    
    pick(value) {
        this.count++;
        
        // Generate random index in [0, count)
        const randomIndex = Math.floor(Math.random() * this.count);
        
        // If random index is 0, replace result
        if (randomIndex === 0) {
            this.result = value;
        }
        
        return this.result;
    }
}

// Equivalent to previous: randomIndex === 0 has probability 1/count
// More intuitive for understanding extension to k items

2. Linked List Random Node

Approach	Preprocessing	Query Time	Space
Array Conversion	Convert to array once	`O(1)`	`O(n)`
Reservoir Sampling	None	`O(n)`	`O(1)`
Count + Random	Count length	`O(n)`	`O(1)`
Trade-off	Space vs time	Depends on use case	Design decision

Example: Reservoir sampling approach

class LinkedListRandom {
    constructor(head) {
        this.head = head;
    }
    
    getRandom() {
        let result = null;
        let current = this.head;
        let count = 0;
        
        // Traverse list, apply reservoir sampling
        while (current !== null) {
            count++;
            
            // Pick current with probability 1/count
            if (Math.random() < 1 / count) {
                result = current.val;
            }
            
            current = current.next;
        }
        
        return result;
    }
}

// Example: 1 -> 2 -> 3
// Each call to getRandom():
// - Visit all 3 nodes: O(n)
// - Each node has 1/3 probability
// - O(1) space (no array needed)

Example: Array conversion approach

class LinkedListRandomArray {
    constructor(head) {
        this.values = [];
        let current = head;
        
        // Convert to array once
        while (current !== null) {
            this.values.push(current.val);
            current = current.next;
        }
    }
    
    getRandom() {
        const randomIndex = Math.floor(
            Math.random() * this.values.length
        );
        return this.values[randomIndex];
    }
}

// Example: 1 -> 2 -> 3
// Constructor: O(n) time, O(n) space
// getRandom(): O(1) time
// 
// Better if:
// - Multiple calls to getRandom()
// - Space is not a constraint
// - Fast query time needed

3. Random Pick with Weight

Approach	Method	Preprocessing	Query Time
Prefix Sum + Binary Search	Build cumulative weight array	`O(n)`	`O(log n)`
Linear Scan	Generate random, scan until found	`O(1)`	`O(n)`
Alias Method	Complex preprocessing for O(1) query	`O(n)`	`O(1)`
Recommended	Prefix sum + binary search	Good balance	Most practical

Example: Prefix sum with binary search

class RandomPickWeight {
    constructor(weights) {
        this.prefixSum = [];
        let sum = 0;
        
        // Build prefix sum array
        for (const weight of weights) {
            sum += weight;
            this.prefixSum.push(sum);
        }
        
        this.totalWeight = sum;
    }
    
    pickIndex() {
        // Generate random number in [1, totalWeight]
        const target = Math.random() * this.totalWeight;
        
        // Binary search for first prefix sum >= target
        let left = 0;
        let right = this.prefixSum.length - 1;
        
        while (left < right) {
            const mid = Math.floor((left + right) / 2);
            
            if (this.prefixSum[mid] < target) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        
        return left;
    }
}

// Example: weights = [1, 3, 2]
// prefixSum = [1, 4, 6]
// totalWeight = 6
// 
// Random ranges:
// [0, 1)   -> index 0 (weight 1, probability 1/6)
// [1, 4)   -> index 1 (weight 3, probability 3/6)
// [4, 6)   -> index 2 (weight 2, probability 2/6)
// 
// Binary search finds the right bucket efficiently

Example: Visualizing the probability distribution

// Example: weights = [1, 3, 2]
// 
// Visual representation:
// Index 0: |X|
// Index 1: |X|X|X|
// Index 2: |X|X|
// 
// Number line [0, 6]:
// [0, 1) -> 0
// [1, 4) -> 1
// [4, 6) -> 2
// 
// prefixSum[i] represents the right boundary (exclusive) for index i

function pickIndexExplained(weights) {
    const prefixSum = [];
    let sum = 0;
    
    for (const w of weights) {
        sum += w;
        prefixSum.push(sum);
    }
    
    // Random in [0, sum)
    const target = Math.random() * sum;
    
    // Find first index where prefixSum[i] > target
    // This is lower_bound in C++ or bisect_left in Python
    for (let i = 0; i < prefixSum.length; i++) {
        if (target < prefixSum[i]) {
            return i;
        }
    }
    
    return prefixSum.length - 1;
}

// Linear search version - O(n) per query
// Binary search reduces to O(log n)

4. Shuffle Array Fisher Yates

Approach	Method	Time	Correctness
Fisher-Yates	Iterate, swap with random remaining element	`O(n)`	Proven uniform
Sort with Random Keys	Assign random number, sort by it	`O(n log n)`	Uniform but slower
Naive Swap	Swap with any random index (wrong!)	`O(n)`	NOT uniform!
Direction	Forward or backward - both valid	Same complexity	Personal preference

Example: Fisher-Yates shuffle

function shuffle(array) {
    const n = array.length;
    
    // Iterate from end to beginning
    for (let i = n - 1; i > 0; i--) {
        // Pick random index from [0, i]
        const j = Math.floor(Math.random() * (i + 1));
        
        // Swap array[i] with array[j]
        [array[i], array[j]] = [array[j], array[i]];
    }
    
    return array;
}

// Example: [1, 2, 3, 4]
// 
// i=3: swap with random in [0,3]
// i=2: swap with random in [0,2]
// i=1: swap with random in [0,1]
// 
// Each element has equal probability at each position
// Total permutations: n! all equally likely

// Why [0, i] not [0, n-1]?
// - Ensures element at i is randomly chosen from remaining
// - Elements already processed are in final positions
// - Generates uniform distribution

Example: Forward version

function shuffleForward(array) {
    const n = array.length;
    
    // Iterate from beginning to end
    for (let i = 0; i < n - 1; i++) {
        // Pick random index from [i, n-1]
        const j = i + Math.floor(
            Math.random() * (n - i)
        );
        
        // Swap
        [array[i], array[j]] = [array[j], array[i]];
    }
    
    return array;
}

// Equivalent to backward version
// At step i:
// - Elements [0, i-1] are finalized
// - Pick random from [i, n-1] for position i
// - Continue for remaining positions

Example: Shuffle class for reset functionality

class ShuffleArray {
    constructor(nums) {
        this.original = [...nums];
        this.array = [...nums];
    }
    
    reset() {
        this.array = [...this.original];
        return this.array;
    }
    
    shuffle() {
        const n = this.array.length;
        
        for (let i = n - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [this.array[i], this.array[j]] = 
                [this.array[j], this.array[i]];
        }
        
        return this.array;
    }
}

// Usage:
// const shuffler = new ShuffleArray([1,2,3,4]);
// shuffler.shuffle(); // Random permutation
// shuffler.reset();   // Back to [1,2,3,4]
// shuffler.shuffle(); // Different random permutation

Warning: Common mistake - swapping with any random index (not just remaining) produces biased distribution! Example: for 3 elements, naive swap generates 3^3=27 outcomes but only 3!=6 permutations, so some permutations are more likely than others. Always use Fisher-Yates!

5. Sample K Items Uniformly

Approach	Method	Time	Space
Reservoir Sampling K	Maintain k items, replace with probability k/i	`O(n)`	`O(k)`
Partial Shuffle	Fisher-Yates for first k positions	`O(k)`	`O(n)`
Random Selection	Pick k random unique indices	`O(k)` expected	`O(k)`
Stream Unknown Size	Only reservoir sampling works	One pass	Critical use case

Example: Reservoir sampling for k items

function reservoirSampleK(stream, k) {
    const reservoir = [];
    
    for (let i = 0; i < stream.length; i++) {
        if (i < k) {
            // First k elements - add directly
            reservoir.push(stream[i]);
        } else {
            // For i >= k, include with probability k/i
            const j = Math.floor(Math.random() * (i + 1));
            
            if (j < k) {
                // Replace element at index j
                reservoir[j] = stream[i];
            }
        }
    }
    
    return reservoir;
}

// Example: stream = [1,2,3,4,5,6,7,8,9,10], k = 3
// 
// i=0,1,2: reservoir = [1,2,3]
// i=3: j in [0,3], if j<3 replace reservoir[j] with 4
// i=4: j in [0,4], if j<3 replace reservoir[j] with 5
// ...
// i=9: j in [0,9], if j<3 replace reservoir[j] with 10
// 
// Each subset of size k has equal probability C(n,k)^-1

// Proof sketch:
// P(item i in final reservoir) = k/n for all i
// P(item i selected) = P(initially) * P(not replaced)
//                    = k/(i+1) * product((i+1-k)/(j+1) for j in i+1..n-1)
//                    = k/n

Example: Partial Fisher-Yates

function sampleKPartialShuffle(array, k) {
    const n = array.length;
    const result = [...array]; // Copy to avoid modifying original
    
    // Only shuffle first k positions
    for (let i = 0; i < k; i++) {
        // Pick random from [i, n-1]
        const j = i + Math.floor(Math.random() * (n - i));
        [result[i], result[j]] = [result[j], result[i]];
    }
    
    // First k elements are the sample
    return result.slice(0, k);
}

// Example: array = [1,2,3,4,5,6,7,8,9,10], k = 3
// 
// i=0: swap arr[0] with random in [0,9]
// i=1: swap arr[1] with random in [1,9]
// i=2: swap arr[2] with random in [2,9]
// 
// Return [arr[0], arr[1], arr[2]]
// 
// Advantage: Requires knowing array size
// Disadvantage: O(n) space to copy array
// Time: O(k) only k swaps needed

Example: Optimized with index set

function sampleKIndexSet(array, k) {
    const n = array.length;
    const selected = new Set();
    const result = [];
    
    while (result.length < k) {
        const index = Math.floor(Math.random() * n);
        
        if (!selected.has(index)) {
            selected.add(index);
            result.push(array[index]);
        }
    }
    
    return result;
}

// Simple but has issues:
// - Expected O(k) when k << n
// - Degrades to O(n) when k approaches n
// - Birthday paradox: collisions increase with k
// 
// Better alternatives:
// - Use reservoir sampling for streams
// - Use partial shuffle for known-size arrays
// - This approach OK for small k relative to n

Note: Reservoir sampling is essential when stream size is unknown or when you can only make a single pass through the data (e.g., reading from network, file too large for memory). For arrays of known size where multiple passes are allowed, partial Fisher-Yates shuffle is often more intuitive and efficient.

Summary: Reservoir Sampling Patterns

Core Algorithm: For i-th element, keep it with probability 1/i - each element ultimately has probability 1/n of being selected
Single Item: Maintain one result, replace with current if Math.random() < 1/count - O(1) space, O(n) time per query
Proof Intuition: P(item 1 selected) = P(kept) × P(not replaced) = 1 × (1-1/2) × (1-1/3) × ... × (1-1/n) = 1/n
Linked List Random: Either reservoir sample each query O(n) time, or convert to array once O(n) space for O(1) queries
Weighted Random: Build prefix sum array, generate random in [0, totalWeight), binary search for first prefix >= target - O(log n) query
Prefix Sum Insight: Each index gets range proportional to weight, random number falls into range with correct probability
Fisher-Yates Shuffle: Iterate i from n-1 to 1, swap arr[i] with random in [0, i] - generates uniform random permutation
Shuffle Direction: Backward (i = n-1 to 1) or forward (i = 0 to n-2) both work - swap with random from remaining elements
Common Mistake: Swapping with any index (not just remaining) produces biased distribution - 3^n outcomes but n! permutations
K Items Reservoir: Fill first k, then for i ≥ k pick random j in [0, i], if j < k replace reservoir[j] - each item has probability k/n
Partial Shuffle: Run Fisher-Yates for first k positions only, return first k elements - O(k) time vs full O(n) shuffle
Use Cases: Reservoir for unknown size streams, partial shuffle for known arrays, weighted for non-uniform distributions
Stream vs Array: Reservoir sampling essential for single-pass streaming data, shuffle/index methods better for random access arrays

Related Cheatsheets

Datastructures-algorithms