Reservoir Sampling

1. Random Pick from Stream

Approach Method Probability Space
Reservoir Sampling Keep one item, replace with probability 1/i Each item 1/n O(1)
Store All (Naive) Collect all items, pick random index Uniform O(n)
Algorithm For i-th item, keep with probability 1/i Proven uniform Key insight
K Items Maintain k items, replace with k/i probability Extension O(k)

Example: Single item reservoir sampling

class ReservoirSampler {
    constructor() {
        this.result = null;
        this.count = 0;
    }
    
    pick(value) {
        this.count++;
        
        // Pick current item with probability 1/count
        if (Math.random() < 1 / this.count) {
            this.result = value;
        }
        
        return this.result;
    }
}

// Example stream: 1, 2, 3, 4, 5
// Item 1: count=1, keep with prob 1/1 = 100%
// Item 2: count=2, keep with prob 1/2 = 50%
// Item 3: count=3, keep with prob 1/3 = 33.3%
// Item 4: count=4, keep with prob 1/4 = 25%
// Item 5: count=5, keep with prob 1/5 = 20%

// Each item has final probability 1/5 of being selected

// Proof for item 1:
// P(item 1 selected) = P(kept at 1) * P(not replaced at 2,3,4,5)
//                    = 1 * (1 - 1/2) * (1 - 1/3) * (1 - 1/4) * (1 - 1/5)
//                    = 1 * 1/2 * 2/3 * 3/4 * 4/5
//                    = 1/5 ✓

Example: Alternative with random index

class ReservoirSamplerAlt {
    constructor() {
        this.result = null;
        this.count = 0;
    }
    
    pick(value) {
        this.count++;
        
        // Generate random index in [0, count)
        const randomIndex = Math.floor(Math.random() * this.count);
        
        // If random index is 0, replace result
        if (randomIndex === 0) {
            this.result = value;
        }
        
        return this.result;
    }
}

// Equivalent to previous: randomIndex === 0 has probability 1/count
// More intuitive for understanding extension to k items

2. Linked List Random Node

Approach Preprocessing Query Time Space
Array Conversion Convert to array once O(1) O(n)
Reservoir Sampling None O(n) O(1)
Count + Random Count length O(n) O(1)
Trade-off Space vs time Depends on use case Design decision

Example: Reservoir sampling approach

class LinkedListRandom {
    constructor(head) {
        this.head = head;
    }
    
    getRandom() {
        let result = null;
        let current = this.head;
        let count = 0;
        
        // Traverse list, apply reservoir sampling
        while (current !== null) {
            count++;
            
            // Pick current with probability 1/count
            if (Math.random() < 1 / count) {
                result = current.val;
            }
            
            current = current.next;
        }
        
        return result;
    }
}

// Example: 1 -> 2 -> 3
// Each call to getRandom():
// - Visit all 3 nodes: O(n)
// - Each node has 1/3 probability
// - O(1) space (no array needed)

Example: Array conversion approach

class LinkedListRandomArray {
    constructor(head) {
        this.values = [];
        let current = head;
        
        // Convert to array once
        while (current !== null) {
            this.values.push(current.val);
            current = current.next;
        }
    }
    
    getRandom() {
        const randomIndex = Math.floor(
            Math.random() * this.values.length
        );
        return this.values[randomIndex];
    }
}

// Example: 1 -> 2 -> 3
// Constructor: O(n) time, O(n) space
// getRandom(): O(1) time
// 
// Better if:
// - Multiple calls to getRandom()
// - Space is not a constraint
// - Fast query time needed

3. Random Pick with Weight

Approach Method Preprocessing Query Time
Prefix Sum + Binary Search Build cumulative weight array O(n) O(log n)
Linear Scan Generate random, scan until found O(1) O(n)
Alias Method Complex preprocessing for O(1) query O(n) O(1)
Recommended Prefix sum + binary search Good balance Most practical
class RandomPickWeight {
    constructor(weights) {
        this.prefixSum = [];
        let sum = 0;
        
        // Build prefix sum array
        for (const weight of weights) {
            sum += weight;
            this.prefixSum.push(sum);
        }
        
        this.totalWeight = sum;
    }
    
    pickIndex() {
        // Generate random number in [1, totalWeight]
        const target = Math.random() * this.totalWeight;
        
        // Binary search for first prefix sum >= target
        let left = 0;
        let right = this.prefixSum.length - 1;
        
        while (left < right) {
            const mid = Math.floor((left + right) / 2);
            
            if (this.prefixSum[mid] < target) {
                left = mid + 1;
            } else {
                right = mid;
            }
        }
        
        return left;
    }
}

// Example: weights = [1, 3, 2]
// prefixSum = [1, 4, 6]
// totalWeight = 6
// 
// Random ranges:
// [0, 1)   -> index 0 (weight 1, probability 1/6)
// [1, 4)   -> index 1 (weight 3, probability 3/6)
// [4, 6)   -> index 2 (weight 2, probability 2/6)
// 
// Binary search finds the right bucket efficiently

Example: Visualizing the probability distribution

// Example: weights = [1, 3, 2]
// 
// Visual representation:
// Index 0: |X|
// Index 1: |X|X|X|
// Index 2: |X|X|
// 
// Number line [0, 6]:
// [0, 1) -> 0
// [1, 4) -> 1
// [4, 6) -> 2
// 
// prefixSum[i] represents the right boundary (exclusive) for index i

function pickIndexExplained(weights) {
    const prefixSum = [];
    let sum = 0;
    
    for (const w of weights) {
        sum += w;
        prefixSum.push(sum);
    }
    
    // Random in [0, sum)
    const target = Math.random() * sum;
    
    // Find first index where prefixSum[i] > target
    // This is lower_bound in C++ or bisect_left in Python
    for (let i = 0; i < prefixSum.length; i++) {
        if (target < prefixSum[i]) {
            return i;
        }
    }
    
    return prefixSum.length - 1;
}

// Linear search version - O(n) per query
// Binary search reduces to O(log n)

4. Shuffle Array Fisher Yates

Approach Method Time Correctness
Fisher-Yates Iterate, swap with random remaining element O(n) Proven uniform
Sort with Random Keys Assign random number, sort by it O(n log n) Uniform but slower
Naive Swap Swap with any random index (wrong!) O(n) NOT uniform!
Direction Forward or backward - both valid Same complexity Personal preference

Example: Fisher-Yates shuffle

function shuffle(array) {
    const n = array.length;
    
    // Iterate from end to beginning
    for (let i = n - 1; i > 0; i--) {
        // Pick random index from [0, i]
        const j = Math.floor(Math.random() * (i + 1));
        
        // Swap array[i] with array[j]
        [array[i], array[j]] = [array[j], array[i]];
    }
    
    return array;
}

// Example: [1, 2, 3, 4]
// 
// i=3: swap with random in [0,3]
// i=2: swap with random in [0,2]
// i=1: swap with random in [0,1]
// 
// Each element has equal probability at each position
// Total permutations: n! all equally likely

// Why [0, i] not [0, n-1]?
// - Ensures element at i is randomly chosen from remaining
// - Elements already processed are in final positions
// - Generates uniform distribution

Example: Forward version

function shuffleForward(array) {
    const n = array.length;
    
    // Iterate from beginning to end
    for (let i = 0; i < n - 1; i++) {
        // Pick random index from [i, n-1]
        const j = i + Math.floor(
            Math.random() * (n - i)
        );
        
        // Swap
        [array[i], array[j]] = [array[j], array[i]];
    }
    
    return array;
}

// Equivalent to backward version
// At step i:
// - Elements [0, i-1] are finalized
// - Pick random from [i, n-1] for position i
// - Continue for remaining positions

Example: Shuffle class for reset functionality

class ShuffleArray {
    constructor(nums) {
        this.original = [...nums];
        this.array = [...nums];
    }
    
    reset() {
        this.array = [...this.original];
        return this.array;
    }
    
    shuffle() {
        const n = this.array.length;
        
        for (let i = n - 1; i > 0; i--) {
            const j = Math.floor(Math.random() * (i + 1));
            [this.array[i], this.array[j]] = 
                [this.array[j], this.array[i]];
        }
        
        return this.array;
    }
}

// Usage:
// const shuffler = new ShuffleArray([1,2,3,4]);
// shuffler.shuffle(); // Random permutation
// shuffler.reset();   // Back to [1,2,3,4]
// shuffler.shuffle(); // Different random permutation
Warning: Common mistake - swapping with any random index (not just remaining) produces biased distribution! Example: for 3 elements, naive swap generates 3^3=27 outcomes but only 3!=6 permutations, so some permutations are more likely than others. Always use Fisher-Yates!

5. Sample K Items Uniformly

Approach Method Time Space
Reservoir Sampling K Maintain k items, replace with probability k/i O(n) O(k)
Partial Shuffle Fisher-Yates for first k positions O(k) O(n)
Random Selection Pick k random unique indices O(k) expected O(k)
Stream Unknown Size Only reservoir sampling works One pass Critical use case

Example: Reservoir sampling for k items

function reservoirSampleK(stream, k) {
    const reservoir = [];
    
    for (let i = 0; i < stream.length; i++) {
        if (i < k) {
            // First k elements - add directly
            reservoir.push(stream[i]);
        } else {
            // For i >= k, include with probability k/i
            const j = Math.floor(Math.random() * (i + 1));
            
            if (j < k) {
                // Replace element at index j
                reservoir[j] = stream[i];
            }
        }
    }
    
    return reservoir;
}

// Example: stream = [1,2,3,4,5,6,7,8,9,10], k = 3
// 
// i=0,1,2: reservoir = [1,2,3]
// i=3: j in [0,3], if j<3 replace reservoir[j] with 4
// i=4: j in [0,4], if j<3 replace reservoir[j] with 5
// ...
// i=9: j in [0,9], if j<3 replace reservoir[j] with 10
// 
// Each subset of size k has equal probability C(n,k)^-1

// Proof sketch:
// P(item i in final reservoir) = k/n for all i
// P(item i selected) = P(initially) * P(not replaced)
//                    = k/(i+1) * product((i+1-k)/(j+1) for j in i+1..n-1)
//                    = k/n

Example: Partial Fisher-Yates

function sampleKPartialShuffle(array, k) {
    const n = array.length;
    const result = [...array]; // Copy to avoid modifying original
    
    // Only shuffle first k positions
    for (let i = 0; i < k; i++) {
        // Pick random from [i, n-1]
        const j = i + Math.floor(Math.random() * (n - i));
        [result[i], result[j]] = [result[j], result[i]];
    }
    
    // First k elements are the sample
    return result.slice(0, k);
}

// Example: array = [1,2,3,4,5,6,7,8,9,10], k = 3
// 
// i=0: swap arr[0] with random in [0,9]
// i=1: swap arr[1] with random in [1,9]
// i=2: swap arr[2] with random in [2,9]
// 
// Return [arr[0], arr[1], arr[2]]
// 
// Advantage: Requires knowing array size
// Disadvantage: O(n) space to copy array
// Time: O(k) only k swaps needed

Example: Optimized with index set

function sampleKIndexSet(array, k) {
    const n = array.length;
    const selected = new Set();
    const result = [];
    
    while (result.length < k) {
        const index = Math.floor(Math.random() * n);
        
        if (!selected.has(index)) {
            selected.add(index);
            result.push(array[index]);
        }
    }
    
    return result;
}

// Simple but has issues:
// - Expected O(k) when k << n
// - Degrades to O(n) when k approaches n
// - Birthday paradox: collisions increase with k
// 
// Better alternatives:
// - Use reservoir sampling for streams
// - Use partial shuffle for known-size arrays
// - This approach OK for small k relative to n
Note: Reservoir sampling is essential when stream size is unknown or when you can only make a single pass through the data (e.g., reading from network, file too large for memory). For arrays of known size where multiple passes are allowed, partial Fisher-Yates shuffle is often more intuitive and efficient.

Summary: Reservoir Sampling Patterns

  • Core Algorithm: For i-th element, keep it with probability 1/i - each element ultimately has probability 1/n of being selected
  • Single Item: Maintain one result, replace with current if Math.random() < 1/count - O(1) space, O(n) time per query
  • Proof Intuition: P(item 1 selected) = P(kept) × P(not replaced) = 1 × (1-1/2) × (1-1/3) × ... × (1-1/n) = 1/n
  • Linked List Random: Either reservoir sample each query O(n) time, or convert to array once O(n) space for O(1) queries
  • Weighted Random: Build prefix sum array, generate random in [0, totalWeight), binary search for first prefix >= target - O(log n) query
  • Prefix Sum Insight: Each index gets range proportional to weight, random number falls into range with correct probability
  • Fisher-Yates Shuffle: Iterate i from n-1 to 1, swap arr[i] with random in [0, i] - generates uniform random permutation
  • Shuffle Direction: Backward (i = n-1 to 1) or forward (i = 0 to n-2) both work - swap with random from remaining elements
  • Common Mistake: Swapping with any index (not just remaining) produces biased distribution - 3^n outcomes but n! permutations
  • K Items Reservoir: Fill first k, then for i ≥ k pick random j in [0, i], if j < k replace reservoir[j] - each item has probability k/n
  • Partial Shuffle: Run Fisher-Yates for first k positions only, return first k elements - O(k) time vs full O(n) shuffle
  • Use Cases: Reservoir for unknown size streams, partial shuffle for known arrays, weighted for non-uniform distributions
  • Stream vs Array: Reservoir sampling essential for single-pass streaming data, shuffle/index methods better for random access arrays