Rolling Hash Pattern (Rabin-Karp)

1. Rabin-Karp String Matching Pattern

Component	Formula	Purpose	Complexity
Hash Function	`hash = Σ(char[i] * base^(n-1-i)) % mod`	Convert string to number	`O(n)` initial
Rolling Update	`hash = (hash - oldbase^(k-1)) base + new`	Slide window in O(1)	`O(1)` per char
Base	Prime number (e.g., 31, 101, 256)	Reduce collisions	Typically 31 or 256
Modulo	Large prime (e.g., 10^9+7, 10^9+9)	Prevent overflow	Essential for large strings

Example: Rabin-Karp pattern matching

function rabinKarp(text, pattern) {
    const n = text.length;
    const m = pattern.length;
    
    if (m > n) return [];
    
    const base = 256;  // Number of characters in alphabet
    const mod = 1e9 + 7;
    
    // Calculate base^(m-1) % mod
    let basePower = 1;
    for (let i = 0; i < m - 1; i++) {
        basePower = (basePower * base) % mod;
    }
    
    // Calculate hash of pattern and first window
    let patternHash = 0;
    let textHash = 0;
    
    for (let i = 0; i < m; i++) {
        patternHash = (patternHash * base + pattern.charCodeAt(i)) % mod;
        textHash = (textHash * base + text.charCodeAt(i)) % mod;
    }
    
    const result = [];
    
    // Slide window over text
    for (let i = 0; i <= n - m; i++) {
        // Check hash match
        if (patternHash === textHash) {
            // Verify actual characters (handle collisions)
            let match = true;
            for (let j = 0; j < m; j++) {
                if (text[i + j] !== pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) result.push(i);
        }
        
        // Calculate next window hash (if not last window)
        if (i < n - m) {
            textHash = (textHash - text.charCodeAt(i) * basePower) % mod;
            textHash = (textHash * base + text.charCodeAt(i + m)) % mod;
            
            // Handle negative values
            if (textHash < 0) textHash += mod;
        }
    }
    
    return result;
}

// Example: text = "ababcababd", pattern = "abab"
// Pattern hash calculated once
// Text hash rolls through each position
// Matches at indices: [0, 5]

// Time: O(n + m) average case
// Worst case: O(nm) with many collisions

Example: Simplified rolling hash update

// Understanding the rolling hash formula:
// 
// Original hash for window [i, i+m-1]:
// hash = c[i]*b^(m-1) + c[i+1]*b^(m-2) + ... + c[i+m-1]*b^0
// 
// Next window [i+1, i+m]:
// 1. Remove first character: hash - c[i]*b^(m-1)
// 2. Shift left (multiply by b): (hash - c[i]*b^(m-1)) * b
// 3. Add new character: result + c[i+m]
// 
// new_hash = (old_hash - c[i]*b^(m-1)) * b + c[i+m]

function updateHash(oldHash, oldChar, newChar, basePower, base, mod) {
    // Remove contribution of first character
    let hash = (oldHash - oldChar * basePower) % mod;
    
    // Multiply by base (shift left)
    hash = (hash * base) % mod;
    
    // Add new character
    hash = (hash + newChar) % mod;
    
    // Handle negative values
    if (hash < 0) hash += mod;
    
    return hash;
}

// Example: "abc" -> "bcd" with base=10
// hash("abc") = 'a'*100 + 'b'*10 + 'c'*1 = 97*100 + 98*10 + 99
// Remove 'a': hash - 97*100
// Shift: (hash - 97*100) * 10
// Add 'd': result + 100
// hash("bcd") = 'b'*100 + 'c'*10 + 'd'*1

2. Repeated DNA Sequences Problem

Approach	Method	Time	Space
Rolling Hash + Set	Hash all 10-char substrings, track seen	`O(n)`	`O(n)`
Bit Encoding	2 bits per nucleotide, 20-bit integer	`O(n)`	`O(n)`
String Hash	Use string as key directly	`O(n * 10)`	`O(n)`
Optimization	Only 4 chars: A,C,G,T - perfect for bit encoding	Most efficient	Recommended

Example: Rolling hash approach

function findRepeatedDnaSequences(s) {
    if (s.length <= 10) return [];
    
    const seen = new Set();
    const repeated = new Set();
    const base = 4;  // 4 nucleotides
    const mod = 1e9 + 7;
    const L = 10;    // Sequence length
    
    // Map characters to numbers
    const charToNum = { 'A': 0, 'C': 1, 'G': 2, 'T': 3 };
    
    // Calculate base^(L-1)
    let basePower = 1;
    for (let i = 0; i < L - 1; i++) {
        basePower = (basePower * base) % mod;
    }
    
    // Calculate hash of first window
    let hash = 0;
    for (let i = 0; i < L; i++) {
        hash = (hash * base + charToNum[s[i]]) % mod;
    }
    seen.add(hash);
    
    // Rolling hash for remaining windows
    for (let i = L; i < s.length; i++) {
        // Remove first char, add new char
        hash = (hash - charToNum[s[i - L]] * basePower) % mod;
        hash = (hash * base + charToNum[s[i]]) % mod;
        if (hash < 0) hash += mod;
        
        if (seen.has(hash)) {
            repeated.add(s.substring(i - L + 1, i + 1));
        }
        seen.add(hash);
    }
    
    return Array.from(repeated);
}

// Example: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
// Result: ["AAAAACCCCC","CCCCCAAAAA"]

Example: Bit encoding optimization

function findRepeatedDnaBitEncoding(s) {
    if (s.length <= 10) return [];
    
    const seen = new Set();
    const repeated = new Set();
    const L = 10;
    
    // 2 bits per character: A=00, C=01, G=10, T=11
    const charToBits = { 'A': 0, 'C': 1, 'G': 2, 'T': 3 };
    
    // Build first window (20 bits total)
    let bitmask = 0;
    for (let i = 0; i < L; i++) {
        bitmask = (bitmask << 2) | charToBits[s[i]];
    }
    seen.add(bitmask);
    
    // Mask to keep only last 20 bits
    const mask = (1 << (L * 2)) - 1;
    
    for (let i = L; i < s.length; i++) {
        // Shift left 2, add new char, keep 20 bits
        bitmask = ((bitmask << 2) | charToBits[s[i]]) & mask;
        
        if (seen.has(bitmask)) {
            repeated.add(s.substring(i - L + 1, i + 1));
        }
        seen.add(bitmask);
    }
    
    return Array.from(repeated);
}

// Bit encoding is cleaner and faster:
// - No modulo operations
// - No multiplication
// - Just bit shifts and masks
// - 10 chars * 2 bits = 20 bits (fits in integer)

3. Longest Duplicate Substring Problem

Approach	Strategy	Time	Space
Binary Search + Hash	Binary search on length, hash to find duplicates	`O(n log n)`	`O(n)`
Suffix Array	Build suffix array, find LCP	`O(n log n)`	`O(n)`
Rolling Hash	For each length, check all substrings	`O(n^2)` naive	Not optimal alone
Optimal	Binary search + rolling hash combination	Best practical solution	Recommended

Example: Binary search with rolling hash

function longestDupSubstring(s) {
    const n = s.length;
    const base = 256;
    const mod = 2**53 - 1; // Large prime
    
    // Binary search on length
    let left = 1, right = n - 1;
    let result = "";
    
    while (left <= right) {
        const mid = Math.floor((left + right) / 2);
        const duplicate = searchDuplicate(s, mid, base, mod);
        
        if (duplicate !== null) {
            result = duplicate;
            left = mid + 1; // Try longer
        } else {
            right = mid - 1; // Try shorter
        }
    }
    
    return result;
}

function searchDuplicate(s, L, base, mod) {
    // Calculate base^L % mod
    let basePower = 1;
    for (let i = 0; i < L; i++) {
        basePower = (basePower * base) % mod;
    }
    basePower = (basePower / base) % mod;
    
    // Calculate hash of first window
    let hash = 0;
    for (let i = 0; i < L; i++) {
        hash = (hash * base + s.charCodeAt(i)) % mod;
    }
    
    const seen = new Map();
    seen.set(hash, 0);
    
    // Rolling hash
    for (let i = L; i < s.length; i++) {
        hash = (hash - s.charCodeAt(i - L) * basePower) % mod;
        hash = (hash * base + s.charCodeAt(i)) % mod;
        if (hash < 0) hash += mod;
        
        if (seen.has(hash)) {
            // Found duplicate, verify
            const start = seen.get(hash);
            const current = s.substring(i - L + 1, i + 1);
            const prev = s.substring(start, start + L);
            if (current === prev) {
                return current;
            }
        }
        seen.set(hash, i - L + 1);
    }
    
    return null;
}

// Example: s = "banana"
// Binary search tries lengths: 3, 2, 1
// Length 3: "ana" found
// Length 4: none
// Result: "ana"

// Time: O(n log n) - log n binary search, O(n) per search
// Space: O(n) for hash map

Note: The binary search optimization works because if there's a duplicate of length k, there's also a duplicate of length k-1 (monotonic property). This allows us to binary search on the length instead of trying all lengths sequentially.

4. Find All Anagrams in String

Approach	Method	Time	Space
Frequency Counter	Compare char frequencies in sliding window	`O(n)`	`O(1)` (26 letters)
Sorted Hash	Sort pattern and each window, compare	`O(n * m log m)`	Not efficient
Rolling Hash	Hash must be order-independent	`O(n)`	`O(1)`
Polynomial Hash	XOR or addition (commutative operations)	For anagrams	Order doesn't matter

Example: Frequency array sliding window

function findAnagrams(s, p) {
    const result = [];
    if (s.length < p.length) return result;
    
    const pCount = new Array(26).fill(0);
    const sCount = new Array(26).fill(0);
    
    // Count frequencies in pattern
    for (const char of p) {
        pCount[char.charCodeAt(0) - 'a'.charCodeAt(0)]++;
    }
    
    // First window
    for (let i = 0; i < p.length; i++) {
        sCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
    }
    
    if (arraysEqual(pCount, sCount)) {
        result.push(0);
    }
    
    // Sliding window
    for (let i = p.length; i < s.length; i++) {
        // Add new character
        sCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
        // Remove old character
        sCount[s.charCodeAt(i - p.length) - 'a'.charCodeAt(0)]--;
        
        if (arraysEqual(pCount, sCount)) {
            result.push(i - p.length + 1);
        }
    }
    
    return result;
}

function arraysEqual(a, b) {
    for (let i = 0; i < 26; i++) {
        if (a[i] !== b[i]) return false;
    }
    return true;
}

// Example: s = "cbaebabacd", p = "abc"
// Pattern count: [1,1,1,0,...,0]
// Result: [0,6] (indices of "cba" and "bac")

Example: Optimized with match counter

function findAnagramsOptimized(s, p) {
    const result = [];
    if (s.length < p.length) return result;
    
    const count = new Array(26).fill(0);
    
    // Difference array: positive = need more, negative = have extra
    for (const char of p) {
        count[char.charCodeAt(0) - 'a'.charCodeAt(0)]++;
    }
    
    let matches = 0;
    
    // First window
    for (let i = 0; i < p.length; i++) {
        const idx = s.charCodeAt(i) - 'a'.charCodeAt(0);
        count[idx]--;
        if (count[idx] === 0) matches++;
        else if (count[idx] === -1) matches--;
    }
    
    if (matches === 26) result.push(0);
    
    // Sliding window
    for (let i = p.length; i < s.length; i++) {
        // Add new character
        const newIdx = s.charCodeAt(i) - 'a'.charCodeAt(0);
        count[newIdx]--;
        if (count[newIdx] === 0) matches++;
        else if (count[newIdx] === -1) matches--;
        
        // Remove old character
        const oldIdx = s.charCodeAt(i - p.length) - 'a'.charCodeAt(0);
        count[oldIdx]++;
        if (count[oldIdx] === 0) matches++;
        else if (count[oldIdx] === 1) matches--;
        
        if (matches === 26) {
            result.push(i - p.length + 1);
        }
    }
    
    return result;
}

// Optimization: track how many characters have correct frequency
// Only compare one counter instead of 26 each time
// O(1) comparison vs O(26) = O(1)

5. Polynomial Rolling Hash Function

Property	Formula	Use Case	Characteristic
Standard Polynomial	`hash = Σ(c[i] * p^i) mod m`	String matching, order matters	Position-dependent
Commutative Hash	`hash = Σ(c[i]) or XOR(c[i])`	Anagrams, order doesn't matter	Position-independent
Multiple Hashes	Use 2+ hash functions with different bases/mods	Reduce collision probability	More reliable
Base Selection	Prime > alphabet size (31, 37, 101, 256)	Distribution quality	Critical choice

Example: Double hash to reduce collisions

class DoubleRollingHash {
    constructor(s, L) {
        this.s = s;
        this.L = L;
        
        // Two different hash functions
        this.base1 = 31;
        this.mod1 = 1e9 + 7;
        this.base2 = 37;
        this.mod2 = 1e9 + 9;
        
        // Precompute base powers
        this.basePower1 = 1;
        this.basePower2 = 1;
        for (let i = 0; i < L - 1; i++) {
            this.basePower1 = (this.basePower1 * this.base1) % this.mod1;
            this.basePower2 = (this.basePower2 * this.base2) % this.mod2;
        }
    }
    
    findDuplicates() {
        const seen = new Map();
        
        // Calculate initial hashes
        let hash1 = 0, hash2 = 0;
        for (let i = 0; i < this.L; i++) {
            const code = this.s.charCodeAt(i);
            hash1 = (hash1 * this.base1 + code) % this.mod1;
            hash2 = (hash2 * this.base2 + code) % this.mod2;
        }
        
        const key = `${hash1},${hash2}`;
        seen.set(key, 0);
        
        // Rolling hash
        for (let i = this.L; i < this.s.length; i++) {
            const oldCode = this.s.charCodeAt(i - this.L);
            const newCode = this.s.charCodeAt(i);
            
            // Update hash1
            hash1 = (hash1 - oldCode * this.basePower1) % this.mod1;
            hash1 = (hash1 * this.base1 + newCode) % this.mod1;
            if (hash1 < 0) hash1 += this.mod1;
            
            // Update hash2
            hash2 = (hash2 - oldCode * this.basePower2) % this.mod2;
            hash2 = (hash2 * this.base2 + newCode) % this.mod2;
            if (hash2 < 0) hash2 += this.mod2;
            
            const key = `${hash1},${hash2}`;
            if (seen.has(key)) {
                // Extremely low probability of false positive
                return this.s.substring(i - this.L + 1, i + 1);
            }
            seen.set(key, i - this.L + 1);
        }
        
        return null;
    }
}

// Probability of collision with double hash:
// P(collision) ≈ 1 / (mod1 * mod2) ≈ 10^-18
// Practically zero for reasonable input sizes

Example: Polynomial hash implementation details

// Standard polynomial rolling hash
class PolynomialHash {
    constructor(base = 31, mod = 1e9 + 7) {
        this.base = base;
        this.mod = mod;
    }
    
    // Compute hash of entire string
    computeHash(s) {
        let hash = 0;
        for (let i = 0; i < s.length; i++) {
            hash = (hash * this.base + s.charCodeAt(i)) % this.mod;
        }
        return hash;
    }
    
    // Compute hash with precomputed powers (more efficient)
    computeHashPowers(s, powers) {
        let hash = 0;
        const n = s.length;
        for (let i = 0; i < n; i++) {
            hash = (hash + s.charCodeAt(i) * powers[n - 1 - i]) % this.mod;
        }
        return hash;
    }
    
    // Precompute powers of base
    precomputePowers(maxLen) {
        const powers = [1];
        for (let i = 1; i < maxLen; i++) {
            powers[i] = (powers[i - 1] * this.base) % this.mod;
        }
        return powers;
    }
    
    // Rolling update: remove first, add last
    roll(hash, oldChar, newChar, basePower) {
        hash = (hash - oldChar * basePower) % this.mod;
        hash = (hash * this.base + newChar) % this.mod;
        if (hash < 0) hash += this.mod;
        return hash;
    }
}

// Usage:
const hasher = new PolynomialHash();
const powers = hasher.precomputePowers(1000);
const hash1 = hasher.computeHashPowers("hello", powers);
const hash2 = hasher.computeHashPowers("world", powers);

// Different strings -> different hashes (with high probability)

Warning: Hash collisions can occur! Always verify actual string equality when hashes match, especially in Rabin-Karp. Use double hashing (two different hash functions) to reduce collision probability to near-zero for most practical applications. Never rely on hash equality alone for correctness.

Summary: Rolling Hash Patterns

Core Formula: hash = Σ(char[i] × base^(n-1-i)) mod m - converts string to number for O(1) comparison
Rolling Update: new_hash = (old_hash - first_char × base^(L-1)) × base + new_char - O(1) window slide
Rabin-Karp: Pattern matching with rolling hash - O(n+m) average, O(nm) worst case with collisions
Base Selection: Use prime > alphabet size (31 for lowercase, 256 for ASCII) - reduces collisions
Modulo Selection: Large prime like 10^9+7 or 10^9+9 - prevents integer overflow while maintaining distribution
Collision Handling: When hashes match, verify actual strings - hash equality doesn't guarantee string equality
DNA Sequences: Only 4 chars (A,C,G,T) - can use 2-bit encoding instead of polynomial hash for efficiency
Bit Encoding: bitmask = (bitmask << 2 | newChar) & mask - simpler than polynomial, no modulo needed
Longest Duplicate: Binary search on length + rolling hash to check - O(n log n) time
Binary Search Insight: If duplicate of length k exists, duplicate of k-1 also exists - monotonic property enables binary search
Anagrams: Need order-independent hash - use frequency array or commutative hash (sum/XOR), not polynomial
Double Hashing: Use two hash functions with different base/mod - collision probability ≈ 1/(mod1×mod2) ≈ 10^-18
Precompute Powers: Calculate base^i for all i - avoids repeated exponentiation, O(1) hash computation
Applications: String matching, duplicate detection, substring search, plagiarism detection, DNA analysis

Related Cheat Sheets

Datastructures Algorithms