Rolling Hash Pattern (Rabin-Karp)

1. Rabin-Karp String Matching Pattern

Component Formula Purpose Complexity
Hash Function hash = Σ(char[i] * base^(n-1-i)) % mod Convert string to number O(n) initial
Rolling Update hash = (hash - old*base^(k-1)) * base + new Slide window in O(1) O(1) per char
Base Prime number (e.g., 31, 101, 256) Reduce collisions Typically 31 or 256
Modulo Large prime (e.g., 10^9+7, 10^9+9) Prevent overflow Essential for large strings

Example: Rabin-Karp pattern matching

function rabinKarp(text, pattern) {
    const n = text.length;
    const m = pattern.length;
    
    if (m > n) return [];
    
    const base = 256;  // Number of characters in alphabet
    const mod = 1e9 + 7;
    
    // Calculate base^(m-1) % mod
    let basePower = 1;
    for (let i = 0; i < m - 1; i++) {
        basePower = (basePower * base) % mod;
    }
    
    // Calculate hash of pattern and first window
    let patternHash = 0;
    let textHash = 0;
    
    for (let i = 0; i < m; i++) {
        patternHash = (patternHash * base + pattern.charCodeAt(i)) % mod;
        textHash = (textHash * base + text.charCodeAt(i)) % mod;
    }
    
    const result = [];
    
    // Slide window over text
    for (let i = 0; i <= n - m; i++) {
        // Check hash match
        if (patternHash === textHash) {
            // Verify actual characters (handle collisions)
            let match = true;
            for (let j = 0; j < m; j++) {
                if (text[i + j] !== pattern[j]) {
                    match = false;
                    break;
                }
            }
            if (match) result.push(i);
        }
        
        // Calculate next window hash (if not last window)
        if (i < n - m) {
            textHash = (textHash - text.charCodeAt(i) * basePower) % mod;
            textHash = (textHash * base + text.charCodeAt(i + m)) % mod;
            
            // Handle negative values
            if (textHash < 0) textHash += mod;
        }
    }
    
    return result;
}

// Example: text = "ababcababd", pattern = "abab"
// Pattern hash calculated once
// Text hash rolls through each position
// Matches at indices: [0, 5]

// Time: O(n + m) average case
// Worst case: O(nm) with many collisions

Example: Simplified rolling hash update

// Understanding the rolling hash formula:
// 
// Original hash for window [i, i+m-1]:
// hash = c[i]*b^(m-1) + c[i+1]*b^(m-2) + ... + c[i+m-1]*b^0
// 
// Next window [i+1, i+m]:
// 1. Remove first character: hash - c[i]*b^(m-1)
// 2. Shift left (multiply by b): (hash - c[i]*b^(m-1)) * b
// 3. Add new character: result + c[i+m]
// 
// new_hash = (old_hash - c[i]*b^(m-1)) * b + c[i+m]

function updateHash(oldHash, oldChar, newChar, basePower, base, mod) {
    // Remove contribution of first character
    let hash = (oldHash - oldChar * basePower) % mod;
    
    // Multiply by base (shift left)
    hash = (hash * base) % mod;
    
    // Add new character
    hash = (hash + newChar) % mod;
    
    // Handle negative values
    if (hash < 0) hash += mod;
    
    return hash;
}

// Example: "abc" -> "bcd" with base=10
// hash("abc") = 'a'*100 + 'b'*10 + 'c'*1 = 97*100 + 98*10 + 99
// Remove 'a': hash - 97*100
// Shift: (hash - 97*100) * 10
// Add 'd': result + 100
// hash("bcd") = 'b'*100 + 'c'*10 + 'd'*1

2. Repeated DNA Sequences Problem

Approach Method Time Space
Rolling Hash + Set Hash all 10-char substrings, track seen O(n) O(n)
Bit Encoding 2 bits per nucleotide, 20-bit integer O(n) O(n)
String Hash Use string as key directly O(n * 10) O(n)
Optimization Only 4 chars: A,C,G,T - perfect for bit encoding Most efficient Recommended

Example: Rolling hash approach

function findRepeatedDnaSequences(s) {
    if (s.length <= 10) return [];
    
    const seen = new Set();
    const repeated = new Set();
    const base = 4;  // 4 nucleotides
    const mod = 1e9 + 7;
    const L = 10;    // Sequence length
    
    // Map characters to numbers
    const charToNum = { 'A': 0, 'C': 1, 'G': 2, 'T': 3 };
    
    // Calculate base^(L-1)
    let basePower = 1;
    for (let i = 0; i < L - 1; i++) {
        basePower = (basePower * base) % mod;
    }
    
    // Calculate hash of first window
    let hash = 0;
    for (let i = 0; i < L; i++) {
        hash = (hash * base + charToNum[s[i]]) % mod;
    }
    seen.add(hash);
    
    // Rolling hash for remaining windows
    for (let i = L; i < s.length; i++) {
        // Remove first char, add new char
        hash = (hash - charToNum[s[i - L]] * basePower) % mod;
        hash = (hash * base + charToNum[s[i]]) % mod;
        if (hash < 0) hash += mod;
        
        if (seen.has(hash)) {
            repeated.add(s.substring(i - L + 1, i + 1));
        }
        seen.add(hash);
    }
    
    return Array.from(repeated);
}

// Example: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
// Result: ["AAAAACCCCC","CCCCCAAAAA"]

Example: Bit encoding optimization

function findRepeatedDnaBitEncoding(s) {
    if (s.length <= 10) return [];
    
    const seen = new Set();
    const repeated = new Set();
    const L = 10;
    
    // 2 bits per character: A=00, C=01, G=10, T=11
    const charToBits = { 'A': 0, 'C': 1, 'G': 2, 'T': 3 };
    
    // Build first window (20 bits total)
    let bitmask = 0;
    for (let i = 0; i < L; i++) {
        bitmask = (bitmask << 2) | charToBits[s[i]];
    }
    seen.add(bitmask);
    
    // Mask to keep only last 20 bits
    const mask = (1 << (L * 2)) - 1;
    
    for (let i = L; i < s.length; i++) {
        // Shift left 2, add new char, keep 20 bits
        bitmask = ((bitmask << 2) | charToBits[s[i]]) & mask;
        
        if (seen.has(bitmask)) {
            repeated.add(s.substring(i - L + 1, i + 1));
        }
        seen.add(bitmask);
    }
    
    return Array.from(repeated);
}

// Bit encoding is cleaner and faster:
// - No modulo operations
// - No multiplication
// - Just bit shifts and masks
// - 10 chars * 2 bits = 20 bits (fits in integer)

3. Longest Duplicate Substring Problem

Approach Strategy Time Space
Binary Search + Hash Binary search on length, hash to find duplicates O(n log n) O(n)
Suffix Array Build suffix array, find LCP O(n log n) O(n)
Rolling Hash For each length, check all substrings O(n^2) naive Not optimal alone
Optimal Binary search + rolling hash combination Best practical solution Recommended

Example: Binary search with rolling hash

function longestDupSubstring(s) {
    const n = s.length;
    const base = 256;
    const mod = 2**53 - 1; // Large prime
    
    // Binary search on length
    let left = 1, right = n - 1;
    let result = "";
    
    while (left <= right) {
        const mid = Math.floor((left + right) / 2);
        const duplicate = searchDuplicate(s, mid, base, mod);
        
        if (duplicate !== null) {
            result = duplicate;
            left = mid + 1; // Try longer
        } else {
            right = mid - 1; // Try shorter
        }
    }
    
    return result;
}

function searchDuplicate(s, L, base, mod) {
    // Calculate base^L % mod
    let basePower = 1;
    for (let i = 0; i < L; i++) {
        basePower = (basePower * base) % mod;
    }
    basePower = (basePower / base) % mod;
    
    // Calculate hash of first window
    let hash = 0;
    for (let i = 0; i < L; i++) {
        hash = (hash * base + s.charCodeAt(i)) % mod;
    }
    
    const seen = new Map();
    seen.set(hash, 0);
    
    // Rolling hash
    for (let i = L; i < s.length; i++) {
        hash = (hash - s.charCodeAt(i - L) * basePower) % mod;
        hash = (hash * base + s.charCodeAt(i)) % mod;
        if (hash < 0) hash += mod;
        
        if (seen.has(hash)) {
            // Found duplicate, verify
            const start = seen.get(hash);
            const current = s.substring(i - L + 1, i + 1);
            const prev = s.substring(start, start + L);
            if (current === prev) {
                return current;
            }
        }
        seen.set(hash, i - L + 1);
    }
    
    return null;
}

// Example: s = "banana"
// Binary search tries lengths: 3, 2, 1
// Length 3: "ana" found
// Length 4: none
// Result: "ana"

// Time: O(n log n) - log n binary search, O(n) per search
// Space: O(n) for hash map
Note: The binary search optimization works because if there's a duplicate of length k, there's also a duplicate of length k-1 (monotonic property). This allows us to binary search on the length instead of trying all lengths sequentially.

4. Find All Anagrams in String

Approach Method Time Space
Frequency Counter Compare char frequencies in sliding window O(n) O(1) (26 letters)
Sorted Hash Sort pattern and each window, compare O(n * m log m) Not efficient
Rolling Hash Hash must be order-independent O(n) O(1)
Polynomial Hash XOR or addition (commutative operations) For anagrams Order doesn't matter

Example: Frequency array sliding window

function findAnagrams(s, p) {
    const result = [];
    if (s.length < p.length) return result;
    
    const pCount = new Array(26).fill(0);
    const sCount = new Array(26).fill(0);
    
    // Count frequencies in pattern
    for (const char of p) {
        pCount[char.charCodeAt(0) - 'a'.charCodeAt(0)]++;
    }
    
    // First window
    for (let i = 0; i < p.length; i++) {
        sCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
    }
    
    if (arraysEqual(pCount, sCount)) {
        result.push(0);
    }
    
    // Sliding window
    for (let i = p.length; i < s.length; i++) {
        // Add new character
        sCount[s.charCodeAt(i) - 'a'.charCodeAt(0)]++;
        // Remove old character
        sCount[s.charCodeAt(i - p.length) - 'a'.charCodeAt(0)]--;
        
        if (arraysEqual(pCount, sCount)) {
            result.push(i - p.length + 1);
        }
    }
    
    return result;
}

function arraysEqual(a, b) {
    for (let i = 0; i < 26; i++) {
        if (a[i] !== b[i]) return false;
    }
    return true;
}

// Example: s = "cbaebabacd", p = "abc"
// Pattern count: [1,1,1,0,...,0]
// Result: [0,6] (indices of "cba" and "bac")

Example: Optimized with match counter

function findAnagramsOptimized(s, p) {
    const result = [];
    if (s.length < p.length) return result;
    
    const count = new Array(26).fill(0);
    
    // Difference array: positive = need more, negative = have extra
    for (const char of p) {
        count[char.charCodeAt(0) - 'a'.charCodeAt(0)]++;
    }
    
    let matches = 0;
    
    // First window
    for (let i = 0; i < p.length; i++) {
        const idx = s.charCodeAt(i) - 'a'.charCodeAt(0);
        count[idx]--;
        if (count[idx] === 0) matches++;
        else if (count[idx] === -1) matches--;
    }
    
    if (matches === 26) result.push(0);
    
    // Sliding window
    for (let i = p.length; i < s.length; i++) {
        // Add new character
        const newIdx = s.charCodeAt(i) - 'a'.charCodeAt(0);
        count[newIdx]--;
        if (count[newIdx] === 0) matches++;
        else if (count[newIdx] === -1) matches--;
        
        // Remove old character
        const oldIdx = s.charCodeAt(i - p.length) - 'a'.charCodeAt(0);
        count[oldIdx]++;
        if (count[oldIdx] === 0) matches++;
        else if (count[oldIdx] === 1) matches--;
        
        if (matches === 26) {
            result.push(i - p.length + 1);
        }
    }
    
    return result;
}

// Optimization: track how many characters have correct frequency
// Only compare one counter instead of 26 each time
// O(1) comparison vs O(26) = O(1)

5. Polynomial Rolling Hash Function

Property Formula Use Case Characteristic
Standard Polynomial hash = Σ(c[i] * p^i) mod m String matching, order matters Position-dependent
Commutative Hash hash = Σ(c[i]) or XOR(c[i]) Anagrams, order doesn't matter Position-independent
Multiple Hashes Use 2+ hash functions with different bases/mods Reduce collision probability More reliable
Base Selection Prime > alphabet size (31, 37, 101, 256) Distribution quality Critical choice

Example: Double hash to reduce collisions

class DoubleRollingHash {
    constructor(s, L) {
        this.s = s;
        this.L = L;
        
        // Two different hash functions
        this.base1 = 31;
        this.mod1 = 1e9 + 7;
        this.base2 = 37;
        this.mod2 = 1e9 + 9;
        
        // Precompute base powers
        this.basePower1 = 1;
        this.basePower2 = 1;
        for (let i = 0; i < L - 1; i++) {
            this.basePower1 = (this.basePower1 * this.base1) % this.mod1;
            this.basePower2 = (this.basePower2 * this.base2) % this.mod2;
        }
    }
    
    findDuplicates() {
        const seen = new Map();
        
        // Calculate initial hashes
        let hash1 = 0, hash2 = 0;
        for (let i = 0; i < this.L; i++) {
            const code = this.s.charCodeAt(i);
            hash1 = (hash1 * this.base1 + code) % this.mod1;
            hash2 = (hash2 * this.base2 + code) % this.mod2;
        }
        
        const key = `${hash1},${hash2}`;
        seen.set(key, 0);
        
        // Rolling hash
        for (let i = this.L; i < this.s.length; i++) {
            const oldCode = this.s.charCodeAt(i - this.L);
            const newCode = this.s.charCodeAt(i);
            
            // Update hash1
            hash1 = (hash1 - oldCode * this.basePower1) % this.mod1;
            hash1 = (hash1 * this.base1 + newCode) % this.mod1;
            if (hash1 < 0) hash1 += this.mod1;
            
            // Update hash2
            hash2 = (hash2 - oldCode * this.basePower2) % this.mod2;
            hash2 = (hash2 * this.base2 + newCode) % this.mod2;
            if (hash2 < 0) hash2 += this.mod2;
            
            const key = `${hash1},${hash2}`;
            if (seen.has(key)) {
                // Extremely low probability of false positive
                return this.s.substring(i - this.L + 1, i + 1);
            }
            seen.set(key, i - this.L + 1);
        }
        
        return null;
    }
}

// Probability of collision with double hash:
// P(collision) ≈ 1 / (mod1 * mod2) ≈ 10^-18
// Practically zero for reasonable input sizes

Example: Polynomial hash implementation details

// Standard polynomial rolling hash
class PolynomialHash {
    constructor(base = 31, mod = 1e9 + 7) {
        this.base = base;
        this.mod = mod;
    }
    
    // Compute hash of entire string
    computeHash(s) {
        let hash = 0;
        for (let i = 0; i < s.length; i++) {
            hash = (hash * this.base + s.charCodeAt(i)) % this.mod;
        }
        return hash;
    }
    
    // Compute hash with precomputed powers (more efficient)
    computeHashPowers(s, powers) {
        let hash = 0;
        const n = s.length;
        for (let i = 0; i < n; i++) {
            hash = (hash + s.charCodeAt(i) * powers[n - 1 - i]) % this.mod;
        }
        return hash;
    }
    
    // Precompute powers of base
    precomputePowers(maxLen) {
        const powers = [1];
        for (let i = 1; i < maxLen; i++) {
            powers[i] = (powers[i - 1] * this.base) % this.mod;
        }
        return powers;
    }
    
    // Rolling update: remove first, add last
    roll(hash, oldChar, newChar, basePower) {
        hash = (hash - oldChar * basePower) % this.mod;
        hash = (hash * this.base + newChar) % this.mod;
        if (hash < 0) hash += this.mod;
        return hash;
    }
}

// Usage:
const hasher = new PolynomialHash();
const powers = hasher.precomputePowers(1000);
const hash1 = hasher.computeHashPowers("hello", powers);
const hash2 = hasher.computeHashPowers("world", powers);

// Different strings -> different hashes (with high probability)
Warning: Hash collisions can occur! Always verify actual string equality when hashes match, especially in Rabin-Karp. Use double hashing (two different hash functions) to reduce collision probability to near-zero for most practical applications. Never rely on hash equality alone for correctness.

Summary: Rolling Hash Patterns

  • Core Formula: hash = Σ(char[i] × base^(n-1-i)) mod m - converts string to number for O(1) comparison
  • Rolling Update: new_hash = (old_hash - first_char × base^(L-1)) × base + new_char - O(1) window slide
  • Rabin-Karp: Pattern matching with rolling hash - O(n+m) average, O(nm) worst case with collisions
  • Base Selection: Use prime > alphabet size (31 for lowercase, 256 for ASCII) - reduces collisions
  • Modulo Selection: Large prime like 10^9+7 or 10^9+9 - prevents integer overflow while maintaining distribution
  • Collision Handling: When hashes match, verify actual strings - hash equality doesn't guarantee string equality
  • DNA Sequences: Only 4 chars (A,C,G,T) - can use 2-bit encoding instead of polynomial hash for efficiency
  • Bit Encoding: bitmask = (bitmask << 2 | newChar) & mask - simpler than polynomial, no modulo needed
  • Longest Duplicate: Binary search on length + rolling hash to check - O(n log n) time
  • Binary Search Insight: If duplicate of length k exists, duplicate of k-1 also exists - monotonic property enables binary search
  • Anagrams: Need order-independent hash - use frequency array or commutative hash (sum/XOR), not polynomial
  • Double Hashing: Use two hash functions with different base/mod - collision probability ≈ 1/(mod1×mod2) ≈ 10^-18
  • Precompute Powers: Calculate base^i for all i - avoids repeated exponentiation, O(1) hash computation
  • Applications: String matching, duplicate detection, substring search, plagiarism detection, DNA analysis