Friday, November 15, 2013

Knuth Morris Pratt (KMP) Algorithm

Objective:
Given a (smaller) pattern string P, find all its occurrences in a (larger) text string T.

Learning Resources:
http://en.wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=stringSearching 

Good Intuition and worked examples:
http://jakeboxer.com/blog/2009/12/13/the-knuth-morris-pratt-algorithm-in-my-own-words/

Short Explanation:
1) Use the pattern string in order to build the Failure Function array. 
2) Use the failure function to skip certain characters in the text and print each occurrence of the pattern in the text.

Complexity: O(length_of_pattern + length_of_text)

Implementation:
// P - pattern, F - failure function, m - P.size()
void FailureFunction(string &P, vector<int> &F, int m) {
    int i = 1, j = 0;
    F[0] = 0;
    while(i < m) {
        if(P[i] == P[j])
            {F[i] = j+1; i++; j++;}
        else if(j > 0)
            j = F[j-1];
        else
            {F[i] = 0; i++;}
    }
}

// P - pattern, T - text
void KMP(string &P, string &T) {
    int m=P.size(), n=T.size(), i=0, j=0;
    vector<int> F(m+1);
    FailureFunction(P,F,m);
    while(i < n) {
        while(j && T[i] != P[j]) j = F[j-1]; 
        if(T[i] == P[j]) j++;
        if(j == m) printf("%d\n", i-j+1);
        i++;
    }
}

2 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. If someone is still having trouble understanding "Why this algorithm works?", I would suggest them to read CLRS chapter on string matching. It covers all the proofs. It helped me in understanding KMP completely.

    ReplyDelete