Another use of hashing: Rabin-Karp string searching

A problem we haven't looked at much, and will only touch on briefly in this guide, is string searching, the problem of finding a string within another string. For example, when you execute the "Find" command in your word processor, your program starts at the beginning of the string holding all the text (let's assume for the moment that this is how your word processor stores your text, which it probably doesn't) and searches within that text for another string you've specified.

The most basic string searching method is called the "brute-force" method. The brute-force method is simply a search through all the possible solutions to the problem. Each possible solution is tested in turn until one that works is found.

Brute-force String Searching

We'll call the string being searched the "text string" and the string being searched for the "pattern string". The algorithm for brute-force search works as follows:

1. Start at the beginning of the text string.
2. Compare the first n characters of the text string (where n is the length of the pattern string) to the pattern string. Do they match? If yes, we're done. If no, continue.
3. Shift over one place in the text string. Do the n characters starting at the new position match the pattern string? If yes, we're done. If no, repeat this step until we either find a match or run out of positions where the whole pattern still fits in the text string.

The code for it would look something like this:

#include <string.h>

int bfsearch(char* pattern, char* text)
{
    int pattern_len, num_iterations, i;

    /* If one of the strings is NULL, then return that the string was
     * not found. */
    if (pattern == NULL || text == NULL) return -1;

    /* Get the length of the string and determine how many different places
     * we can put the pattern string on the text string to compare them.
     * The cast keeps the subtraction signed, so a pattern longer than the
     * text gives a count of zero or less and the loop never runs. */
    pattern_len = (int)strlen(pattern);
    num_iterations = (int)strlen(text) - pattern_len + 1;

    /* For every place, do a string comparison.  If the string is found,
     * return the place in the text string where it resides. */
    for (i = 0; i < num_iterations; i++) {
        if (!strncmp(pattern, &(text[i]), pattern_len)) return i;
    }

    /* Otherwise, indicate that the pattern wasn't found */
    return -1;
}

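This works, but as we've seen previously, just working isn't enough. What is the efficiency of brute-force search? Well, each time we compare the strings, we do up to M character comparisons, where M is the length of the pattern string (the comparison can stop early at the first mismatch, but in the worst case it checks all M characters). And how many times do we do this? Once for each of the N - M + 1 starting positions, where N is the length of the text string. When the pattern is much shorter than the text, that is roughly N times. So in the worst case brute-force string search is O(MN). Not so good.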
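To make that cost concrete, here is a minimal sketch that counts the character comparisons brute-force search performs. The function name bfsearch_count and its comparisons parameter are hypothetical, introduced only for this illustration. It replaces the strncmp call with an explicit inner loop so that each comparison can be counted, and it assumes both arguments are non-NULL.

```c
#include <string.h>

/* Hypothetical helper for illustration: brute-force search that also
 * counts how many character comparisons it makes.  Returns the index of
 * the first match, or -1 if there is none.  The count is stored in
 * *comparisons.  Assumes pattern, text and comparisons are non-NULL. */
long bfsearch_count(const char *pattern, const char *text, long *comparisons)
{
    long m = (long)strlen(pattern);   /* M: length of the pattern string */
    long n = (long)strlen(text);      /* N: length of the text string */
    long i, j;

    *comparisons = 0;

    /* Try every starting position where the whole pattern still fits. */
    for (i = 0; i + m <= n; i++) {
        /* Compare character by character, stopping at the first mismatch. */
        for (j = 0; j < m; j++) {
            (*comparisons)++;
            if (text[i + j] != pattern[j]) break;
        }
        /* If the inner loop ran to completion, all M characters matched. */
        if (j == m) return i;
    }
    return -1;
}
```

Searching for "aaab" in "aaaaaaaa" (M = 4, N = 8) is a bad case: each of the 5 starting positions matches three characters before failing on the fourth, for 4 * 5 = 20 comparisons. Searching for "xyz" in "abcdefgh" fails on the first character at each of the 6 positions, for only 6 comparisons. The O(MN) bound describes the first kind of input.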

How can we do better?