
Hash Tables

Problems

Another use of hashing: Rabin-Karp string searching


Problem : Give the best, average, and worst case efficiencies of both the brute-force string search and Rabin-Karp string search.

M = length of pattern
N = length of text

Brute-force: Best = O(M), Average = O(MN), Worst = O(MN)
Rabin-Karp: Best = O(M), Average = O(M + N), Worst = O(MN)
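For reference, a minimal sketch of the brute-force search these bounds describe. The function name brute_force_search is a hypothetical helper, not something defined in this section. The outer loop tries up to N alignments and the inner loop compares up to M characters at each one, which is where O(MN) comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first occurrence of pattern in text, or -1.
 * Hypothetical helper name; not from the original text. */
long brute_force_search(const char *text, const char *pattern)
{
	size_t n, m, i, j;

	assert(text != NULL && pattern != NULL);
	n = strlen(text);
	m = strlen(pattern);
	if (m > n) return -1;

	/* Try every alignment of the pattern against the text */
	for (i = 0; i + m <= n; i++) {
		/* Compare up to M characters at this alignment */
		for (j = 0; j < m && text[i + j] == pattern[j]; j++)
			;
		if (j == m) return (long)i;
	}
	return -1;
}
```

The best case occurs when the pattern sits at the very start of the text, so only the first M comparisons are needed.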

Problem : How does Rabin-Karp achieve an efficiency of O(M + N)?

The initial hash is O(M). After that, each update is O(1), and there are O(N) updates. Whenever the hash values match we need to do O(M) work to compare the strings, but we can ignore this because a good hash function will not cause many collisions. So O(M) + O(1)*O(N) = O(M) + O(N) = O(M + N).
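The three costs in that sum can be seen in a rough sketch of the search loop. The hash here simply sums the characters, and the names sum_hash, sum_hash_update, and rabin_karp_search are stand-ins assumed for illustration; they are not the hash() and hash_update() functions from the section itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in additive hash: sum of the first len characters (assumed) */
static unsigned long sum_hash(const char *s, size_t len)
{
	unsigned long h = 0;
	size_t i;
	for (i = 0; i < len; i++)
		h += (unsigned char)s[i];
	return h;
}

/* O(1) update: drop the character leaving the window, add the one entering */
static unsigned long sum_hash_update(unsigned long h, char out, char in)
{
	return h - (unsigned char)out + (unsigned char)in;
}

/* Return the index of the first match, or -1 if there is none */
long rabin_karp_search(const char *text, const char *pattern)
{
	size_t n, m, i;
	unsigned long hp, ht;

	assert(text != NULL && pattern != NULL);
	n = strlen(text);
	m = strlen(pattern);
	if (m > n) return -1;

	/* O(M): hash the pattern and the first window of the text */
	hp = sum_hash(pattern, m);
	ht = sum_hash(text, m);

	for (i = 0; ; i++) {
		/* O(M) comparison, but only when the hash values agree */
		if (hp == ht && memcmp(text + i, pattern, m) == 0)
			return (long)i;
		if (i + m >= n) break;
		/* O(1): slide the window one character to the right */
		ht = sum_hash_update(ht, text[i], text[i + m]);
	}
	return -1;
}
```

The memcmp() call is the O(M) work that the analysis ignores; with a weak hash it runs often, which is what the next problem explores.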

Problem : Using the hash() and hash_update() functions given in this section, give an example of a pattern string and text string that will reduce Rabin-Karp back to brute-force search, decreasing its efficiency back to O(MN).

Pattern string = "aabb"
Text string = "abababababababababaabb"

Because the hash function we're using only sums the letters, the pattern's hash will match at almost every location in the text string, since almost every four-character window contains two a's and two b's.
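One way to check this claim is to count how many windows of the text share the pattern's hash. The sketch below assumes the additive hash described above; count_hash_hits is a hypothetical name introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the windows of text whose additive hash equals the pattern's hash.
 * Each such window forces an O(M) character-by-character comparison. */
size_t count_hash_hits(const char *text, const char *pattern)
{
	size_t n, m, i, hits = 0;
	unsigned long hp = 0, ht = 0;

	assert(text != NULL && pattern != NULL);
	n = strlen(text);
	m = strlen(pattern);
	if (m == 0 || m > n) return 0;

	/* Hash the pattern and the first window by summing characters */
	for (i = 0; i < m; i++) {
		hp += (unsigned char)pattern[i];
		ht += (unsigned char)text[i];
	}

	for (i = 0; ; i++) {
		if (hp == ht) hits++;
		if (i + m >= n) break;
		/* Slide the window: remove text[i], add text[i + m] */
		ht = ht - (unsigned char)text[i] + (unsigned char)text[i + m];
	}
	return hits;
}
```

For the strings above, every window but one produces a hash hit (the exception is "abaa", just before the real match), so nearly every position pays for a full comparison.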

Problem : Challenge problem: Create a hash_update() function to go along with this hash() function:


long hash_str(hash_table_t *hashtable, int hash_len, char *start)
{
	long hval;
	int i; 
	
	/* If the string passed in is NULL, return 0 */
	if (start == NULL) return 0; 
	
	/* For each of the hash_len characters, multiply the old hash value
	 * by 257 and add the current character
	 */
	hval = 0L;
	for(i=0; i < hash_len; i++) {
		hval =  ((257 * hval) + start[i]) % hashtable->size;
	} 

	/* Return the hash value */
	return hval;
}
Use the function prototype:

long hash_update(
	long hval, 	/* old hash value */
	char start,	/* character to be removed */
	char end,	/* character to be added */
	int hash_len,	/* length of the string */
	hash_table_t *hashtable );	/* the hash table */


long hash_update(
	long hval,
	char start,
	char end,
	int hash_len,
	hash_table_t *hashtable )
{
	/* Based on the length of the string, compute how many times the far
	 * left character (the one being removed) was multiplied by 257.
	 * NOTE: In a real implementation of this, you would want to do this as
	 * a precomputational step so that you wouldn't have to do it every time
	 * this function was called
	 */
	long mul_fac = 1;
	int i;

	/* The leftmost character was multiplied by 257 (hash_len - 1) times */
	for(i=0; i < hash_len - 1; i++) {
		mul_fac = (mul_fac * 257) % hashtable->size;
	}

	/* Determine the value of the oldest character after it was multiplied */
	long oldest = (mul_fac * start) % hashtable->size;

	/* Subtract it from the current hash value to remove it */
	long oldest_removed = ((hval + hashtable->size) - oldest)
				% hashtable->size;

	/* Add in the new character as you would in the normal hash function */
	hval = ((257 * oldest_removed) + end) % hashtable->size;

	/* Return the new hash value */
	return hval;
}

Problem : Give a hash function and a hash update function that will always reduce Rabin-Karp to O(MN) efficiency.

There are many. One example:

int hash(hash_table_t *hashtable, char *str)
{
	return 220;
}

int hash_update(hash_table_t *hashtable, char *str)
{
	return 220;
}
As every string hashes to the same number, every position in the text produces a hash match and requires a full comparison, so Rabin-Karp will not save anything over brute-force search. Of course, this is a terrible hash function and you would never want to use it.
