
Hash Tables

Problems


Problem : What are the four requirements for a good hash function?

1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values. 4) The hash function can generate different hash values for similar strings.

Problem : Why should a good hash function satisfy these rules?

Rule 1: If something besides the input data helps determine the hash, then the hash value depends less on the input data, which allows a worse distribution of the hash values. If that extra input can change between calls, the same data may also hash to different values, so a stored item could not be found again. Rule 2: If the hash function doesn't use all the input data, then inputs that differ only in the ignored portion produce identical hash values, resulting in many collisions. Rule 3: If the hash function does not uniformly distribute the data across the entire set of possible hash values, then a large number of collisions will result, cutting down on the efficiency of the hash table. Rule 4: In real-world applications, many data sets contain very similar data elements. We would like these data elements to still be spread across the hash table.
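To make Rule 2 concrete, here is a minimal sketch of a hash function that ignores most of its input. The name first_char_hash and the unsigned parameter type are assumptions made for this illustration, and table_size is assumed to be greater than zero.

```c
/* Hypothetical example: looks only at the first character,
 * so it violates Rule 2 (it does not use all the input data). */
unsigned int first_char_hash(const char *data, unsigned int table_size)
{
    /* Cast to unsigned char so the character value is never negative. */
    return (unsigned char)data[0] % table_size;
}
```

With this function, "cat", "car", and "cab" all land in the same slot because they share a first letter, no matter how large the table is.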

Problem : Describe how the following hash function violates the four rules for a good hash function.


int hash(char *data, int table_size)
{
	return 220 % table_size;
}

Rule 1: The hash value returned is not at all determined by the data being hashed as the input is not used at all in computing the hash value. Rule 2: The hash value returned doesn't use all the input data. In fact, it doesn't use any of it. Rule 3: The hash values aren't uniformly distributed - they are always the same. Rule 4: This hash function is incapable of producing different hash values for similar strings - it always produces the same hash value.
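For contrast, the following is one possible sketch of a function that at least depends only on the input and uses every character of it. The name string_hash, the multiplier 31, and the unsigned types are illustrative assumptions, not part of the original problem. How evenly it spreads values in practice depends on the data and on table_size, which is assumed to be greater than zero.

```c
/* Hypothetical multiply-and-add string hash.
 * Every character feeds into the result, and nothing but the
 * input string and table_size affects the value returned. */
unsigned int string_hash(const char *data, unsigned int table_size)
{
    unsigned int h = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)data; *p != '\0'; p++) {
        /* Unsigned overflow wraps around, which is well defined in C. */
        h = h * 31u + *p;
    }
    return h % table_size;
}
```

Here the similar strings "cat" and "car" differ only in their last character, yet with a table size of 101 they hash to different slots.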

Problem : Why do most hash functions return a non-negative integer hash value? In other words, why wouldn't a hash function return a string or a double?

Hash values are usually used in the context of hash tables, where they serve as indices into the hash table array. Since an array's indices start at 0 and count up through the integers, a hash value should be a non-negative integer.
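The indexing step can be sketched as follows. TABLE_SIZE, bucket_index, and record are hypothetical names chosen for this illustration. Using an unsigned type matters in C because the remainder of a negative signed integer can itself be negative (for example, -7 % 5 is -2), which would not be a valid array index.

```c
#include <stddef.h>

#define TABLE_SIZE 8   /* hypothetical table size for this sketch */

/* Maps any unsigned hash value to a valid index in [0, TABLE_SIZE). */
size_t bucket_index(unsigned int hash_value)
{
    return hash_value % TABLE_SIZE;
}

/* Counts one more key in the bucket that hash_value maps to. */
void record(unsigned int counts[TABLE_SIZE], unsigned int hash_value)
{
    counts[bucket_index(hash_value)]++;
}
```

A string or a double could not be used to subscript counts directly. It would first have to be converted into a non-negative integer, which is exactly the job of the hash function.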

Problem : What other applications of hash functions can you think of besides hash tables?

There are many, many applications of hash functions. One that comes up a lot in today's world is cryptography.
