Hash Tables

Terms

Hash Table  -  A data structure that uses a random access data structure, such as an array, and a mapping function, called a hash function, to allow searches that take O(1) time on average.
Array  -  A set of items which are randomly accessible by numeric index; a very common data structure in computer science. See the arrays SparkNote.
Binary Search  -  A technique for searching an ordered list in which we first check the middle item and -- based on that comparison -- "discard" half the data. The same procedure is then applied to the remaining half until a match is found or there are no more items left.
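The halving procedure described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the function name binary_search and the choice of an int array are assumptions made for the example.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent.
 * The name and the int element type are illustrative assumptions. */
int binary_search(const int *a, size_t n, int key)
{
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;	/* middle item of the range */

		if (a[mid] == key)
			return (int)mid;
		if (a[mid] < key)
			lo = mid + 1;	/* discard the lower half */
		else
			hi = mid;	/* discard the upper half */
	}

	return -1;	/* no items left, so key is not present */
}
```

Because each comparison discards half of the remaining items, the loop runs at most about log2(n) + 1 times.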
Efficiency  -  The efficiency of an algorithm is the amount of resources it uses to find an answer. It is usually measured in abstract units of work, such as comparisons or data moves, or in terms of the memory used, the number of messages passed, the number of disk accesses, etc.
Sequential Search  -  An algorithm for searching an array or list by checking items one at a time.
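For comparison with a hash table lookup, a sequential search might look like the sketch below. The function name sequential_search and the int element type are assumptions made for the example.

```c
#include <stddef.h>

/* Returns the index of the first item equal to key, or -1 if none matches.
 * The name and the int element type are illustrative assumptions. */
int sequential_search(const int *a, size_t n, int key)
{
	size_t i;

	for (i = 0; i < n; ++i) {	/* check items one at a time */
		if (a[i] == key)
			return (int)i;
	}

	return -1;
}
```

In the worst case every one of the n items is examined, which is the cost a hash table is designed to avoid.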
Data Structure  -  An organization of information, usually in memory, for better algorithm efficiency, such as linked lists and arrays.
Hash Function  -  Hashing is the process of running data through a hash function. A hash function is a mapping between a set of input values and a set of integers, known as hash values. A good hash function has the following properties: 1) The hash value is fully determined by the data being hashed. 2) The hash function uses all the input data. 3) The hash function "uniformly" distributes the data across the entire set of possible hash values. 4) The hash function generates very different hash values for similar strings. An example of a bad hash function, one that doesn't satisfy any of the above rules, would be:

int hash(char *data, int table_size)
{
	return 220 % table_size;
}
An example of a relatively good hash function would be:

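#include <string.h>

int hash(char *data, int table_size)
{
	unsigned int h;	/* unsigned, so overflow wraps and the result is never negative */
	size_t i, len;

	len = strlen(data);

	h = 0;

	for (i = 0; i < len; ++i) {
		h += (unsigned char)data[i];	/* mix in the next byte */
		h += (h << 10);
		h ^= (h >> 6);
	}

	/* final mixing so that every input byte affects the whole hash value */
	h += (h << 3);
	h ^= (h >> 11);
	h += (h << 15);

	return (int)(h % (unsigned int)table_size);
}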
Collision  -  A collision occurs when two data elements are hashed to the same value and try to occupy the same space in the hash table (in other words they collide). Collisions are commonly resolved by linear or quadratic probing, or by separate chaining.
Linear Probing  -  Linear probing is one method for dealing with collisions. If a data element hashes to a location in the table that is already occupied, the table is searched consecutively from that location, wrapping around at the end of the table, until an open location is found.
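A minimal sketch of linear probing is shown below. Several details are assumptions made to keep the example short: keys are non-negative integers, the hash is simply key % table_size, the EMPTY sentinel marks an unused slot, and deletion is not handled. The names probe_insert and probe_find are hypothetical.

```c
#define EMPTY (-1)	/* assumed sentinel for an unused slot; keys must be >= 0 */

/* Stores key in the table and returns the slot used, or -1 if the table is full. */
int probe_insert(int *table, int table_size, int key)
{
	int start = key % table_size;	/* stand-in hash function for the sketch */
	int i;

	for (i = 0; i < table_size; ++i) {
		int slot = (start + i) % table_size;	/* next location, wrapping around */

		if (table[slot] == EMPTY || table[slot] == key) {
			table[slot] = key;
			return slot;
		}
	}

	return -1;	/* every slot holds some other key */
}

/* Returns the slot holding key, or -1 if key is not in the table. */
int probe_find(const int *table, int table_size, int key)
{
	int start = key % table_size;
	int i;

	for (i = 0; i < table_size; ++i) {
		int slot = (start + i) % table_size;

		if (table[slot] == key)
			return slot;
		if (table[slot] == EMPTY)
			return -1;	/* an open slot ends the probe sequence */
	}

	return -1;
}
```

With a table of five slots, inserting 7 and then 12 places 7 in slot 2 and, because 12 also hashes to slot 2, places 12 in slot 3.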
Separate Chaining  -  Separate chaining is a method for dealing with collisions. The hash table is an array of linked lists. Data elements that hash to the same value are stored in a linked list originating from the index equivalent of their hash value.
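Separate chaining can be sketched in a similar way. As before, this is an illustration under stated assumptions: keys are non-negative integers hashed with key % table_size, new nodes are pushed onto the front of the list, duplicates are not checked, and the names struct node, chain_insert, and chain_find are hypothetical.

```c
#include <stdlib.h>

/* One element of a bucket's linked list. */
struct node {
	int key;
	struct node *next;
};

/* Adds key to the front of its bucket's list.
 * Returns 0 on success, or -1 if memory could not be allocated. */
int chain_insert(struct node **buckets, int table_size, int key)
{
	int index = key % table_size;	/* stand-in hash; assumes key >= 0 */
	struct node *n = malloc(sizeof *n);

	if (n == NULL)
		return -1;

	n->key = key;
	n->next = buckets[index];	/* colliding keys share the same list */
	buckets[index] = n;
	return 0;
}

/* Returns the node holding key, or NULL if key is not in the table. */
struct node *chain_find(struct node **buckets, int table_size, int key)
{
	struct node *n;

	for (n = buckets[key % table_size]; n != NULL; n = n->next) {
		if (n->key == key)
			return n;
	}

	return NULL;
}
```

The bucket array must start out with every entry set to NULL. A lookup only walks the one list selected by the hash value, so its cost depends on the length of that list rather than on the total number of items stored.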
Linked List  -  A list data structure made up of nodes, each of which holds a pointer to the next list element.
