Hash Tables
Terms
Hash Table
-
A data structure that uses a random access data structure, such as an array,
and a mapping function, called a hash function, to allow for searches that
take O(1) time on average.
Array
-
A sequence of items which are randomly accessible by numeric index; a very
common data structure in computer science. See the arrays SparkNote.
Binary Search
-
A technique for searching an ordered list in which we first check the middle
item and -- based on that comparison -- "discard" half the data. The same
procedure is then applied to the remaining half until a match is found or
there are no more items left.
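
The halving procedure described above can be sketched in C as follows. This is a minimal illustration, assuming a sorted array of ints; the name binary_search and the convention of returning -1 when the item is absent are choices made for this example.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* middle item of the remaining range */
        if (a[mid] == key)
            return (int)mid;             /* match found */
        if (a[mid] < key)
            lo = mid + 1;                /* discard the lower half */
        else
            hi = mid;                    /* discard the upper half */
    }
    return -1;                           /* no items left to check */
}
```

Each pass through the loop halves the remaining range, so a sorted array of n items needs roughly log2(n) comparisons.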
Efficiency
-
The efficiency of an algorithm is the amount of resources it uses to find
an answer. It is usually measured in abstract operations, such as
comparisons or data moves, but can also be measured in memory used, messages
passed, disk accesses, etc.
Sequential Search
-
An algorithm for searching an array or list by checking items one at a time.
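
A minimal C sketch of this idea, assuming an array of ints; the name sequential_search and the -1 "not found" result are choices made for this example.

```c
#include <stddef.h>

/* Returns the index of the first item equal to key, or -1 if none matches. */
int sequential_search(const int *a, size_t n, int key)
{
    size_t i;

    for (i = 0; i < n; ++i) {   /* check items one at a time, in order */
        if (a[i] == key)
            return (int)i;
    }
    return -1;                  /* every item was checked without a match */
}
```

Unlike binary search, this works on unordered data, but in the worst case it examines all n items.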
Data Structure
-
A way of organizing information, usually in memory, so that algorithms can
work with it more efficiently. Linked lists and arrays are examples.
Hash Function
-
Hashing is the process of running data through a hash function.
A hash function is a mapping between a set of input values and a set of
integers, known as hash values. A good hash function has the following
properties:
1) The hash value is fully determined by the data being hashed.
2) The hash function uses all the input data.
3) The hash function "uniformly" distributes the data across the entire set
of possible hash values.
4) The hash function generates very different hash values for similar strings.
An example of a bad hash function, one that ignores its input and sends
every key to the same location, would be:
int hash(char *data, int table_size)
{
    /* data is never used: every key gets the same hash value */
    return 220 % table_size;
}
An example of a relatively good hash function would be:
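#include <string.h>

int hash(char *data, int table_size)
{
    unsigned int h = 0;   /* unsigned, so overflow wraps around safely */
    size_t i, len = strlen(data);

    for (i = 0; i < len; ++i) {
        h += (unsigned char)data[i];   /* mix in each input byte */
        h += (h << 10);
        h ^= (h >> 6);
    }
    /* final mixing of the accumulated bits */
    h += (h << 3);
    h ^= (h >> 11);
    h += (h << 15);
    /* reduce to a valid table index in the range 0..table_size-1 */
    return (int)(h % (unsigned int)table_size);
}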
Collision
-
A collision occurs when two data elements are hashed to the same value and
try to occupy the same space in the hash table (in other words they collide).
This is often solved by a linear or quadratic probing method or by separate
chaining.
Linear Probing
-
Linear probing is one method for dealing with collisions. If a data
element hashes to a location in the table that is already occupied, the
table is searched consecutively from that location until an open location
is found.
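
The consecutive search described above might look like the following in C. This is only a sketch under stated assumptions: keys are non-negative ints, the table has a fixed TABLE_SIZE, the hash is simply key % TABLE_SIZE, and the search wraps around to the start of the table. The names lp_init, lp_insert, and lp_find are hypothetical.

```c
#define TABLE_SIZE 8
#define EMPTY (-1)   /* assumption: valid keys are non-negative */

/* Marks every location in the table as open. */
void lp_init(int *table)
{
    for (int i = 0; i < TABLE_SIZE; ++i)
        table[i] = EMPTY;
}

/* Stores key and returns the location used, or -1 if the table is full. */
int lp_insert(int *table, int key)
{
    int start = key % TABLE_SIZE;               /* home location */

    for (int step = 0; step < TABLE_SIZE; ++step) {
        int slot = (start + step) % TABLE_SIZE; /* next location, wrapping */
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;                                  /* no open location left */
}

/* Returns the location holding key, or -1 if key is not in the table. */
int lp_find(const int *table, int key)
{
    int start = key % TABLE_SIZE;

    for (int step = 0; step < TABLE_SIZE; ++step) {
        int slot = (start + step) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return -1;          /* an open location ends the probe sequence */
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

With this sketch, inserting 3 and then 11 places 3 at location 3 and 11 at location 4, because both keys hash to 3 and the second one probes forward by one. Deletion is left out, since removing an item from a probed table needs extra care.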
Separate Chaining
-
Separate chaining is a method for dealing with collisions. The hash table
is an array of linked lists. Data elements that hash to the same value
are stored in a linked list originating from the index equivalent of their
hash value.
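
As a rough illustration of the array-of-linked-lists layout, the C sketch below stores int keys. It assumes a fixed NUM_BUCKETS, a simple modulo hash, and insertion at the front of each list; the names chain_node, chain_insert, and chain_contains are hypothetical.

```c
#include <stdlib.h>

#define NUM_BUCKETS 8

struct chain_node {
    int key;
    struct chain_node *next;
};

/* Adds key to the front of its bucket's list. Returns 1 on success, 0 if
   memory could not be allocated. */
int chain_insert(struct chain_node **buckets, int key)
{
    unsigned int idx = (unsigned int)key % NUM_BUCKETS;
    struct chain_node *n = malloc(sizeof *n);

    if (n == NULL)
        return 0;
    n->key = key;
    n->next = buckets[idx];   /* colliding keys share this list */
    buckets[idx] = n;
    return 1;
}

/* Returns 1 if key is stored in the table, 0 otherwise. */
int chain_contains(struct chain_node **buckets, int key)
{
    unsigned int idx = (unsigned int)key % NUM_BUCKETS;
    struct chain_node *n;

    for (n = buckets[idx]; n != NULL; n = n->next) {
        if (n->key == key)
            return 1;
    }
    return 0;
}
```

The bucket array must start with every entry set to NULL. A collision only lengthens one list, so the table never fills up, although long lists slow searches down.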
Linked List
-
A list data structure made up of nodes, each of which holds a pointer
to the next list element.
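
A minimal C sketch of such a node, assuming each node stores a single int; the names list_node, list_push_front, and list_length are hypothetical.

```c
#include <stdlib.h>

struct list_node {
    int value;
    struct list_node *next;   /* pointer to the next element, NULL at the end */
};

/* Returns the new head of the list after adding value at the front.
   If allocation fails, the list is returned unchanged. */
struct list_node *list_push_front(struct list_node *head, int value)
{
    struct list_node *n = malloc(sizeof *n);

    if (n == NULL)
        return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Counts the nodes by following next pointers until the end of the list. */
size_t list_length(const struct list_node *head)
{
    size_t count = 0;

    while (head != NULL) {
        ++count;
        head = head->next;
    }
    return count;
}
```

An empty list is represented here by a NULL head pointer.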





