The Merge Sort Algorithm
To understand the efficiency of the mergesort algorithm, it is useful to
separate the merging from the sorting. The sorting takes place indirectly,
by repeatedly splitting the data in half until sorted singleton sets are
created. The merging then rebuilds the complete, original data set by
splicing together the sorted mini-lists. To determine the efficiency of
the sorting (breaking down) step, consider how many rounds of halving are
needed before every piece is a singleton. A data set of size 4 has to be
split twice, once into two sets of two and then again into four sets of
one. A data set of size 8 needs 3 rounds of splitting, 16 pieces of data
need 4 rounds, 32 need 5 rounds, and so on. This behavior is described by
the base-2 logarithm:
- log2(4) = 2
- log2(8) = 3
- log2(16) = 4
- log2(32) = 5.
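
The pattern in the list above can be checked with a short sketch. The
function name split_levels is a hypothetical helper introduced here for
illustration; it assumes that an odd-sized set is split as evenly as
possible, so the larger half determines how many rounds are needed.

```python
import math


def split_levels(n: int) -> int:
    """Count the rounds of halving needed to reduce n items to singletons.

    Hypothetical helper for illustration only. For odd sizes the larger
    half is followed, since it takes the most rounds to break down.
    """
    levels = 0
    size = n
    while size > 1:
        size = (size + 1) // 2  # size of the larger half after one split
        levels += 1
    return levels


# Compare the counted rounds with the base-2 logarithm for powers of two.
for n in (4, 8, 16, 32):
    print(n, split_levels(n), int(math.log2(n)))
```

For sizes that are not powers of two, the count in this sketch rounds up
to the next whole number, for example 3 rounds for 5 items.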
The breaking down of the data, then, takes about log2(n) rounds of
splitting. Merging two sorted lists is linear in their combined length,
because each step compares only the elements at the front of the two
sublists and moves the smaller one to the output. For example, to merge
the subarrays (2 4) and (0 1 7), the following comparisons take place:
0 & 2, 1 & 2, 2 & 7, and 4 & 7, after which 7 is left alone and is copied
without a comparison. That is 4 comparisons and 5 steps for 5 elements,
so efficiency n. Each of the log2(n) levels of splitting has to be merged
back together, and the merges on any one level handle all n elements
between them, so the efficiency of mergesort is O(n log(n)).
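
The merge example above can be traced with a minimal sketch, which is one
possible way to write the algorithm rather than a definitive
implementation. The function names merge and merge_sort and the
comparison counter are assumptions added for illustration. The counter
records only element-to-element comparisons; leftover elements at the end
of a merge are copied without being compared.

```python
def merge(left, right):
    """Merge two sorted lists; return (merged_list, comparison_count)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Compare the front elements of both lists and take the smaller one.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted; copy whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


def merge_sort(items):
    """Sort a list by splitting and merging; return (sorted_list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0  # a singleton (or empty list) is already sorted
    mid = len(items) // 2
    left, left_count = merge_sort(items[:mid])
    right, right_count = merge_sort(items[mid:])
    merged, merge_count = merge(left, right)
    return merged, left_count + right_count + merge_count


print(merge([2, 4], [0, 1, 7]))       # ([0, 1, 2, 4, 7], 4)
print(merge_sort([7, 2, 0, 4, 1])[0])  # [0, 1, 2, 4, 7]
```

Each call to merge does work proportional to the number of elements it
handles, and the recursion in merge_sort is about log2(n) levels deep,
which is where the n log(n) behavior comes from.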