Sorting algorithm
A sorting algorithm is an algorithm
that puts elements of a list in a certain order.
The most-used orders are numerical order and lexicographical order. Efficient sorting is
important for optimizing the use of other algorithms (such as search
and merge algorithms) which require input data to be in
sorted lists; it is also often useful for canonicalizing
data and for producing human-readable output. More formally, the output must
satisfy two conditions:
- The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
- The output is a permutation (reordering) of the input.
Further, the data is often taken to be in an array,
which allows random access, rather than a list, which only allows
sequential access, though often algorithms can be applied with suitable
modification to either type of data.
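The two formal conditions translate directly into code. Here is a minimal checker sketch (the function and variable names are ours, not a standard library API); it verifies nondecreasing order and that the output is a multiset permutation of the input:

    from collections import Counter

    def is_correct_sort(output, input_list):
        """Check the two formal conditions of a correct sort."""
        # Condition 1: nondecreasing order.
        nondecreasing = all(a <= b for a, b in zip(output, output[1:]))
        # Condition 2: the output is a permutation (reordering) of the input.
        permutation = Counter(output) == Counter(input_list)
        return nondecreasing and permutation

    print(is_correct_sort([1, 2, 2, 3], [2, 1, 3, 2]))  # True
    print(is_correct_sort([1, 3, 2], [1, 2, 3]))        # False: not nondecreasing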
Since the dawn of computing, the sorting problem has attracted a great
deal of research, perhaps due to the complexity of solving it efficiently
despite its simple, familiar statement. For example, bubble
sort was analyzed as early as 1956.[1]
A fundamental limit of comparison sorting algorithms is that they require linearithmic
time – O(n log n) – in the worst case, though better performance
is possible on real-world data (such as almost-sorted data), and algorithms not
based on comparison, such as counting
sort, can have better performance. Although many consider
sorting a solved problem – asymptotically optimal algorithms have been known
since the mid-20th century – useful new algorithms are still being invented,
with the now widely used Timsort dating to 2002, and the library
sort being first published in 2006.
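As a concrete example of a non-comparison sort, here is a minimal counting sort sketch for integer keys in a known range (the names are ours); it runs in O(n + k) time rather than O(n log n), because it never compares two elements:

    def counting_sort(a, k):
        """Sort non-negative integers drawn from range(k), stably, in O(n + k)."""
        counts = [0] * k
        for x in a:
            counts[x] += 1
        # Exclusive prefix sums give each key its starting position in the output.
        total = 0
        for i in range(k):
            counts[i], total = total, total + counts[i]
        out = [None] * len(a)
        for x in a:  # left-to-right pass preserves stability
            out[counts[x]] = x
            counts[x] += 1
        return out

    print(counting_sort([3, 1, 0, 3, 2], 4))  # [0, 1, 2, 3, 3]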
Sorting algorithms are prevalent in introductory computer science classes,
where the abundance of algorithms for the problem provides a gentle
introduction to a variety of core algorithm concepts, such as big
O notation, divide and conquer algorithms, data
structures such as heaps and binary trees, randomized algorithms, best, worst and average case analysis,
time-space tradeoffs, and upper and lower bounds.
Classification
Sorting algorithms are often classified by:
- Computational complexity (worst, average and best behavior) of element comparisons in terms of the size of the list (n). For typical serial sorting algorithms, good behavior is O(n log n), with parallel sort in O(log² n), and bad behavior is O(n²). (See Big O notation.) Ideal behavior for a serial sort is O(n), but this is not possible in the average case. Optimal parallel sorting is O(log n). Comparison-based sorting algorithms, which evaluate the elements of the list via an abstract key comparison operation, need at least O(n log n) comparisons for most inputs.
- Computational complexity of swaps (for "in place" algorithms).
- Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place". Strictly, an in place sort needs only O(1) memory beyond the items being sorted; sometimes O(log(n)) additional memory is considered "in place".
- Recursion: some algorithms are recursive, some are non-recursive, and others can be implemented either way (e.g., merge sort).
- Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values).
- Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
- General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort. Also whether the algorithm is serial or parallel. The remainder of this discussion almost exclusively concentrates upon serial algorithms and assumes serial operation.
- Adaptability: whether or not the presortedness of the input affects the running time. Algorithms that take this into account are known as adaptive; the sketch after this list makes comparison counting and adaptivity concrete.
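One can observe both the comparison-sort classification and adaptivity by counting key comparisons directly. A sketch (the wrapper names are ours; CPython's sorted(), a Timsort variant, stands in as the adaptive comparison sort):

    import functools
    import random

    def count_comparisons(data):
        """Sort data with sorted() and count key comparisons along the way."""
        counter = [0]

        @functools.total_ordering
        class Key:
            def __init__(self, v):
                self.v = v
            def __eq__(self, other):
                counter[0] += 1
                return self.v == other.v
            def __lt__(self, other):
                counter[0] += 1
                return self.v < other.v

        sorted(data, key=Key)
        return counter[0]

    presorted = list(range(1000))
    shuffled = list(range(1000))
    random.shuffle(shuffled)
    print(count_comparisons(presorted))  # roughly n - 1: adaptive on presorted input
    print(count_comparisons(shuffled))   # roughly n log n on random input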
Stability
An example of stable sorting on playing cards. When the
cards are sorted by rank with a stable sort, the two 5s must remain in the same
order in the sorted output that they were originally in. When they are sorted
with a non-stable sort, the 5s may end up in the opposite order in the sorted
output.
When sorting some kinds of data, only part of the data is examined when
determining the sort order. For example, in the card sorting example above, the
cards are being sorted by their rank, and their suit is being
ignored. This allows the possibility of multiple different correctly sorted
versions of the original list. Stable sorting algorithms choose one of these,
according to the following rule: if two items compare as equal, like the two 5
cards, then their relative order will be preserved, so that if one came before
the other in the input, it will also come before the other in the output.

More formally, the data being sorted can be represented as a record or tuple of values, and the part of the data that is used for sorting is called the key. In the card example, cards are represented as a record (rank, suit), and the key is the rank. A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. Stability is also not an issue if all keys are different.
Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original input list as a tie-breaker. Remembering this order, however, may require additional time and space.
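As a sketch of this technique, the comparison key is extended with the original index as a tie-breaker (the helper name is ours; a heap-based sort, which is not stable on its own, merely stands in for any unstable sort):

    import heapq

    def stabilized_sort(records, key):
        """Make an unstable sort stable by appending the original index to the key."""
        decorated = [(key(r), i, r) for i, r in enumerate(records)]
        heapq.heapify(decorated)  # heapsort core; ties broken by original index i
        return [heapq.heappop(decorated)[2] for _ in range(len(decorated))]

    cards = [(5, "hearts"), (3, "clubs"), (5, "spades")]
    print(stabilized_sort(cards, key=lambda c: c[0]))
    # [(3, 'clubs'), (5, 'hearts'), (5, 'spades')] -- the two 5s keep their order

The extra index costs O(n) additional space, matching the remark above that remembering the original order may require additional time and space.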
Comparison of algorithms
In this table, n is the number of records to be sorted. The columns "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. The run times and the memory requirements listed below should be understood to be inside big O notation. Logarithms are of any base; the notation log² n means (log n)².
These are all comparison sorts, and so cannot perform better than O(n log n) in the average or worst case.
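For reference, the canonical O(n log n), stable comparison sort is merge sort; a textbook sketch, not a tuned implementation:

    def merge_sort(a):
        """Top-down merge sort: O(n log n) comparisons, O(n) extra memory, stable."""
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:  # <= keeps equal keys in input order: stable
                out.append(left[i])
                i += 1
            else:
                out.append(right[j])
                j += 1
        return out + left[i:] + right[j:]

    print(merge_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]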
Name | Best | Average | Worst | Memory | Stable | Method | Other notes
Quicksort | n log n | n log n | n² | log n on average, n worst case; Sedgewick variation is log n worst case | Typical in-place sort is not stable; stable versions exist | Partitioning | Quicksort is usually done in place with O(log n) stack space.[citation needed] Most implementations are unstable, as stable in-place partitioning is more complex. Naïve variants use an O(n) space array to store the partition.[citation needed] A quicksort variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys (see the sketch after this table).
Merge sort | n log n | n log n | n log n | n | Yes | Merging | Highly parallelizable (up to O(log n) using the Three Hungarians' Algorithm[2] or, more practically, Cole's parallel merge sort) for processing large amounts of data.
In-place merge sort | — | — | n log² n | 1 | Yes | Merging | Can be implemented as a stable sort based on stable in-place merging.[3]
Heapsort | n log n | n log n | n log n | 1 | No | Selection |
Insertion sort | n | n² | n² | 1 | Yes | Insertion | O(n + d) in the worst case over sequences that have d inversions.
Introsort | n log n | n log n | n log n | log n | No | Partitioning & Selection | Used in several STL implementations.
Selection sort | n² | n² | n² | 1 | No | Selection | Stable with O(n) extra space, for example using lists.[4]
Timsort | n | n log n | n log n | n | Yes | Insertion & Merging | Makes n comparisons when the data is already sorted or reverse sorted.
Cubesort | n | n log n | n log n | n | Yes | Insertion | Makes n comparisons when the data is already sorted or reverse sorted.
Shellsort | n log n | depends on gap sequence | depends on gap sequence; best known is n log² n | 1 | No | Insertion | Small code size, no use of call stack, reasonably fast; useful where memory is at a premium, such as embedded and older mainframe applications.
Bubble sort | n | n² | n² | 1 | Yes | Exchanging | Tiny code size.
Tree sort | n log n | n log n | n log n | n | Yes | Insertion | When using a self-balancing binary search tree.
Cycle sort | n² | n² | n² | 1 | No | Insertion | In-place with a theoretically optimal number of writes.
Library sort | n | n log n | n² | n | Yes | Insertion |
Patience sorting | n | — | n log n | n | No | Insertion & Selection | Finds all the longest increasing subsequences in O(n log n).
Smoothsort | n | n log n | n log n | 1 | No | Selection | An adaptive sort: n − 1 comparisons when the data is already sorted, and 0 swaps.
Strand sort | n | n² | n² | n | Yes | Selection |
Tournament sort | n log n | n log n | n log n | n | No | Selection | Variation of Heap Sort.
Cocktail shaker sort | n | n² | n² | 1 | Yes | Exchanging |
Comb sort | n log n | n² | n² | 1 | No | Exchanging | Small code size.
Gnome sort | n | n² | n² | 1 | Yes | Exchanging | Tiny code size.
UnShuffle Sort[6] | — | kn | kn | In place for linked lists; n·sizeof(link) for arrays | No | Distribution and Merge | No exchanges are performed. The parameter k is proportional to the entropy in the input; k = 1 for ordered or reverse-ordered input.
Franceschini's method[7] | — | n log n | n log n | 1 | Yes | ? |
Block sort | n | n log n | n log n | 1 | Yes | Insertion & Merging | Combines a block-based O(n) in-place merge algorithm[8] with a bottom-up merge sort.
Odd–even sort | n | n² | n² | 1 | Yes | Exchanging | Can be run on parallel processors easily.
Non-comparison sorts

The following table describes sorting algorithms that are not comparison sorts. Complexities below assume n items to be sorted, with keys of size k, digit size d, and r the range of numbers to be sorted. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n ≪ 2^k, where ≪ means "much less than".

Name | Best | Average | Worst | Memory | Stable | n ≪ 2^k | Notes
Pigeonhole sort | — | n + 2^k | n + 2^k | 2^k | Yes | Yes |
Bucket sort (uniform keys) | — | n + k | n²·k | n·k | Yes | No | Assumes uniform distribution of elements from the domain in the array.[9]
Bucket sort (integer keys) | — | n + r | n + r | n + r | Yes | Yes | If r is O(n), then Average is O(n).[10]
Counting sort | — | n + r | n + r | n + r | Yes | Yes | If r is O(n), then Average is O(n).[9]
LSD Radix Sort | — | n·(k/d) | n·(k/d) | n | Yes | No | See the sketch after this table.
MSD Radix Sort | — | n·(k/d) | n·(k/d) | n + 2^d | Yes | No | Stable version uses an external array of size n to hold all of the bins.
MSD Radix Sort (in-place) | — | n·(k/d) | n·(k/d) | 2^d | No | No | k/d recursion levels, 2^d for count array.
Spreadsort | — | n·(k/d) | n·((k/s) + d) | (k/d)·2^d | No | No | Asymptotics are based on the assumption that n ≪ 2^k, but the algorithm does not require this.
Burstsort | — | n·(k/d) | n·(k/d) | n·(k/d) | No | No | Has a better constant factor than radix sort for sorting strings, though it relies somewhat on specifics of commonly encountered strings.
Flashsort | n | n + r | n² | n | Can be made stable with additional space used | No | Requires uniform distribution of elements from the domain in the array to run in linear time. If the distribution is extremely skewed, it can go quadratic if the underlying sort is quadratic (it is usually an insertion sort). The in-place version is not stable.
Postman sort | — | n·(k/d) | n·(k/d) | n + 2^d | — | No | A variation of bucket sort which works very similarly to MSD Radix Sort. Specific to post-service needs.