Sunday 15 November 2015

Sorting algorithms



Sorting algorithm

A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions:
  1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
  2. The output is a permutation (reordering) of the input.
Further, the data is often taken to be in an array, which allows random access, rather than a list, which only allows sequential access, though often algorithms can be applied with suitable modification to either type of data.
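The two formal conditions can be checked mechanically. The following Python sketch (the helper name is illustrative, not from any standard library) verifies both properties for a candidate output:

    from collections import Counter

    def is_valid_sort(output, original):
        # Condition 1: each element is no smaller than the previous one.
        nondecreasing = all(a <= b for a, b in zip(output, output[1:]))
        # Condition 2: the output is a permutation (reordering) of the input.
        permutation = Counter(output) == Counter(original)
        return nondecreasing and permutation

    assert is_valid_sort([1, 2, 2, 3], [2, 1, 3, 2])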
Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] A fundamental limit of comparison sorting algorithms is that they require linearithmic time – O(n log n) – in the worst case, though better performance is possible on real-world data (such as almost-sorted data), and algorithms not based on comparison, such as counting sort, can have better performance. Although many consider sorting a solved problem – asymptotically optimal algorithms have been known since the mid-20th century – useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.
Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures such as heaps and binary trees, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and upper and lower bounds.

Classification

Sorting algorithms are often classified by:
  • Computational complexity (worst, average and best behavior) of element comparisons in terms of the size of the list (n). For typical serial sorting algorithms, good behavior is O(n log n), with parallel sort in O(log² n), and bad behavior is O(n²). (See Big O notation.) Ideal behavior for a serial sort is O(n), but this is not possible in the average case. Optimal parallel sorting is O(log n). Comparison-based sorting algorithms, which evaluate the elements of the list via an abstract key comparison operation, need at least O(n log n) comparisons for most inputs.
  • Computational complexity of swaps (for "in place" algorithms).
  • Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place". Strictly, an in-place sort needs only O(1) memory beyond the items being sorted; sometimes O(log n) additional memory is considered "in place".
  • Recursion: some algorithms are recursive, some are non-recursive, and others may be implemented either way (e.g., merge sort).
  • Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values).
  • Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
  • General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort. Also whether the algorithm is serial or parallel. The remainder of this discussion almost exclusively concentrates upon serial algorithms and assumes serial operation.
  • Adaptability: whether or not the presortedness of the input affects the running time. Algorithms that take this into account are known as adaptive (see the insertion sort sketch after this list).
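As a concrete illustration of adaptability, here is a textbook insertion sort in Python: on an already-sorted input the inner loop body never runs, so the algorithm degrades gracefully from O(n²) in the worst case to O(n) on presorted data.

    def insertion_sort(a):
        for i in range(1, len(a)):
            x = a[i]
            j = i - 1
            # On presorted input this loop exits immediately,
            # giving n - 1 comparisons in total (linear time).
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a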

Stability

An example of stable sorting on playing cards. When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in the opposite order in the sorted output.
When sorting some kinds of data, only part of the data is examined when determining the sort order. For example, in the card sorting example to the right, the cards are being sorted by their rank, and their suit is being ignored. This allows the possibility of multiple different correctly sorted versions of the original list. Stable sorting algorithms choose one of these, according to the following rule: if two items compare as equal, like the two 5 cards, then their relative order will be preserved, so that if one came before the other in the input, it will also come before the other in the output.
More formally, the data being sorted can be represented as a record or tuple of values, and the part of the data that is used for sorting is called the key. In the card example, cards are represented as a record (rank, suit), and the key is the rank. A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
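Python's built-in sort (Timsort) is guaranteed stable, so the card example can be reproduced directly; sorting by rank alone leaves the two 5s in their original relative order:

    cards = [(5, "hearts"), (3, "clubs"), (5, "spades"), (2, "diamonds")]
    # The key is the rank only; the suit is ignored by the comparison.
    by_rank = sorted(cards, key=lambda card: card[0])
    # Stable result: the 5 of hearts still precedes the 5 of spades.
    # [(2, 'diamonds'), (3, 'clubs'), (5, 'hearts'), (5, 'spades')]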
When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. Stability is also not an issue if all keys are different.
Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original input list as a tie-breaker. Remembering this order, however, may require additional time and space.
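A minimal Python sketch of this tie-breaking technique (Python's own sort is already stable, so this is purely illustrative): each record is decorated with its original position, so any correct sort of the decorated list, stable or not, reproduces a stable order.

    def stabilize(items, key):
        # Extend the key with the original index as a tie-breaker;
        # this costs O(n) extra space to remember the input order.
        decorated = [(key(item), index, item) for index, item in enumerate(items)]
        # No two decorated keys compare equal, because the indices are
        # unique, so even an unstable sort yields a stable result.
        decorated.sort()
        return [item for _, _, item in decorated]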

Comparison of algorithms


In this table, n is the number of records to be sorted. The columns "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. The run times and the memory requirements listed below should be understood to be inside big O notation. Logarithms are of any base; the notation log² n means (log n)².
These are all comparison sorts, and so cannot perform better than O(n log n) in the average or worst case.
Name | Best | Average | Worst | Memory | Stable | Method | Other notes
Quicksort | n log n | n log n | n² | log n on average, n in the worst case; the Sedgewick variation is log n worst case | Typical in-place sort is not stable; stable versions exist | Partitioning | Quicksort is usually done in place with O(log n) stack space. Most implementations are unstable, as stable in-place partitioning is more complex. Naïve variants use an O(n) space array to store the partition. A quicksort variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys.
Merge sort | n log n | n log n | n log n | n | Yes | Merging | Highly parallelizable (up to O(log n) using the Three Hungarians' Algorithm[2] or, more practically, Cole's parallel merge sort) for processing large amounts of data.
In-place merge sort | - | - | n log² n | 1 | Yes | Merging | Can be implemented as a stable sort based on stable in-place merging.[3]
Heapsort | n log n | n log n | n log n | 1 | No | Selection |
Insertion sort | n | n² | n² | 1 | Yes | Insertion | O(n + d) in the worst case over sequences that have d inversions.
Introsort | n log n | n log n | n log n | log n | No | Partitioning & Selection | Used in several STL implementations.
Selection sort | n² | n² | n² | 1 | No | Selection | Stable with O(n) extra space, for example using lists.[4]
Timsort | n | n log n | n log n | n | Yes | Insertion & Merging | Makes n comparisons when the data is already sorted or reverse sorted.
Cubesort | n | n log n | n log n | n | Yes | Insertion | Makes n comparisons when the data is already sorted or reverse sorted.
Shellsort | n | n log² n or n^(3/2) | Depends on gap sequence; best known is n log² n | 1 | No | Insertion | Small code size, no use of the call stack, reasonably fast; useful where memory is at a premium, such as embedded and older mainframe applications.
Bubble sort | n | n² | n² | 1 | Yes | Exchanging | Tiny code size.
Binary tree sort | n | n log n | n log n (balanced) | n | Yes | Insertion |
Cycle sort | n² | n² | n² | 1 | No | Insertion | In-place with a theoretically optimal number of writes.
Library sort | n | n log n | n² | n | Yes | Insertion |
Patience sorting | n | - | n log n | n | No | Insertion & Selection | Finds all the longest increasing subsequences in O(n log n).
Smoothsort | n | n log n | n log n | 1 | No | Selection | An adaptive sort: n comparisons when the data is already sorted, and 0 swaps.
Strand sort | n | n² | n² | n | Yes | Selection |
Tournament sort | n log n | n log n | n log n | - | No | Selection | Variation of heapsort.
Cocktail sort | n | n² | n² | 1 | Yes | Exchanging |
Comb sort | n | n log n | n² | 1 | No | Exchanging | Small code size.
Gnome sort | n | n² | n² | 1 | Yes | Exchanging | Tiny code size.
UnShuffle sort[6] | kN | kN | kN | In place for linked lists; N·sizeof(link) for arrays | No | Distribution and Merge | No exchanges are performed. The parameter k is proportional to the entropy of the input; k = 1 for ordered or reverse-ordered input.
Franceschini's method[7] | - | n log n | n log n | 1 | Yes | ? |
Block sort | n | n log n | n log n | 1 | Yes | Insertion & Merging | Combines a block-based O(n) in-place merge algorithm[8] with a bottom-up merge sort.
Odd-even sort | n | n² | n² | 1 | Yes | Exchanging | Can be run on parallel processors easily.
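To make the quicksort row concrete, here is a short, non-in-place Python sketch of three-way (fat) partitioning. Like the naïve variants mentioned in the table it uses O(n) extra space rather than partitioning in place, but it shows why an array of all-equal keys costs only O(n) comparisons: the middle partition absorbs every element in a single pass.

    import random

    def quicksort3(a):
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        # Three-way (fat) partition: less / equal / greater.
        less = [x for x in a if x < pivot]
        equal = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        # If all keys are equal, both recursive calls receive empty
        # lists, so the whole sort finishes after one linear pass.
        return quicksort3(less) + equal + quicksort3(greater)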
The following table describes integer sorting algorithms and other sorting algorithms that are not comparison sorts. As such, they are not limited by an Ω(n log n) lower bound. The complexities below assume n items to be sorted, with keys of size k, digit size d, and r the range of numbers to be sorted. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n ≪ 2^k, where ≪ means "much less than."
Non-comparison sorts
Name | Best | Average | Worst | Memory | Stable | n ≪ 2^k | Notes
Pigeonhole sort | - | n + 2^k | n + 2^k | 2^k | Yes | Yes |
Bucket sort (uniform keys) | - | n + k | n²·k | n·k | Yes | No | Assumes uniform distribution of elements from the domain in the array.[9]
Bucket sort (integer keys) | - | n + r | n + r | n + r | Yes | Yes | If r is O(n), then Average is O(n).[10]
Counting sort | - | n + r | n + r | n + r | Yes | Yes | If r is O(n), then Average is O(n).[9]
LSD Radix Sort | - | n·(k/d) | n·(k/d) | n + 2^d | Yes | No |
MSD Radix Sort | - | n·(k/d) | n·(k/d) | n + 2^d | Yes | No | Stable version uses an external array of size n to hold all of the bins.
MSD Radix Sort (in-place) | - | n·(k/d) | n·(k/d) | 2^d | No | No | k/d recursion levels; 2^d for the count array.
Spreadsort | - | n·(k/d) | n·(k/s + d) | (k/d)·2^d | No | No | Asymptotics are based on the assumption that n ≪ 2^k, but the algorithm does not require this.
Burstsort | - | n·(k/d) | n·(k/d) | n·(k/d) | No | No | Has a better constant factor than radix sort for sorting strings, though it relies somewhat on the specifics of commonly encountered strings.
Flashsort | n | n + r | n² | n | No (can be made stable with additional O(n) space) | No | Requires a uniform distribution of elements from the domain in the array to run in linear time. If the distribution is extremely skewed, it can go quadratic if the underlying sort is quadratic (it is usually an insertion sort). The in-place version is not stable.
Postman sort | - | n·(k/d) | n·(k/d) | n + 2^d | - | No | A variation of bucket sort that works very similarly to MSD Radix Sort. Specific to postal-service needs.
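As an example of these non-comparison techniques, here is a counting sort sketch in Python for integer keys assumed to lie in the range [0, r): it runs in O(n + r) time with O(n + r) extra space, and it is stable because the output pass emits equal keys in input order.

    def counting_sort(a, r):
        count = [0] * r
        for x in a:
            count[x] += 1
        # Exclusive prefix sums: count[k] becomes the index at which
        # the run of key k starts in the output.
        total = 0
        for k in range(r):
            count[k], total = total, total + count[k]
        out = [None] * len(a)
        for x in a:  # scanning in input order keeps the sort stable
            out[count[x]] = x
            count[x] += 1
        return out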
Samplesort can be used to parallelize any of the non-comparison sorts by efficiently distributing data into several buckets and then passing the sorting down to several processors, with no need to merge, as the buckets are already sorted with respect to each other.
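A sequential Python sketch of the samplesort idea (the bucket count here is an arbitrary illustrative choice): splitters drawn from a random sample define buckets that are ordered relative to each other, so once each bucket is sorted, simple concatenation, rather than a merge, finishes the job.

    import bisect
    import random

    def samplesort(a, buckets=4):
        if len(a) <= buckets:
            return sorted(a)
        # Splitters from a random sample define the bucket boundaries.
        splitters = sorted(random.sample(a, buckets - 1))
        bins = [[] for _ in range(buckets)]
        for x in a:
            bins[bisect.bisect_left(splitters, x)].append(x)
        # Buckets are already ordered with respect to each other, so
        # sorting each one (in a parallel setting, one per processor)
        # and concatenating needs no merge step.
        return [x for b in bins for x in sorted(b)]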
