Sunday, 15 November 2015

Sorting algorithms



Sorting algorithm

A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important for optimizing the use of other algorithms (such as search and merge algorithms) which require input data to be in sorted lists; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions:
  1. The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
  2. The output is a permutation (reordering) of the input.
Further, the data is often taken to be in an array, which allows random access, rather than a list, which only allows sequential access, though often algorithms can be applied with suitable modification to either type of data.
Since the dawn of computing, the sorting problem has attracted a great deal of research, perhaps due to the complexity of solving it efficiently despite its simple, familiar statement. For example, bubble sort was analyzed as early as 1956.[1] A fundamental limit of comparison sorting algorithms is that they require linearithmic time – O(n log n) – in the worst case, though better performance is possible on real-world data (such as almost-sorted data), and algorithms not based on comparison, such as counting sort, can have better performance. Although many[who?] consider sorting a solved problem – asymptotically optimal algorithms have been known since the mid-20th century – useful new algorithms are still being invented, with the now widely used Timsort dating to 2002, and the library sort being first published in 2006.
Sorting algorithms are prevalent in introductory computer science classes, where the abundance of algorithms for the problem provides a gentle introduction to a variety of core algorithm concepts, such as big O notation, divide and conquer algorithms, data structures such as heaps and binary trees, randomized algorithms, best, worst and average case analysis, time-space tradeoffs, and upper and lower bounds.

Classification

Sorting algorithms are often classified by:
  • Computational complexity (worst, average and best behavior) of element comparisons in terms of the size of the list (n). For typical serial sorting algorithms good behavior is O(n log n), with parallel sort in O(log2 n), and bad behavior is O(n2). (See Big O notation.) Ideal behavior for a serial sort is O(n), but this is not possible in the average case. Optimal parallel sorting is O(log n). Comparison-based sorting algorithms, which evaluate the elements of the list via an abstract key comparison operation, need at least O(n log n) comparisons for most inputs.
  • Computational complexity of swaps (for "in place" algorithms).
  • Memory usage (and use of other computer resources). In particular, some sorting algorithms are "in place". Strictly, an in place sort needs only O(1) memory beyond the items being sorted; sometimes O(log(n)) additional memory is considered "in place".
  • Recursion. Some algorithms are either recursive or non-recursive, while others may be both (e.g., merge sort).
  • Stability: stable sorting algorithms maintain the relative order of records with equal keys (i.e., values).
  • Whether or not they are a comparison sort. A comparison sort examines the data only by comparing two elements with a comparison operator.
  • General method: insertion, exchange, selection, merging, etc. Exchange sorts include bubble sort and quicksort. Selection sorts include shaker sort and heapsort. Also whether the algorithm is serial or parallel. The remainder of this discussion almost exclusively concentrates upon serial algorithms and assumes serial operation.
  • Adaptability: Whether or not the presortedness of the input affects the running time. Algorithms that take this into account are known to be adaptive.

Stability

An example of stable sorting on playing cards. When the cards are sorted by rank with a stable sort, the two 5s must remain in the same order in the sorted output that they were originally in. When they are sorted with a non-stable sort, the 5s may end up in the opposite order in the sorted output.
When sorting some kinds of data, only part of the data is examined when determining the sort order. For example, in the card sorting example to the right, the cards are being sorted by their rank, and their suit is being ignored. This allows the possibility of multiple different correctly sorted versions of the original list. Stable sorting algorithms choose one of these, according to the following rule: if two items compare as equal, like the two 5 cards, then their relative order will be preserved, so that if one came before the other in the input, it will also come before the other in the output.
More formally, the data being sorted can be represented as a record or tuple of values, and the part of the data that is used for sorting is called the key. In the card example, cards are represented as a record (rank, suit), and the key is the rank. A sorting algorithm is stable if whenever there are two records R and S with the same key, and R appears before S in the original list, then R will always appear before S in the sorted list.
When equal elements are indistinguishable, such as with integers, or more generally, any data where the entire element is the key, stability is not an issue. Stability is also not an issue if all keys are different.
Unstable sorting algorithms can be specially implemented to be stable. One way of doing this is to artificially extend the key comparison, so that comparisons between two objects with otherwise equal keys are decided using the order of the entries in the original input list as a tie-breaker. Remembering this order, however, may require additional time and space.

Comparison of algorithms


Description: (\log n)^2In this table, n is the number of records to be sorted. The columns "Average" and "Worst" give the time complexity in each case, under the assumption that the length of each key is constant, and that therefore all comparisons, swaps, and other needed operations can proceed in constant time. "Memory" denotes the amount of auxiliary storage needed beyond that used by the list itself, under the same assumption. The run times and the memory requirements listed below should be understood to be inside big O notation. Logarithms are of any base; the notation means .
These are all comparison sorts, and so cannot perform better than O(n log n) in the average or worst case.
Name
Best
Average
Worst
Memory
Stable
Method
Other notes
Description: n \log n
Description: n \log n
Description: n^2
Description: \log non average, worst case is Description: n; Sedgewick variation is Description: \log nworst case
typical in-place sort is not stable; stable versions exist
Partitioning
Quicksort is usually done in place with O(log n) stack space.[citation needed] Most implementations are unstable, as stable in-place partitioning is more complex. Naïve variants use an O(n) space array to store the partition.[citation needed] Quicksort variant using three-way (fat) partitioning takes O(n) comparisons when sorting an array of equal keys.
Description: n \log n
Description: n \log n
Description: n \log n
Description: n
Yes
Merging
Highly parallelizable (up to O(log n) using the Three Hungarians' Algorithm[2] or, more practically, Cole's parallel merge sort) for processing large amounts of data.
Description: n \log^2 n
Description: 1
Yes
Merging
Can be implemented as a stable sort based on stable in-place merging.[3]
Description: n \log n
Description: n \log n
Description: n \log n
Description: 1
No
Selection

Description: n
Description: n^2
Description: n^2
Description: 1
Yes
Insertion
O(n + d), in the worst case over sequences that have d inversions.
Description: n \log n
Description: n \log n
Description: n \log n
Description: \log n
No
Partitioning & Selection
Used in several STL implementations.
Description: n^2
Description: n^2
Description: n^2
Description: 1
No
Selection
Stable with O(n) extra space, for example using lists.[4]
Description: n
Description: n \log n
Description: n \log n
Description: n
Yes
Insertion & Merging
Makes n comparisons when the data is already sorted or reverse sorted.
Description: n
Description: n \log n
Description: n \log n
Description: n
Yes
Insertion
Makes n comparisons when the data is already sorted or reverse sorted.
Description: n
Description: n \log^2 n
or
Description: n^{3/2}
Depends on gap sequence;
best known is Description: n \log^2 n
Description: 1
No
Insertion
Small code size, no use of call stack, reasonably fast, useful where memory is at a premium such as embedded and older mainframe applications.
Description: n
Description: n^2
Description: n^2
Description: 1
Yes
Exchanging
Tiny code size.
Description: n
Description: n \log n
Description: n \log n~\text{(balanced)}
Description: n
Yes
Insertion
Description: n^2
Description: n^2
Description: n^2
Description: 1
No
Insertion
In-place with theoretically optimal number of writes.
Description: n
Description: n \log n
Description: n^2
Description: n
Yes
Insertion

Description: n
Description: n \log n
Description: n
No
Insertion & Selection
Finds all the longest increasing subsequences in O(n log n).
Description: n
Description: n \log n
Description: n \log n
Description: 1
No
Selection
An adaptive sort: Description: ncomparisons when the data is already sorted, and 0 swaps.
Description: n
Description: n^2
Description: n^2
Description: n
Yes
Selection

Description: n \log n
Description: n \log n
Description: n \log n
No
Selection
Variation of Heap Sort.
Description: n
Description: n^2
Description: n^2
Description: 1
Yes
Exchanging

Description: n
Description: n \log n
Description: n^2
Description: 1
No
Exchanging
Small code size.
Description: n
Description: n^2
Description: n^2
Description: 1
Yes
Exchanging
Tiny code size.
UnShuffle Sort[6]
Description: kN
Description: kN
Description: kN
In place for linked lists. N*sizeof(link) for array.
No
Distribution and Merge
No exchanges are performed. The parameter 'k' is proportional to the entropy in the input. K = 1 for ordered or ordered by reversed input.
Franceschini's method[7]
Description: n \log n
Description: n \log n
Description: 1
Yes
?

Description: n
Description: n \log n
Description: n \log n
Description: 1
Yes
Insertion & Merging
Combine a block-based O(n) in-place merge algorithm[8] with a bottom-up merge sort.
Description: n
Description: n^2
Description: n^2
Description: 1
Yes
Exchanging
Can be run on parallel processors easily.
The following table describes integer sorting algorithms and other sorting algorithms that are not comparison sorts. As such, they are not limited by a Description: \Omega(n \log n)lower bound. Complexities below assume n items to be sorted, with keys of size k, digit size d, and r the range of numbers to be sorted. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n << 2k, where << means "much less than."
Non-comparison sorts
Name
Best
Average
Worst
Memory
Stable
n << 2k
Notes
Description: n + 2^k
Description: n + 2^k
Description: 2^k
Yes
Yes

Bucket sort (uniform keys)
Description: n+k
Description: n^2 \cdot k
Description: n \cdot k
Yes
No
Assumes uniform distribution of elements from the domain in the array.[9]
Bucket sort (integer keys)
Description: n+r
Description: n+r
Description: n+r
Yes
Yes
If r is O(n), then Average is O(n).[10]
Description: n+r
Description: n+r
Description: n+r
Yes
Yes
If r is O(n), then Average is O(n).[9]
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
Description: n + 2^d
Yes
No
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
Description: n + 2^d
Yes
No
Stable version uses an external array of size n to hold all of the bins.
MSD Radix Sort (in-place)
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
Description: 2^d
No
No
Description: \frac{k}{d}recursion levels, 2d for count array.
Description: n \cdot \frac{k}{d}
Description: n \cdot \left( {\frac{k}{s} + d} \right)
Description: \frac{k}{d} \cdot 2^d
No
No
Asymptotic are based on the assumption that n << 2k, but the algorithm does not require this.
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
No
No
Has better constant factor than radix sort for sorting strings. Though relies somewhat on specifics of commonly encountered strings.
Description: n
Description: n+r
Description: n^2
Description: n
Can be made stable with additional Description: O(n)space used.
No
Requires uniform distribution of elements from the domain in the array to run in linear time. If distribution is extremely skewed then it can go quadratic if underlying sort is quadratic (it is usually an insertion sort). In-place version is not stable.
Description: n \cdot \frac{k}{d}
Description: n \cdot \frac{k}{d}
Description: n+2^d
No
A variation of bucket sort, which works very similar to MSD Radix Sort. Specific to post service needs.
Samplesort can be used to parallelize any of the non-comparison sorts, by efficiently distributing data into several buckets and then passing down sorting to several processors, with no need to merge as buckets are already sorted between each other.

No comments:

Post a Comment