Lab 8

learning goals

In this lab you will measure the performance of algorithms for sorting lists.

set-up

Download sort.py, test_sort.py, and chart.xls, saving them to a sub-directory called lab8. Notice that test_sort.py uses module timeit to measure total time used, and cProfile to find out which parts of your code are costing the most time.

Note: If you're using Wing, there is an (in)compatibility issue that means you must run the tests in debug mode.
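If you want to try the timing machinery on its own, the pattern below is roughly how timeit and cProfile are used together. The function name sort.selection_sort and the list size are illustrative assumptions, not necessarily what test_sort.py actually uses.

    import random
    import timeit
    import cProfile

    import sort  # the module you downloaded; the function name below is an assumption

    lst = [random.randint(0, 1_000_000) for _ in range(1000)]

    # timeit: total wall-clock time for a number of repetitions.
    # Sort a fresh copy each time so every run does the same amount of work.
    total = timeit.timeit(lambda: sort.selection_sort(lst[:]), number=10)
    print(f"10 runs took {total:.3f} seconds")

    # cProfile: per-function call counts and times, e.g. how often len is called.
    cProfile.run("sort.selection_sort(lst[:])", sort="cumulative")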

big-Oh characteristics of sorting algorithms

In this section you will test various sorting algorithms to see, empirically, which complexity class they fall into. The key idea is to notice how quickly the running time grows as you increase the size of the problem (in this case, the number of elements in a list).

Open test_sort.py in your IDE, and press the run button. Various results should print in the console. Warning: We have tried to choose the size of the lists being sorted so that the experiment takes a reasonable amount of time. You are welcome to change these parameters, but be aware that the running times may grow longer than you are comfortable with.

As you work through the various sorting algorithms, record your results on chart.xls. To graph your results, select all the rows for, say, the randomized sort. Then choose the chart tool from the toolbar, use the wizard to select points and lines as the chart type, and under Data Range select Data Series in rows, and first row and column as labels.

Suppose n denotes the length of a list being sorted.

When you're done graphing, decide which sorting algorithms take time proportional to n^2 to run (O(n^2)), and which take time proportional to n lg n (O(n lg n)).
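A quick way to check your reading of the chart: time the same algorithm at size n and at size 2n. Doubling n multiplies an O(n^2) running time by about 4, and an O(n lg n) running time by a bit more than 2. The sketch below assumes sort.py exposes a selection_sort function; substitute whatever names your file actually uses.

    import random
    import timeit

    import sort  # selection_sort below is an assumed name from sort.py

    def time_sort(sort_fn, n, reps=5):
        """Average seconds to sort a shuffled list of n ints."""
        lst = list(range(n))
        random.shuffle(lst)
        return timeit.timeit(lambda: sort_fn(lst[:]), number=reps) / reps

    t1 = time_sort(sort.selection_sort, 1000)
    t2 = time_sort(sort.selection_sort, 2000)
    # ~4 suggests O(n^2); a little over 2 suggests O(n lg n)
    print(f"ratio for doubled input: {t2 / t1:.1f}")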

less than big-Oh

The scaling behaviour of an algorithm, that is, how it responds to the problem size being increased, is expressed in its big-Oh class. An algorithm whose running time grows quickly as the problem size is increased may not be feasible, no matter how much tweaking you subject the code to. Developers think first about big-Oh, long before optimizing.

Sometimes there is some time to be saved by tweaking different implementations of the same algorithm, or different algorithms in the same big-Oh class. This sort of tweaking may be appropriate at late stages of development, when the code is not being changed a lot. Or perhaps not.

selection sort

Selection sort works by repeatedly selecting the smallest remaining item and putting it where it belongs. When you profile selection sort, you'll discover there are many calls to len. Change the while loop to a for loop to avoid re-evaluating len, and re-run the profiling. You'll notice that len is still being called a lot; find out where, and use the same trick to avoid calling len so much.

How much time did you just save? Was it worth it?
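If you get stuck, the change being asked for looks roughly like this (a generic selection sort, not necessarily the exact code in sort.py):

    def selection_sort_while(lst):
        """Selection sort with while loops: the loop conditions
        re-evaluate len(lst) on every iteration."""
        i = 0
        while i < len(lst) - 1:          # calls len each time around
            smallest = i
            j = i + 1
            while j < len(lst):          # ...and here, many more times
                if lst[j] < lst[smallest]:
                    smallest = j
                j += 1
            lst[i], lst[smallest] = lst[smallest], lst[i]
            i += 1

    def selection_sort_for(lst):
        """Same algorithm, but range() fixes the bound once,
        so len is called only once in total."""
        n = len(lst)
        for i in range(n - 1):
            smallest = i
            for j in range(i + 1, n):
                if lst[j] < lst[smallest]:
                    smallest = j
            lst[i], lst[smallest] = lst[smallest], lst[i]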

insertion sort

Insertion sort works by repeatedly inserting the next item where it belongs among the sorted items at the front of the list. There are two versions: one manually moves items using a loop, and the other relies on Python's del. Why do you think Python's list code is so much faster? Of selection sort and insertion sort, which is faster? Why do you think this is?
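As a sketch of the two styles described above (generic reconstructions, not necessarily sort.py's code):

    def insertion_sort_loop(lst):
        """Insert each item by shifting larger items one slot to the
        right, entirely in Python-level code."""
        for i in range(1, len(lst)):
            value = lst[i]
            j = i
            while j > 0 and lst[j - 1] > value:
                lst[j] = lst[j - 1]   # shift one slot right
                j -= 1
            lst[j] = value

    def insertion_sort_del(lst):
        """Insert each item by deleting it and re-inserting it where it
        belongs; del and insert do the shifting inside Python's C list code."""
        for i in range(1, len(lst)):
            value = lst[i]
            j = i
            while j > 0 and lst[j - 1] > value:
                j -= 1
            del lst[i]
            lst.insert(j, value)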

bubblesort

Bubblesort works by repeatedly scanning the entire list and swapping adjacent items that are out of order. One consequence of bubblesort is that, on the first scan, the largest item must end up at the end of the list no matter where that item was before the first scan. Given what we've learned from timing selection and insertion sort, how do you think bubblesort will perform?

There are two versions of bubblesort. The second one has a check to see whether any items have been swapped on the last scan and, if not, stops early (in that case, no items were out of order). How much of a difference does it make to exit early? Is it noticeable? Once you've done the bubblesort timing, figure out which version is faster and why.
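The early-exit check looks roughly like this (a generic bubblesort, assumed to mirror what sort.py does):

    def bubblesort_early_exit(lst):
        """Bubble the largest remaining item to the end on each scan;
        stop as soon as a full scan makes no swaps."""
        for end in range(len(lst) - 1, 0, -1):
            swapped = False
            for j in range(end):
                if lst[j] > lst[j + 1]:
                    lst[j], lst[j + 1] = lst[j + 1], lst[j]
                    swapped = True
            if not swapped:       # the list is already sorted
                return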

mergesort

Mergesort is different: it splits the list in half, sorts the two halves, and then merges the two sorted halves. There are two versions: the first one uses a helper function mergesort_1 that returns a new sorted list (and thus only replaces the items in the original list once, when the helper function exits), and the second one uses a helper function mergesort_2 that sorts the list between two indices and continually updates the original list. Which do you think is faster, and why?
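A sketch of the first style, where the helper returns a new sorted list (the name mergesort_1 follows the handout, but the body is a generic reconstruction):

    def mergesort_1(lst):
        """Return a new sorted list containing lst's items."""
        if len(lst) <= 1:
            return lst[:]
        mid = len(lst) // 2
        return merge(mergesort_1(lst[:mid]), mergesort_1(lst[mid:]))

    def merge(left, right):
        """Merge two sorted lists into one new sorted list."""
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1
        result.extend(left[i:])
        result.extend(right[j:])
        return result

    def mergesort(lst):
        """Sort lst in place, replacing its items once at the end."""
        lst[:] = mergesort_1(lst)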

quicksort

Quicksort works by partitioning the list into two sections: those items less than the first item, and those greater than or equal to the first item. For example, if the list is [5, 1, 7, 3, 9, 12], then the helper function partition will rearrange the list into this: [3, 1, 5, 9, 12, 7]; notice that the 5 is now in the right place. Then the left and right sections are sorted using quicksort. How fast is this? Is quicksort faster on nearly-sorted lists or on random data? Why?

There are two versions of quicksort. The second one uses indices to sort the list in place, without making copies of each sub-list. How much difference does this make?
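One in-place partition scheme consistent with the example above takes the first item as pivot and walks two indices inward. This is a reconstruction, not necessarily the exact code in sort.py, though it does turn the handout's list [5, 1, 7, 3, 9, 12] into [3, 1, 5, 9, 12, 7]:

    def partition(lst, i, j):
        """Partition lst[i:j+1] around pivot lst[i] and return the pivot's
        final index: items < pivot end up to its left, items >= pivot
        to its right."""
        pivot = lst[i]
        lo, hi = i + 1, j
        while lo <= hi:
            if lst[lo] < pivot:
                lo += 1
            else:
                lst[lo], lst[hi] = lst[hi], lst[lo]
                hi -= 1
        lst[i], lst[hi] = lst[hi], lst[i]   # drop the pivot into place
        return hi

    def quicksort(lst, i=0, j=None):
        """Sort lst[i:j+1] in place."""
        if j is None:
            j = len(lst) - 1
        if i < j:
            p = partition(lst, i, j)
            quicksort(lst, i, p - 1)
            quicksort(lst, p + 1, j)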

list.sort()

Compare Python's built-in sort to the other sorting algorithms. Why do you think the Python sort is so much faster? You may want to google timsort.
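One quick experiment you can run (a sketch; the sizes are arbitrary): timsort looks for already-sorted runs, so list.sort() on nearly-sorted data can be far faster than on shuffled data of the same size.

    import random
    import timeit

    n = 100_000
    nearly_sorted = list(range(n))
    for _ in range(10):                  # displace a handful of items
        a, b = random.randrange(n), random.randrange(n)
        nearly_sorted[a], nearly_sorted[b] = nearly_sorted[b], nearly_sorted[a]
    shuffled = list(range(n))
    random.shuffle(shuffled)

    for name, data in [("nearly sorted", nearly_sorted), ("shuffled", shuffled)]:
        # sort a fresh copy each run so every run does the same work
        t = timeit.timeit(lambda: data[:].sort(), number=20)
        print(f"{name:>13}: {t:.3f}s for 20 sorts")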









