$29
Build and run the code in the `prog1_mandelbrot_threads/` directory of
the code base. (Type `make` to build, and `./mandelbrot` to run it.)
This program produces the image file `mandelbrot-serial.ppm`, which is a visualization of a famous set of
complex numbers called the Mandelbrot set. (Most platforms have a .ppm
view. For example, to view the resulting images remotely, use `ssh -Y`
and the `display` command.) As you can see in the images below, the
result is a familiar and beautiful fractal. Each pixel in the image
corresponds to a value in the complex plane, and the brightness of
each pixel is proportional to the computational cost of determining
whether the value is contained in the Mandelbrot set. To get image 2,
use the command option `--view 2`. (See function `mandelbrotSerial()`
defined in `mandelbrotSerial.cpp`). You can learn more about the
definition of the Mandelbrot set at
<http://en.wikipedia.org/wiki/Mandelbrot_set.
![Mandelbrot Set](http://graphics.stanford.edu/courses/cs348v-18-winter/asst_images/asst1/mandelbrot_viz.jpg "A visualization of the Mandelbrot set. The cost of computing each pixel is proportional to its brightness. When running programs 1 and 3, you can use the command line option `--view 2` to set output to be view 2.")
Your job is to parallelize the computation of the images using
[std::thread](https://en.cppreference.com/w/cpp/thread/thread). Starter
code that spawns one additional thread is provided in the function
`mandelbrotThread()` located in `mandelbrotThread.cpp`. In this function, the
main application thread creates another additional thread using the constructor
`std::thread(function, args...)` It waits for this thread to complete by calling
`join` on the thread object.
Currently the launched thread does not do any computation and returns immediately.
You should add code to `workerThreadStart` function to accomplish this task.
You will not need to make use of any other std::thread API calls in this assignment.
**What you need to do:**
1. Modify the starter code to parallelize the Mandelbrot generation using
two processors. Specifically, compute the top half of the image in
thread 0, and the bottom half of the image in thread 1. This type
of problem decomposition is referred to as _spatial decomposition_ since
different spatial regions of the image are computed by different processors.
2. Extend your code to use 2, 3, 4, 5, 6, 7, and 8 threads, partitioning the image
generation work accordingly (threads should get blocks of the image). Note that the processor only has four cores but each
core supports two hyper-threads, so it can execute a total of eight threads interleaved on its execution contents.
In your write-up, produce a graph of __speedup compared to the reference sequential implementation__ as a function of the number of threads used __FOR VIEW 1__. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case? (you may also wish to produce a graph for VIEW 2 to help you come up with a good answer. Hint: take a careful look at the three-thread datapoint.)
3. To confirm (or disprove) your hypothesis, measure the amount of time
each thread requires to complete its work by inserting timing code at
the beginning and end of `workerThreadStart()`. How do your measurements
explain the speedup graph you previously created?
4. Modify the mapping of work to threads to achieve to improve speedup to
at __about 7-8x on both views__ of the Mandelbrot set (if you're above 7x that's fine, don't sweat it). You may not use any
synchronization between threads in your solution. We are expecting you to come up with a single work decomposition policy that will work well for all thread counts---hard coding a solution specific to each configuration is not allowed! (Hint: There is a very simple static
assignment that will achieve this goal, and no communication/synchronization
among threads is necessary.). In your writeup, describe your approach to parallelization
and report the final 8-thread speedup obtained.
5. Now run your improved code with 16 threads. Is performance noticably greater than when running with eight threads? Why or why not?