Homework #1 Solution

Starting from:

~~$35~~

$29

Home

What are the two main styles of parallelism? Explain.

What are the three parallel programming paradigms? Explain.

Discuss the difference between shared address space machines and distributed address space machines. And discuss the advantages and disadvantages of both architectures.

Give an example of anti dependence and give a corresponding solution to remove the dependence.

Consider the search tree shown in the following figure, in which the dark node represents the solution.

If a sequential search of the tree is performed using the standard depth-first search (DFS) algorithm, how much time does it take to find the solution if traversing each arc of the tree takes one unit of time? Note: DFS begins by expanding the initial node and generating its successors. In each subsequent step, DFS expands one of the most recently generated nodes. If this node has no successors (or cannot lead to any solutions), then DFS backtracks and expands a different node.

Assume that the tree is partitioned between two processing elements that are assigned to do the search job, as shown in figure b. If both processing elements perform a DFS on their respective halves of the tree, how much time does it take for the solution to be found? What is the speedup? Is there a speedup anomaly? If so can you explain the anomaly?

6. Derive the formula for calculating the average access time for a word in a system with three levels of cache. Assume the following values for a theoretical system containing an L1, L2, and L3 cache.

Location

---------

L1

L2

L3

Main Memory

Latency

-------------

2 ns

8 ns

25 ns

100 ns

If an application has following hit rates, what is the average memory access time for a memory word?

Location
Hit rate
------------
----------
L1
50%
L2
70%
L3
90%
Main Memory
100%

7. Consider a parallel matrix vector multiplication algorithm:

C=A*b

14
1
2
3
1
14
1

2

3

32
= 45
6
× 2
32
= 4
× 1
+ 5
× 2
+ 6
× 3

8

50
7
9
3
50
7

8

9

Sequential code:

matvec() {

int i,j;

for (j=0;j<n;j++){

for (i=0;i<m;i++) {

c[i]=c[i]+a[i][j]*b[j];

}

}

}

Parallel code:

void matvec() {

int i,j,nprocs,myid;

float tmp[max];

for (i=0;i<m;i++){

tmp[i]=0;

}

nprocs= no. of processors used;

myid=process id;

for (j=myid;j<n;j+=nprocs){

for (i=0;i<m;i++)

tmp[i]=tmp[i]+a[i][j]*b[j];

}

lock();

for (i=0;i<m;i++)

c[i]=c[i]+tmp[i];

unlock();

}

Using fixed-size speedup and scalability analysis techniques similar to those employed in the class notes, answer the following.

Analyze the running time of the serial algorithm as a function of the matrix dimension n and m, You may assume all operations take unit time. (More appropriate runtime order techniques will be presented later in the course – “big O” notation.)

Analyze the running time of the parallel algorithm as a function of n, m and p. You may also assume n and m are evenly divisible by p.

Obtain expression for the speedup, Sp, and the Amdahl’s fraction α.

Determine if the algorithm is effective. Briefly explain.

Note on cheating: There are penalties for cheating. Don’t find out the hard way. Working in groups is fine for discussing approaches and techniques. Copying problem solutions or code is cheating. Both the person copying and the person giving the answers will be equally penalized. Make sure you do your own work.

3