$35
Problem 1 - Dependencies
Suppose we have the following program (a sequence of instructions), with each instruction labeled as
S<num>:
    • 1 :   a  :=  4;
    • 2 :   b  :=  2;
    • 3 :   c  :=  5;
    • 4 :   read ( d );
    • 5 :   a  :=  b  +  3;
    • 6 :   b  :=  a  -  3;
    • 7 :   c  :=  d  *  b ;
    • 8 :   e  :=  a  +  6;
    • 9 :   print ( c );
    • 10 :   print ( e );
        (a) Give the statement-level dependence graph for the above program. A node in the statement-level dependence graph represents a statement, an edge represents dependence between the statements (i.e. the nodes). Label each edge as a true data dependence, an anti data dependence, or an output data dependence.
        (b) Assume that each statement takes 1 cycle to execute. What is the execution time of the sequen-tial code? What is the fastest parallel execution time of the program (i.e. the critical path)? You may assume that I/O operations (read and print) can be done in parallel.
Problem 2 - Dependence Analysis
Give the source and sink references, the type (whether a dependence is true, anti, or output), and the distance vectors for all dependences in the following loops.
    (a) do i = 3, 100
a(i) = a(i - 1) + a(i + 1) - a(i - 2)
enddo
    (b) do i = 2, 100
a(3 * i) = a(3 * i - 3) + a(3 * i + 3)
enddo
    (c) do i = 1, 10
a(i) = a(5) + a(i)
enddo
Use aW (i) and aR(i) to annotate the write access to a(i) and the read access to a(i) respectively.
Problem 3 - Loop Parallelization
Given the following nested loop:
1
do  i  =
2
,
100
do
j
=
2 ,
100
S1 :
a (i ,  j )  =  b ( i  -  1 ,  j  -  1)  +  2
    • 2 :b (i , j ) = i + j - 1 enddo
enddo
    (a) Give the statement-level dependence graph. Show the dependence graph for statement instances in a part of the iteration space: i = 2 ... 5, j = 2 ... 5.
    (b) In its current form, can any loop be parallelized? If so, which loop(s)? If not, justify your answer.
    (c) Provide one valid a ne schedule for statements S1 and S2 such that p(S1)= C11*i + C12*j + d1 and p(S2)= C21*i + C22*j + d2 in order to achieve synchronization-free parallelism. There could be many possible solutions for fC11; C12; C21; C22; d1; d2g. You only need to provide one feasible solution. (Hint: You can let d1 = d2 = 0.)
    (d) Generate two-level loop code for the a ne schedule you provided. Please use p as outermost loop and i as innermost loop in the transformed loop. Calculate the loop bounds for p and i using Fourier-Motzkin elimination. You might need to calculate the overlapping polyhedron for S1 and S2 in order to eliminate the j loop. Please refer to the techniques for code generation in lecture 20 and lecture 21.
2