Problem 1
Show pictorially what operation is performed by each of the following data parallel HPF code segments. Assume arbitrary input data for these arrays.
real a(5,5),b(5,5),c(5,5),d(5)
a(2,:)=d
a(1:3,:)=b(2:4,:)
where (b.eq.c) a=c
forall (i=2:4,j=2:5) a(i,j)=b(i-1,j-1)+c(i+1,j)
forall (i=1:5,j=1:5) b(i,j)=(i+j-1)
forall (j=1:5) d(j)=sum(c(1:4,j),dim=1)
a=spread(d,dim=2,ncopies=5)
b=spread(d,dim=1,ncopies=5)
a=cshift(b,dim=1,shift=3)
d=sum(spread(d,dim=1,ncopies=5),dim=2)
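As a reminder of how two of the intrinsics used above behave, here is a minimal sketch in plain Fortran 90 (which HPF extends). The program name, the arrays v, m, n, w and their sizes are made up for illustration and are not part of the assignment; the sketch only shows the general behavior of spread and cshift on a small vector.

```fortran
program spread_cshift_demo
  implicit none
  ! Hypothetical small arrays, chosen only for illustration
  integer :: v(3), m(2,3), n(3,2), w(3)
  integer :: i

  v = (/ 10, 20, 30 /)

  ! dim=1 adds a new first dimension: every ROW of m is a copy of v,
  ! i.e. m(i,j) = v(j)
  m = spread(v, dim=1, ncopies=2)

  ! dim=2 adds a new second dimension: every COLUMN of n is a copy of v,
  ! i.e. n(i,j) = v(i)
  n = spread(v, dim=2, ncopies=2)

  ! A positive shift moves elements toward lower indices, circularly:
  ! w becomes (20, 30, 10)
  w = cshift(v, shift=1)

  print *, 'rows of m:'
  do i = 1, 2
    print *, m(i,:)
  end do

  print *, 'rows of n:'
  do i = 1, 3
    print *, n(i,:)
  end do

  print *, 'w:', w
end program spread_cshift_demo
```

Printing the arrays row by row, as done here, is one way to check a pictorial answer against what the intrinsics actually produce.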
Problem 2
Write HPF code to perform the following operations.
(a) Given a 2-dimensional matrix a(100,100), zero out the upper diagonal elements.
(b) Given a 2-dimensional matrix a(100,100), transpose it and assign the result to matrix b(100,100).
(c) Given a 1-dimensional array a(100), assign it to a 2-dimensional array b(100,5) by replicating a() column-wise 5 times.
(d) Given a 2-dimensional array a(100,100), assign it to an array b(100,100) such that each element is circularly shifted left by two columns and down by one row in the result matrix.
(e) Perform the following data transfer between arrays a(8) and b(4).
[Figure: element transfer between a() and b()]
Problem 3
For each of the following data distributions, write down the HPF data distribution and alignment directives.
(a) Array a(18) is distributed across four processors P1,…,P4 as shown below.
P1    P2    P3    P4
 1     6    11    16
 2     7    12    17
 3     8    13    18
 4     9    14
 5    10    15
(b) Partition array a(12) across four processors in a blocked distribution, and partition array b(12,12) across four processors such that all the columns of b() are on the same processor as the corresponding elements of a().
[Figure: a(12) and b(12) laid out across the processors, starting at P1]
Problem 4 [Note: you are required to write the code; there is no need to compile or execute it]
In this problem you are asked to write a parallel algorithm using HPF to solve equations of the form A*x=b, where A is an n*n matrix and b is a vector. You will use Gaussian elimination without pivoting. You wrote a shared memory parallel algorithm in HW2 and a message passing version in HW4. Now you need to convert the algorithm to a data parallel version using HPF. You should use a cyclic distribution of the array for better load distribution, similar to what you did for HW2.
Please download the serial code for the algorithm (gauss.f) from Blackboard. You can generate the data for the matrix A and vector b randomly. There should be clear comments at the beginning of the code explaining your algorithm.
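For orientation, the serial algorithm can be sketched as follows. This is not the course's gauss.f; it is a minimal Fortran 90 sketch under stated assumptions: the program name, the 3x3 test system, and the helper array factor are all made up for illustration, and the test matrix is chosen diagonally dominant so that elimination without pivoting is safe. The sketch contains no HPF distribution directives; it only shows where the independent row updates occur.

```fortran
program gauss_sketch
  implicit none
  integer, parameter :: n = 3          ! hypothetical problem size
  real :: a(n,n), b(n), x(n), factor(n)
  integer :: k, i

  ! Hypothetical symmetric, diagonally dominant test system whose
  ! exact solution is x = (1, 1, 1)
  a = reshape((/ 4.0, 1.0, 1.0,  &
                 1.0, 5.0, 2.0,  &
                 1.0, 2.0, 6.0 /), (/ n, n /))
  b = (/ 6.0, 8.0, 9.0 /)

  ! Forward elimination without pivoting
  do k = 1, n-1
    ! Multipliers for all rows below the pivot row, computed at once
    factor(k+1:n) = a(k+1:n,k) / a(k,k)

    ! Each row i > k is updated independently of the other rows;
    ! the pivot row k itself is only read
    forall (i = k+1:n)
      a(i,k:n) = a(i,k:n) - factor(i) * a(k,k:n)
    end forall
    b(k+1:n) = b(k+1:n) - factor(k+1:n) * b(k)
  end do

  ! Back substitution (for i = n the sections are zero-sized,
  ! so dot_product contributes 0)
  do i = n, 1, -1
    x(i) = (b(i) - dot_product(a(i,i+1:n), x(i+1:n))) / a(i,i)
  end do

  print *, 'x =', x
end program gauss_sketch
```

The number of rows updated in step k shrinks as k grows, which is one reason a cyclic distribution of rows tends to balance the work better than a blocked one.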