Consider this an extension of assignment 2.
A Row-wise Block Parallel Mapping I
This implementation requires no synchronization
[Figure: Y = A X. The rows of A, and the matching entries of Y, are divided into blocks: one block for Thread T0, one for Thread T1.]
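A minimal sketch of Mapping I, assuming A is stored as a list of rows and the work is split into contiguous row blocks. The function name matvec_row_blocks and the parameter num_threads are illustrative and do not come from the slides. Each thread writes only its own entries of y, so no lock is needed; the final join only waits for the threads to finish.

```python
import threading

def matvec_row_blocks(A, x, num_threads=2):
    """Compute y = A x with one contiguous block of rows per thread."""
    rows = len(A)
    y = [0.0] * rows
    block = (rows + num_threads - 1) // num_threads  # rows per thread, rounded up

    def worker(t):
        # Thread t owns rows [lo, hi) and writes only y[lo:hi].
        lo = t * block
        hi = min(lo + block, rows)
        for i in range(lo, hi):
            acc = 0.0
            for j, a_ij in enumerate(A[i]):
                acc += a_ij * x[j]
            y[i] = acc  # no other thread writes y[i], so no lock is required

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

Every thread reads all of x but touches a disjoint set of rows of A and entries of y.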
A Row-wise Block Parallel Mapping II
This implementation requires lots of synchronization
[Figure: A[n X m], s, X[0, … , n-1]. Grid of threads laid out row by row: T00, T01, … , T0n; T10, T11, … , T1n; … ; Tm0, Tm1, … , Tmn. Each row of threads uses X[0, … , n-1]. Thread Tij, where i = 1 … m, j = 1 … n.]
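One possible reading of Mapping II, sketched under assumptions: the matrix is cut into a two-dimensional grid of tiles and each tile gets its own thread. The names matvec_grid, row_blocks and col_blocks are hypothetical. Threads in the same row of the grid contribute to the same entries of y, so every update has to be protected, which is where the synchronization cost comes from.

```python
import threading

def matvec_grid(A, x, row_blocks=2, col_blocks=2):
    """Compute y = A x with one thread per tile of a row_blocks x col_blocks grid."""
    rows, cols = len(A), len(A[0])
    y = [0.0] * rows
    lock = threading.Lock()  # guards every update of y
    rb = (rows + row_blocks - 1) // row_blocks  # tile height, rounded up
    cb = (cols + col_blocks - 1) // col_blocks  # tile width, rounded up

    def worker(bi, bj):
        r0, r1 = bi * rb, min((bi + 1) * rb, rows)
        c0, c1 = bj * cb, min((bj + 1) * cb, cols)
        for i in range(r0, r1):
            partial = 0.0
            for j in range(c0, c1):
                partial += A[i][j] * x[j]
            # Several threads share row i, so the accumulation must be locked.
            with lock:
                y[i] += partial

    threads = [
        threading.Thread(target=worker, args=(bi, bj))
        for bi in range(row_blocks)
        for bj in range(col_blocks)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

A single global lock keeps the sketch short; a lock per row of y would reduce contention but not remove it.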
A Row-wise Block Parallel Mapping III
This implementation requires lots of synchronization
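[Figure: A[n X m], block size s. Grid of threads laid out column by column, each column of threads paired with one block of X: T00, T10, … , Tm0 with X[0, … , s-1]; T01, T11, … , Tm1 with X[s, … , 2s-1]; … ; T0n, T1n, … , Tmn with X[n-s, … , n-1].]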
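A simplified sketch of Mapping III, assuming each column of threads in the figure can be collapsed into a single thread that handles one block of s columns. The name matvec_column_blocks is hypothetical. Each thread reads only its own slice of x, but all threads contribute to every entry of y, so the partial results must be merged under a lock.

```python
import threading

def matvec_column_blocks(A, x, s):
    """Compute y = A x with one thread per block of s consecutive columns."""
    rows, cols = len(A), len(A[0])
    y = [0.0] * rows
    lock = threading.Lock()  # guards the merge into y

    def worker(c0):
        c1 = min(c0 + s, cols)
        x_slice = x[c0:c1]        # the only part of x this thread reads
        partial = [0.0] * rows    # private partial result, one entry per row
        for i in range(rows):
            row = A[i]
            for k, x_j in enumerate(x_slice):
                partial[i] += row[c0 + k] * x_j
        # Every thread contributes to every y[i], so the merge is locked.
        with lock:
            for i in range(rows):
                y[i] += partial[i]

    threads = [threading.Thread(target=worker, args=(c0,)) for c0 in range(0, cols, s)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

Keeping a private partial vector and locking once per thread is a design choice of this sketch; locking on every single update would also be correct but slower.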
Parallel Matrix times Vector
Give reasons for the differences between mappings I, II and III in terms of spatial and temporal locality.
Question: if these mappings are run sequentially, would they have the same performance?
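As a starting point for the locality question, the sketch below lists the memory offsets a sequential loop nest visits, assuming A is stored row-major in one flat array (an assumption, the slides do not state the layout). The helper names flat_offsets and strides are illustrative. It only shows the access pattern and measures nothing.

```python
def flat_offsets(rows, cols, order):
    """Offsets into a row-major rows x cols array, in the order a loop nest visits them."""
    if order == "row":   # outer loop over rows, inner loop over columns
        return [i * cols + j for i in range(rows) for j in range(cols)]
    if order == "col":   # outer loop over columns, inner loop over rows
        return [i * cols + j for j in range(cols) for i in range(rows)]
    raise ValueError("order must be 'row' or 'col'")

def strides(offsets):
    """Distance between consecutive accesses."""
    return [b - a for a, b in zip(offsets, offsets[1:])]

print(strides(flat_offsets(2, 3, "row")))  # [1, 1, 1, 1, 1]
print(strides(flat_offsets(2, 3, "col")))  # [3, -2, 3, -2, 3]
```

Unit strides mean consecutive accesses tend to fall in the same cache line; larger strides tend to spread them across lines.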