
Parallel Matrix times Vector Solution


Consider this an extension of Assignment 2.
A Row-wise Block Parallel Mapping I


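This implementation requires no synchronization, because each thread writes a disjoint block of rows of Y.

[Figure: Y = A × X, with A split row-wise into blocks of K rows; thread T0 computes the first block of Y, thread T1 the next block, and so on.]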
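The row-wise block mapping can be sketched in Python as follows. This is a minimal illustration, not the assignment's reference solution: the function name matvec_rowblock, the list-of-lists matrix layout, and the use of Python's threading module are all assumptions. Each thread owns K consecutive rows and writes only its own entries of Y, which is why no locks appear. (In CPython the global interpreter lock keeps these threads from doing the arithmetic in parallel, so the sketch shows the mapping, not the speedup.)

```python
import threading

def matvec_rowblock(A, x, num_threads):
    """Hypothetical Mapping I sketch: thread t owns a contiguous block of
    K rows of A and computes the matching entries of y."""
    n_rows = len(A)
    y = [0.0] * n_rows
    k = (n_rows + num_threads - 1) // num_threads  # rows per block (K), rounded up

    def worker(t):
        start, stop = t * k, min((t + 1) * k, n_rows)
        for i in range(start, stop):
            acc = 0.0
            for j, x_j in enumerate(x):
                acc += A[i][j] * x_j
            # Each thread writes only indices in its own block, so no lock is needed.
            y[i] = acc

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait for every block before returning y
    return y
```

For example, matvec_rowblock([[1, 2], [3, 4], [5, 6]], [1, 1], 2) gives thread 0 rows 0 and 1, thread 1 row 2, and returns [3.0, 7.0, 11.0].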
A Row-wise Block Parallel Mapping II


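This implementation requires lots of synchronization, because several threads contribute partial results to the same entry of Y.

[Figure: A[n × m] with block size s, processed by a grid of threads Tij (T00, T01, …, T0n in the first row through Tm0, Tm1, …, Tmn in the last), where each row of threads uses the full vector X[0, …, n-1]; i = 0, …, m and j = 0, …, n.]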
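The figure does not pin down exactly how each thread Tij divides the work, so the following Python sketch assumes the finest-grained reading: thread Tij computes the single product A[i][j]·X[j] and adds it into Y[i]. The name matvec_elementwise and the per-row locks are illustrative assumptions. Because every thread in row i updates the same Y[i], each update must be serialized, which is where the synchronization cost comes from.

```python
import threading

def matvec_elementwise(A, x):
    """Hypothetical Mapping II sketch (assumed reading): one thread T_ij per
    element adds A[i][j] * x[j] into the shared entry y[i]."""
    n_rows, n_cols = len(A), len(x)
    y = [0.0] * n_rows
    row_locks = [threading.Lock() for _ in range(n_rows)]  # one lock per entry of y

    def worker(i, j):
        prod = A[i][j] * x[j]
        # Many threads update y[i]; the read-modify-write must be done under the lock.
        with row_locks[i]:
            y[i] += prod

    threads = [threading.Thread(target=worker, args=(i, j))
               for i in range(n_rows) for j in range(n_cols)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

With the same inputs as a row-wise version, matvec_elementwise([[1, 2], [3, 4], [5, 6]], [1, 1]) also returns [3.0, 7.0, 11.0]; only the amount of locking differs.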
A Row-wise Block Parallel Mapping III


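This implementation requires lots of synchronization, because the threads in one row each produce a partial sum for the same entry of Y.

[Figure: A[n × m] with block width s; thread column j (T0j, T1j, …, Tmj) multiplies against the j-th block of X: X[0, …, s-1] for T00, T10, …, Tm0; X[s, …, 2s-1] for T01, T11, …, Tm1; …; X[n-s, …, n-1] for T0n, T1n, …, Tmn.]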
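A possible Python sketch of Mapping III follows, under the assumption that thread Tij handles row i of A and the j-th column block of width s (X[j·s, …, (j+1)·s-1], with a shorter final block if s does not divide the vector length). The name matvec_colblock and the per-row locks are hypothetical. Each thread first reduces its block into a local partial sum, so it takes the lock on Y[i] only once per block rather than once per element.

```python
import threading

def matvec_colblock(A, x, s):
    """Hypothetical Mapping III sketch: thread T_ij handles row i and the
    j-th column block x[j*s : (j+1)*s], then adds its partial sum to y[i]."""
    n_rows, n_cols = len(A), len(x)
    n_blocks = (n_cols + s - 1) // s  # number of column blocks, rounded up
    y = [0.0] * n_rows
    row_locks = [threading.Lock() for _ in range(n_rows)]

    def worker(i, j):
        lo, hi = j * s, min((j + 1) * s, n_cols)
        # Contiguous slice of row i and of x: the local work needs no locking.
        partial = sum(A[i][c] * x[c] for c in range(lo, hi))
        # Threads T_i0 ... T_i(n_blocks-1) all contribute to y[i]: synchronize here.
        with row_locks[i]:
            y[i] += partial

    threads = [threading.Thread(target=worker, args=(i, j))
               for i in range(n_rows) for j in range(n_blocks)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return y
```

For example, matvec_colblock([[1, 2, 3, 4], [5, 6, 7, 8]], [1, 1, 1, 1], 2) combines the partial sums 3 + 7 and 11 + 15 and returns [10.0, 26.0].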










Parallel Matrix times Vector












Explain the differences between Mappings I, II, and III in terms of spatial and temporal locality.



Question: If these mappings are run sequentially (on a single thread), would they have the same performance?
