$29
I. Below I have appended the three loop versions of matrix multiply. Please do the following experiments (preferably C or C++) using matrix sizes of 200 x 200, 400 x 400 and 800 x 800:
1. Implement Loop I and run for various values (sizes) of loop variable n. Make sure that the compiler optimization is turned off.
2. Compare the performance time of Loop I to Loop II. Again make sure that the optimization is turned off.
3. Recompile and rerun Loop I with the optimization turned on. Compare the execution time Loop I with the compiler optimization turned on and turned off (you will need to look at different values of the problem size n) for Loop I.
4. Compare the Loop I time with the optimization turned on with the Loop II time with the compiler optimization turned off.
5. With the compiler optimization option turned off and then on, compile and execute Loop III and compare the times.
a. Find the optimum size for the parameter s - i.e., the block size for your cache.
Loop I
for i = 1 to n do
for j = 1 to n do
for k = 1 to n do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
endfor
Loop II
for i = 1 to n do
for k = 1 to n do
for j = 1 to n do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
endfor
Loop III
for it = 1 to n by s do
for kt = 1 to n by s do
for jt = 1 to n by s do
for i = it to min(it+s-1,n) do
for k = kt to min(kt+s-1,n) do
for j = jt to min(jt+s-1,n) do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
endfor
endfor
endfor
endfor
Note: For your convenience, I have attached a matrix times vector program that should compile and run. Use this working program as the basis for you matrix times matrix exercise.