Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by $0.7 \times p$ (where $p$ is the number of processors) but the number of branch instructions per processor remains the same.
(1) Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
(2) If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?
(3) To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?
Solution: (1) On one processor, the total cycle count is
$$1 \times 2.56 \times 10^{9} + 12 \times 1.28 \times 10^{9} + 5 \times 2.56 \times 10^{8} = 1.92 \times 10^{10}.$$
The total execution time is the cycle count divided by the clock frequency, so the execution time on one processor is
$$\frac{1.92 \times 10^{10}}{2 \times 10^{9}\ \text{Hz}} = 9.6\ \text{s}.$$
In the same way, the total execution times for this program on 2, 4, and 8 processors are computed; the results are shown in the following table.
| Processor count | Arithmetic inst. | L/S inst. | Branch inst. | Cycles | Execution time (s) | Speedup |
|---|---|---|---|---|---|---|
| 1 | 2.56E9 | 1.28E9 | 2.56E8 | 1.92E10 | 9.60 | 1.00 |
| 2 | 1.83E9 | 9.14E8 | 2.56E8 | 1.41E10 | 7.04 | 1.36 |
| 4 | 9.14E8 | 4.57E8 | 2.56E8 | 7.68E9 | 3.84 | 2.50 |
| 8 | 4.57E8 | 2.29E8 | 2.56E8 | 4.48E9 | 2.24 | 4.29 |

Table 1: Execution time and speedup (instruction counts are per processor)
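The rows of Table 1 can be recomputed with a short script. This is a minimal sketch, assuming the model stated in the problem: for more than one processor the arithmetic and load/store counts are divided by 0.7p, while the branch count stays fixed. The names `COUNTS`, `CPI`, and `exec_time` are illustrative and not part of the original solution.

```python
# Instruction counts and CPIs taken from the problem statement.
COUNTS = {"arith": 2.56e9, "ls": 1.28e9, "branch": 2.56e8}
CPI = {"arith": 1, "ls": 12, "branch": 5}
CLOCK_HZ = 2e9  # 2 GHz


def exec_time(p, cpi=CPI):
    """Execution time in seconds on p processors."""
    # Arithmetic and load/store counts shrink by 0.7 * p when parallelized;
    # the single-processor case uses the original counts.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (
        COUNTS["arith"] / scale * cpi["arith"]
        + COUNTS["ls"] / scale * cpi["ls"]
        + COUNTS["branch"] * cpi["branch"]  # branch count is not divided
    )
    return cycles / CLOCK_HZ


base = exec_time(1)
for p in (1, 2, 4, 8):
    t = exec_time(p)
    print(f"p={p}: time={t:.2f} s, speedup={base / t:.2f}")
```

Because the branch term is not divided, it sets a floor of 0.64 s on the execution time, which is why the speedup stays well below the processor count.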
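(2) With the arithmetic CPI doubled from 1 to 2, the execution times are shown in the following table.

| Processor count | Execution time (s) |
|---|---|
| 1 | 10.88 |
| 2 | 7.95 |
| 4 | 4.30 |
| 8 | 2.47 |

Table 2: Execution time after doubling the CPI of arithmetic instructions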
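The effect of doubling the arithmetic CPI can be checked by re-evaluating the same cycle-count formula with a different CPI. The sketch below assumes the problem's instruction counts and 2 GHz clock; the function name `exec_time` and its keyword parameters are hypothetical helpers introduced for illustration.

```python
def exec_time(p, cpi_arith=1, cpi_ls=12, cpi_branch=5, clock_hz=2e9):
    """Execution time in seconds on p processors (counts from the problem)."""
    # Only arithmetic and load/store counts are divided by 0.7 * p.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (2.56e9 * cpi_arith + 1.28e9 * cpi_ls) / scale + 2.56e8 * cpi_branch
    return cycles / clock_hz


for p in (1, 2, 4, 8):
    old = exec_time(p)
    new = exec_time(p, cpi_arith=2)  # arithmetic CPI doubled from 1 to 2
    slowdown = (new / old - 1) * 100
    print(f"p={p}: {old:.2f} s -> {new:.2f} s ({slowdown:.1f}% slower)")
```

The relative slowdown shrinks as p grows, because the fixed branch cycles make up a larger share of the total.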
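(3) Assume the CPI of load/store instructions is reduced to $x$. Performance is measured by execution time, and both configurations run at the same clock frequency, so it suffices to compare cycle counts. The cycle count on one processor is
$$2.56 \times 10^{9} + 1.28 \times 10^{9}\,x + 2.56 \times 10^{8} \times 5 = 3.84 \times 10^{9} + 1.28 \times 10^{9}\,x.$$
With the original CPI values, the cycle count on four processors is
$$\frac{2.56 \times 10^{9}}{0.7 \times 4} + \frac{1.28 \times 10^{9} \times 12}{0.7 \times 4} + 2.56 \times 10^{8} \times 5 = 7.68 \times 10^{9}.$$
To match the performance, set the two cycle counts equal:
$$3.84 \times 10^{9} + 1.28 \times 10^{9}\,x = 7.68 \times 10^{9} \quad\Longrightarrow\quad x = 3.$$
That is, the load/store CPI must be reduced from 12 to 3.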
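Equating the single-processor cycle count with the four-processor cycle count gives a linear equation in the load/store CPI. The following sketch solves it numerically, assuming the problem's instruction counts and CPIs; the variable names are illustrative.

```python
CLOCK_HZ = 2e9  # 2 GHz, identical for both configurations

# Cycle count on four processors with the original CPIs (1, 12, 5).
cycles_4p = (2.56e9 * 1 + 1.28e9 * 12) / (0.7 * 4) + 2.56e8 * 5

# Single-processor cycles as a function of the load/store CPI x:
#   2.56e9 * 1 + 1.28e9 * x + 2.56e8 * 5
fixed_cycles = 2.56e9 * 1 + 2.56e8 * 5  # arithmetic + branch cycles
x = (cycles_4p - fixed_cycles) / 1.28e9

print(f"target time: {cycles_4p / CLOCK_HZ:.2f} s")
print(f"required load/store CPI: {x:.2f}")
```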
Course Name: Principles of Computer Organization (计算机组成原理)
Assignment 1
易翔
Problem 2 (1.11) Score:
The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona show an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.
(1) Find the CPI if the clock cycle time is 0.333 ns.
(2) Find the SPECratio.
(3) Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.
(4) Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.
(5) Find the change in the SPECratio for this change.
(6) Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.
(7) This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?
(8) By how much has the CPU time been reduced?
(9) For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.
(10) Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.
(11) Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.
Solution: (1) CPU time = Instruction count × CPI × clock cycle time, so
$$\text{CPI} = \frac{750\ \text{s}}{2.389 \times 10^{12} \times 0.333 \times 10^{-9}\ \text{s}} \approx 0.943.$$
(2) The SPECratio is
$$\frac{T_{\text{ref}}}{T_{\text{actual}}} = \frac{9650\ \text{s}}{750\ \text{s}} \approx 12.87.$$
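Both quantities follow directly from the benchmark figures given in the problem. A minimal sketch, with illustrative variable names:

```python
# Figures for bzip2 on the AMD Barcelona, from the problem statement.
instr_count = 2.389e12
exec_time_s = 750.0
ref_time_s = 9650.0
cycle_time_s = 0.333e-9  # 0.333 ns

# CPU time = instruction count * CPI * clock cycle time, solved for CPI.
cpi = exec_time_s / (instr_count * cycle_time_s)

# SPECratio = reference time / measured execution time.
spec_ratio = ref_time_s / exec_time_s

print(f"CPI = {cpi:.3f}, SPECratio = {spec_ratio:.2f}")
```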
(3) Since CPU time = Instruction count × CPI × clock cycle time, increasing the number of instructions of the benchmark by 10% with the CPI unchanged increases the CPU time by 10%.
(4) From (3) we obtain $T_{\text{new}} / T_{\text{old}} = 1.1 \times 1.05 = 1.155$, where $T_{\text{old}}$ is the CPU time before the change and $T_{\text{new}}$ is the CPU time after the change. That is, the CPU time is increased by 15.5%.
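(5) The change in the SPECratio follows from the change in execution time, because
$$\frac{\text{SPECratio}_{\text{new}}}{\text{SPECratio}_{\text{old}}} = \frac{T_{\text{ref}} / T_{\text{actual,new}}}{T_{\text{ref}} / T_{\text{actual,old}}} = \frac{T_{\text{actual,old}}}{T_{\text{actual,new}}} = \frac{1}{1.155} \approx 0.866.$$
That is, the SPECratio is decreased by 13.4%.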
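Because CPU time is a product of three factors, relative changes multiply, and the SPECratio scales with the reciprocal of the CPU time. The sketch below illustrates this for parts (3) to (5); `cpu_time_factor` is a hypothetical helper, not something from the original solution.

```python
def cpu_time_factor(instr_factor, cpi_factor, cycle_factor=1.0):
    """Ratio T_new / T_old when each term of the CPU time product is scaled."""
    return instr_factor * cpi_factor * cycle_factor


f3 = cpu_time_factor(1.10, 1.00)  # (3): 10% more instructions, same CPI
f4 = cpu_time_factor(1.10, 1.05)  # (4): 10% more instructions, 5% higher CPI
spec_factor = 1.0 / f4            # (5): SPECratio is T_ref / T_actual

print(f"(3) CPU time increase: {(f3 - 1) * 100:.1f}%")
print(f"(4) CPU time increase: {(f4 - 1) * 100:.1f}%")
print(f"(5) SPECratio change: {(spec_factor - 1) * 100:.1f}%")
```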
(6) CPU time = Instruction count × CPI × clock cycle time, so
$$\text{CPI} = \frac{700 \times 4 \times 10^{9}}{0.85 \times 2.389 \times 10^{12}} \approx 1.38.$$
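(7) The clock rate ratio is $4\ \text{GHz} / 3\ \text{GHz} \approx 1.33$. The CPI ratio is $1.38 / 0.94 \approx 1.47$. So they are dissimilar. The reason is that the instruction count and the CPU time changed as well: since CPI = CPU time × clock rate / instruction count, the CPI ratio equals the clock rate ratio multiplied by the CPU time ratio (700 s / 750 s) and divided by the instruction count ratio (0.85), so the CPI grows faster than the clock rate.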
(8) $700\ \text{s} / 750\ \text{s} \approx 0.933$, so the CPU time has been reduced by 6.7%.
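Parts (6) to (8) can be reproduced from the figures in the problem. This is a sketch with illustrative variable names; it takes the old clock rate as 3 GHz for the ratio, as part (7) does, and uses the 0.333 ns cycle time from part (1) for the old CPI. With unrounded intermediate values the CPI ratio comes out near 1.46 rather than 1.47; the conclusion is unchanged.

```python
# Original run (bzip2 on the AMD Barcelona).
old_instr = 2.389e12
old_time_s = 750.0
old_cycle_s = 0.333e-9
old_clock_hz = 3e9  # nominal clock rate used in part (7)

# New processor: 15% fewer instructions, 700 s, 4 GHz.
new_instr = 0.85 * old_instr
new_time_s = 700.0
new_clock_hz = 4e9

old_cpi = old_time_s / (old_instr * old_cycle_s)
new_cpi = new_time_s * new_clock_hz / new_instr  # (6)

clock_ratio = new_clock_hz / old_clock_hz        # (7)
cpi_ratio = new_cpi / old_cpi
time_reduction = 1 - new_time_s / old_time_s     # (8)

print(f"new CPI = {new_cpi:.2f}")
print(f"clock ratio = {clock_ratio:.2f}, CPI ratio = {cpi_ratio:.2f}")
print(f"CPU time reduced by {time_reduction * 100:.1f}%")
```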
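(9) Number of instructions = CPU time × clock rate / CPI, so
$$\text{Number of instructions} = \frac{960 \times 0.9 \times 4 \times 10^{9}}{1.61} \approx 2.147 \times 10^{12}.$$
This value takes the execution time as 960 s; with the 960 ns stated in the problem, the same formula gives about 2147 instructions.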
(10) Only the clock rate changes to reduce the CPU time. Clock rate = Number of instructions × CPI / CPU time, so the new clock rate is $3\ \text{GHz} \times \frac{1}{0.9} \approx 3.33\ \text{GHz}$.
(11) Both the clock rate and the CPI change to reduce the CPU time. Clock rate = Number of instructions × CPI / CPU time, so the new clock rate is $3\ \text{GHz} \times 0.85 \times \frac{1}{0.8} \approx 3.19\ \text{GHz}$.
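Parts (9) to (11) all rearrange the same identity, CPU time = instruction count × CPI / clock rate. The sketch below evaluates each one; `instruction_count` and `scaled_clock` are hypothetical helpers, and part (9) is computed for both readings of the execution time (960 ns as stated in the problem, and 960 s).

```python
def instruction_count(cpu_time_s, clock_hz, cpi):
    """Instruction count from CPU time = count * CPI / clock rate."""
    return cpu_time_s * clock_hz / cpi


def scaled_clock(clock_hz, cpi_factor, time_factor, instr_factor=1.0):
    """New clock rate after scaling CPI, CPU time, and instruction count."""
    # clock rate = instruction count * CPI / CPU time
    return clock_hz * instr_factor * cpi_factor / time_factor


# (9) execution time reduced by 10%, CPI 1.61, 4 GHz clock.
n_from_ns = instruction_count(960e-9 * 0.9, 4e9, 1.61)  # 960 ns reading
n_from_s = instruction_count(960 * 0.9, 4e9, 1.61)      # 960 s reading

# (10) CPU time reduced by 10%, CPI and instruction count unchanged.
clock_10 = scaled_clock(3e9, cpi_factor=1.0, time_factor=0.9)

# (11) CPI reduced by 15%, CPU time reduced by 20%.
clock_11 = scaled_clock(3e9, cpi_factor=0.85, time_factor=0.8)

print(f"(9) instructions: {n_from_ns:.0f} (ns reading), {n_from_s:.3e} (s reading)")
print(f"(10) clock rate: {clock_10 / 1e9:.2f} GHz")
print(f"(11) clock rate: {clock_11 / 1e9:.2f} GHz")
```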