[30%] Similar to what we did in the lab and in class on Friday, demonstrate the operation of a carry-lookahead adder on the following inputs:
a: 0101 1100 0010 1110
b: 0110 1111 0101 0101
As we did in class, demonstrate the complete sequence of steps (small p, small g, Super P, Super G, Super C, small c, and then the sum). Show that the result is the same as with a ripple-carry adder.
Assume that a new instruction called “store sum” (SS) needs to be added to the LEGv8 instruction set. In this question, you have to describe the changes needed in the single-cycle datapath and control logic to implement this new instruction. The instruction is defined as follows:
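The sequence of steps listed above can be sketched in Python as a way to check hand-worked results. This is a minimal sketch, not the course's reference solution: it assumes 4-bit groups, the textbook definitions p = a OR b and g = a AND b, a carry-in of 0, and a width that is a multiple of the group size. The function name `carry_lookahead_add` and the dictionary keys are hypothetical.

```python
def carry_lookahead_add(a, b, width=16, group=4, carry_in=0):
    """Add two unsigned integers with a two-level carry-lookahead scheme."""
    bits_a = [(a >> i) & 1 for i in range(width)]  # index 0 = least significant bit
    bits_b = [(b >> i) & 1 for i in range(width)]

    # Small p (propagate) and small g (generate) for every bit position.
    p = [x | y for x, y in zip(bits_a, bits_b)]
    g = [x & y for x, y in zip(bits_a, bits_b)]

    # Super P and Super G for every group of `group` bits.
    super_p, super_g = [], []
    for base in range(0, width, group):
        P, G = 1, 0
        for i in range(base, base + group):
            P &= p[i]
            G = g[i] | (p[i] & G)  # nested form of g3 + p3.g2 + p3.p2.g1 + p3.p2.p1.g0
        super_p.append(P)
        super_g.append(G)

    # Super C: carry into each group, plus the final carry out.
    super_c = [carry_in]
    for P, G in zip(super_p, super_g):
        super_c.append(G | (P & super_c[-1]))

    # Small c: carry into every bit, restarted from each group's Super C.
    c = [0] * width
    for k, base in enumerate(range(0, width, group)):
        carry = super_c[k]
        for i in range(base, base + group):
            c[i] = carry
            carry = g[i] | (p[i] & carry)

    # Sum bit i is a_i XOR b_i XOR c_i.
    total = 0
    for i in range(width):
        total |= (bits_a[i] ^ bits_b[i] ^ c[i]) << i
    return {"p": p, "g": g, "P": super_p, "G": super_g,
            "C": super_c, "c": c, "sum": total}
```

Comparing `carry_lookahead_add(a, b)["sum"]` against `(a + b) & 0xFFFF` plays the role of the ripple-carry check. All lists are indexed from the least significant bit or group.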
SS Rd, Rm, Rn: Mem[Reg[Rd]]=Reg[Rn]+Reg[Rm];
Assume that the positions of the register fields (Rd, Rn, Rm) within the instruction are the same as in the R-format instructions shown on slide 23. Answer the following questions:
[10%] Describe the operation of this new instruction step-by-step showing what data transfers occur during each cycle and what operations are being performed.
[10%] Describe the changes needed in the processor datapath (shown in Figure 4.23 and on slide 29) to implement this new instruction. Depict these changes on a datapath figure. You can either print out a copy of slide 29 and mark the changes there, or draw the relevant parts from scratch.
[10%] Describe the changes needed in the control logic, including any new control signals. Use a format similar to Figure 4.18 to present your changes.
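The register-transfer description of SS given above, Mem[Reg[Rd]]=Reg[Rn]+Reg[Rm], can be restated as a small behavioral model. This is only a sketch of the instruction's architectural effect, not of the datapath or control changes asked for. The function `execute_ss`, the list-based register file, and the dictionary-based memory are hypothetical, and the sum is assumed to wrap to 64 bits.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit registers, sum wraps on overflow


def execute_ss(regs, mem, rd, rm, rn):
    """Behavioral model of SS Rd, Rm, Rn: Mem[Reg[Rd]] = Reg[Rn] + Reg[Rm]."""
    address = regs[rd]                       # Rd supplies the memory address
    value = (regs[rn] + regs[rm]) & MASK64   # sum of the two source registers
    mem[address] = value                     # memory is written; no register changes
    return address, value


# Usage with made-up register contents.
regs = [0] * 32
regs[1], regs[2], regs[3] = 0x1000, 7, 35
mem = {}
execute_ss(regs, mem, rd=1, rm=2, rn=3)
```

After the call, `mem[0x1000]` holds 42 and the register file is unchanged.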
Consider the CPU shown in Figure 4.23 in the book and on slide 29. Assume that the logic blocks used to implement the datapath have the following latencies: instruction memory – 250 ps, data memory – 250 ps, register file – 150 ps, ALU – 200 ps, adder – 150 ps, single gate – 5 ps, sign-extension logic – 50 ps, control logic – 50 ps, register read – 30 ps, register setup – 20 ps. “Register read” is the time needed after the rising edge of the clock for the new register value to appear at the output. This value applies to the PC only. “Register setup” is the amount of time a register’s data input must be stable before the rising edge of the clock. This value applies to the PC, the register file, and memory. You do not have to account for the register file and memory write delays separately. Assume that we only execute five instructions: LDUR, STUR, CBZ, B, and ADD.
[3%] What is the latency of the ADD instruction (i.e., how long must the clock period be to ensure that this instruction works correctly)?
[3%] What is the latency of LDUR?
[3%] What is the latency of STUR?
[3%] What is the latency of CBZ?
[3%] What is the latency of B?
[3%] What is the minimal clock period for this CPU?
Consider now the addition of a multiplier to this CPU. Assume that this addition will add 300 ps to the latency of the ALU, but will reduce the number of instructions executed by 5%, because there will no longer be a need to emulate the multiply instruction in software.
[3%] What is the processor cycle time with the addition of the multiplier?
[3%] What is the speedup achieved by this addition?
[6%] What is the slowest that the new ALU can be and still result in improved performance?
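The three multiplier questions above rest on the relation execution time = instruction count × CPI × cycle time, with CPI = 1 for a single-cycle design. A minimal sketch of that relation follows. The function names and the numbers in the usage lines are hypothetical placeholders, not the values for this CPU.

```python
def speedup(old_cycle_ps, new_cycle_ps, instr_ratio):
    """Speedup of the new design over the old one in a single-cycle CPU.

    instr_ratio is new instruction count / old instruction count
    (for example, 0.95 for a 5% reduction). CPI is 1 in both designs.
    """
    return old_cycle_ps / (new_cycle_ps * instr_ratio)


def break_even_cycle_ps(old_cycle_ps, instr_ratio):
    """Longest new cycle time at which the new design is no slower than the old."""
    return old_cycle_ps / instr_ratio


# Usage with placeholder numbers (1000 ps old cycle, 1200 ps new cycle).
s = speedup(1000.0, 1200.0, 0.95)
limit = break_even_cycle_ps(1000.0, 0.95)
```

A speedup below 1 means the change slows the processor down. The break-even value is a cycle time, so turning it into an ALU latency limit still requires subtracting the other delays on the critical path.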
[10%] Consider a 5-stage pipelined datapath with data forwarding logic. Assume that 50% of the instructions executed are load instructions (LDUR), and that half of these load instructions have a dependent instruction that immediately follows the load. There are no other dependencies in the program. Estimate the speedup of this processor compared to a single-cycle non-pipelined implementation.
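One way to set up this estimate is to model the pipelined CPI as 1 plus the stall cycles per instruction, and then compare execution times. The sketch below assumes a one-cycle load-use stall (the usual case with forwarding in a 5-stage pipeline), perfectly balanced stages so that the pipelined cycle time is the single-cycle period divided by the number of stages, and no other hazards. The function names and the numbers in the usage lines are hypothetical.

```python
def pipelined_cpi(load_fraction, dependent_fraction, stall_cycles=1):
    """CPI of a pipeline whose only stalls are load-use hazards."""
    return 1.0 + load_fraction * dependent_fraction * stall_cycles


def pipeline_speedup(cpi, stages=5):
    """Speedup over a single-cycle design, assuming perfectly balanced stages.

    Single-cycle time per instruction: T. Pipelined: cpi * T / stages.
    """
    return stages / cpi


# Usage with placeholder fractions (40% loads, a quarter of them followed by a dependent instruction).
cpi = pipelined_cpi(0.4, 0.25)
estimate = pipeline_speedup(cpi)
```

If the stages are not balanced, the ratio of the real cycle times should replace `stages` in the second function.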