$24
In Lab 3, you will implement a pipelined version of the LC-3b. The pipeline for the LC-3b has five stages: Fetch (F), Decode/Register Read (DE), Address Generation/Execute (AGEX), Data Memory Access (MEM), and Store Result (SR). Between each stage, there are pipeline latches that are used to propagate data through the pipeline every cycle (see Figure 1). The main logic components of each stage are shown in Figures 2-6. You will need to implement each stage as shown in these figures. Some logic blocks in the figures are labeled “LOGIC.” The outputs of these blocks are shown in the figure, but the inputs to some of these blocks are omitted. It is your job to figure out the inputs to these blocks and the implementation of the logic inside the blocks. All of these blocks, including the “Dependency Check Logic,” generate outputs that control the stalling of the pipeline and the insertion of pipeline “bubbles”.
The remainder of this section describes the pipelined LC-3b microarchitecture.
1.1 An Overview of the Pipeline
The LC-3b pipeline has five stages. The F stage is used for fetching an instruction into the DE latches. In the DE stage, the control store is accessed to generate some of the control signals required for processing the instruction. In parallel with the control store access, the register file is also accessed to retrieve the register operands of the instruction. The Dependency Check Logic determines if the instruction in the DE stage is reading a value produced by an older instruction that is in the AGEX, MEM, or SR stage. If so, the instruction in the DE stage should not be propagated to the next stage (this will be described in more detail later). The AGEX stage performs address computation for instructions that need to generate an address. Operate instructions also produce their result in this stage using the ALU or the shifter. In the MEM stage, load instructions read data from memory, and store instructions write data to memory. Control instructions have to wait until the MEM stage to update the PC since TRAP instructions need to obtain the starting address of the TRAP sevice routine from memory. The direction of a conditional branch instruction is also determined in the MEM stage. Instructions that write into a destination register and set the condition codes perform these updates in the SR stage. All other instructions do nothing in the SR stage.
1.1.1 Handling of Dependencies
An instruction in the DE stage may require a value produced by an older instruction that is in the AGEX, MEM, or SR stage. If so, the instruction in the DE stage should be stalled, and a bubble should be inserted into the pipeline. Note that this implementation implies that we are NOT using data forwarding. The dependency check is performed by the Dependency Check Logic in the DE stage. The inputs to this logic are shown in Figure 3. The logic has one output signal, DEP.STALL, which is asserted if a dependency exists.
To determine if a dependency exists, the Dependency Check Logic compares the destination register number of the instructions in the AGEX, MEM, and SR stages to the source register numbers of the instruction in the DE stage. If a match is found and the instruction in the DE stage actually needs the source register (as indicated by the SR1.NEEDED or SR2.NEEDED control signal) and an instruction in a later stage actually writes to the same register (as indicated by the V.AGEX.LD.REG, V.MEM.LD.REG, or V.SR.LD.REG signals), DEP.STALL should be set to 1. The Dependency Check Logic also checks for dependencies on the condition codes. If the instruction in the DE stage is a conditional branch instruction (as indicated by the BR.OP control signal), and if any of the instructions in the AGEX, MEM, or SR stages is writing to the condition codes, DEP.STALL should be set to 1.
1.1.2 Handling of Control Instructions
The basic pipelined microarchitecture of the LC-3b stalls the pipeline after a control instruction is fetched. These in-structions are resolved in the MEM stage. Hence, a three-cycle bubble is inserted into the pipeline after each control instruction. A control instruction is identified using the BR.STALL signal from the microcontrol store. This signal is 1 for all control instructions and 0 for all other instructions. If any of the instructions in the DE, AGEX, or MEM stages is
1
a valid control instruction, then the F stage should insert bubbles into the pipeline. This is accomplished by setting the valid bit of the DE latches (DE.V) to 0.
1.1.3 Handling of Memory Operations
There are two caches in the pipelined LC-3b microarchitecture: the Instruction Cache (I-Cache) and the Data Cache (D-Cache). The I-Cache is accessed in the F stage. If the data is found in the I-Cache, the I-Cache asserts the ICACHE.R signal. The D-Cache is accessed in the MEM stage by those instructions that need to read data from or write data to memory. These instructions should have the DCACHE.EN control signal set in the control store. When the D-Cache access is complete, the D-Cache asserts the DCACHE.R signal.
1.2 Description of Pipeline Stages
1.2.1 Fetch Stage (F)
In this stage, the I-Cache is accessed using the address in the PC. The I-Cache asserts the ICACHE.R signal when the access is complete. This signal is asserted some time in the middle of the clock cycle. If the I-Cache is not ready, then logic in the F stage needs to stall the pipeline. Remember that the valid bits associated with the pipeline latches are used to accomplish this. If the data fetched from the I-Cache is garbage (i.e. ICACHE.R is not asserted) and the pipeline is not stalling for some other reason, then the valid bit for the DE stage latches (DE.V) should be set to 0. This valid bit indicates that the values in the DE latches are not meaningful and can be ignored.
In the simulator, you are given an icache access function. This function takes as input a 16-bit address, which should come from the PC. The outputs provided by this function are the 1-bit ICACHE.R signal and the 16-bit instruction. You must use this interface to perform accesses to the I-Cache.
The F stage also includes the logic used to update the PC. If there are no stalls or control instructions in the pipeline, the PC should be incremented by 2. If a control instruction other than TRAP is supposed to write into the PC, the TARGET.PC value coming from the MEM stage should be latched into the PC at the end of the current cycle. If a TRAP instruction is supposed to write into the PC, the TRAP.PC value coming from the MEM stage should be latched into the PC at the end of the current cycle. The next value to be latched into the PC is controlled by the MEM.PCMUX signal, which is generated in the MEM stage, and the LD.PC signal, which is generated by a logic block you should design.
At the end of the clock cycle, if the LD.DE signal is asserted, the DE latches (which contain DE.NPC, DE.IR, and DE.V) should latch their input values. DE.NPC contains the address of the next instruction, and DE.IR contains the current instruction fetched from the I-Cache. It is your job to figure out the logic that generates the LD.DE signal. Think about when you do not want to load enable the DE latches. Hint: Do you want to load enable the DE latches if DEP.STALL is asserted? How about MEM.STALL?
1.2.2 Decode Stage (DE)
The instruction in the DE stage accesses the control store using a 6-bit address that is obtained by concatenating IR[15:11] and IR[5]. Note that IR[11] and IR[5] are not actually part of the opcode, but they are used to access the control store. These bits are meaningful for some instructions (think about which instructions), and based on the value of these bits, different values are assigned to the control signals in different entries of the control store.
Each entry in the control store contains 23 bits that are used to control different structures in various stages of the pipeline. Table 2 shows the pipeline stages in which each bit in the control store is used. At the end of the clock cycle, 20 bits from the control store are latched into the AGEX.CS latch.
The instruction in the DE stage also reads the register file and the condition codes. The register file has two read ports: one for SR1 and one for SR2. DE.IR[8:6] are used to address the register file to read SR1. Either DE.IR[11:9] or DE.IR[2:0] are used to address the register file to read SR2. DE.IR[13] is used to select between DE.IR[11:9] or DE.IR[2:0].
2
At the end of the clock cycle, the SR1 value from the register file is latched into the AGEX.SR1 latch, and the SR2 value is latched into the AGEX.SR2 latch. The value of the condition codes is latched into the 3-bit AGEX.CC latch. Condition code N is stored in AGEX.CC[2], Z is stored in AGEX.CC[1], and P is stored in AGEX.CC[0]. Note that the condition codes and register file are read and the values obtained are latched regardless of whether an instruction needs these values.
The register number (ID) for the destination register is latched into the AGEX.DRID latch at the end of the clock cycle.
The DRMUX signal from the control store selects whether DE.IR[11:9] or the value “7” is latched into AGEX.DRID.
The DE stage also contains the Dependency Check Logic, whose operation was described earlier. The output of this logic (DEP.STALL) indicates whether or not the instruction in the DE stage should be propagated forward. If DEP.STALL is asserted, the state of the DE latches should not be changed, and a bubble needs to be inserted into the AGEX stage. This is accomplished by setting the valid bit for the AGEX stage (AGEX.V) to 0. Other actions need to be taken to preserve the correct value of the PC. Therefore, the DEP.STALL signal is also used by the structures physically located in the F stage. It is your job to figure out how the DEP.STALL signal affects the logic in the F stage.
The BR.STALL signal from the control store indicates that the instruction being processed is a control instruction, and hence the frontend of the pipeline should be stalled until this instruction updates the PC in the MEM stage. In the DE stage, if DE.V is 1 and BR.STALL is 1, then the DE.BR.STALL signal should be asserted. This indicates that the instruction in the DE stage is a valid control instruction. The DE.BR.STALL signal is used to insert bubbles into the pipeline in the F stage. It is again your job to figure out how to use the DE.BR.STALL signal. Think about all cases that might happen. For example, what happens if a branch instruction in the DE stage is stalled due to a data dependency on an older instruction that sets the condition codes? Should the F stage still insert a bubble into the pipeline?
Note that you also need to design the logic required to generate the LD.AGEX signal and the input signal to the AGEX.V latch. Think about when you need to load disable the AGEX latches and when you need to insert a bubble into the AGEX stage.
1.2.3 Address Generation/Execute Stage (AGEX)
During the AGEX stage, operate instructions compute their result. Instructions that need to generate an address (to update the PC or to access memory) also calculate their address in this stage. The control signals from the AGEX.CS latches control the muxes in this stage.
Note that there are two logic structures you need to design in this stage. One logic structure determines the LD.MEM signal and the input signal to the MEM.V latch. The other logic structure generates the signals to be sent to the previous stages of the pipeline. The outputs of this structure are V.AGEX.LD.CC, V.AGEX.LD.REG, and V.AGEX.BR.STALL. The first two signals are required by the Dependency Check Logic. V.AGEX.BR.STALL indicates that the instruction being processed in the AGEX stage is a valid control instruction, and therefore, the frontend of the pipeline needs to stall and insert bubbles. Note that this logic block simply gates the LD.CC, LD.REG, and BR.STALL control signals from the AGEX.CS latch with the AGEX.V bit.
1.2.4 Data Memory Access Stage (MEM)
In this stage, the D-Cache is accessed by those instructions that need to read from or write to memory (as indicated by the DCACHE.EN bit in the control store). The DCACHE.R/W bit in the control store indicates a read or write access, and the DATA.SIZE signal from the control store indicates whether this is a byte or word access.
The inputs to the D-Cache are:
1-bit enable signal (the DCACHE.EN signal from the MEM.CS latch gated with MEM.V)
2-bit WE signal. WE0 is the write-enable for the the low byte of a word. WE1 is the write-enable for the high byte of a word. You will need to generate these signals based on the values of DCACHE.R/W, DATA.SIZE, and MEM.ADDRESS[0].
3
16-bit input address, MEM.ADDRESS[15:0], indicating which word is to be accessed.
16-bit input data. This is the data that needs to be written into the D-Cache if DCACHE.R/W is 1.
The outputs of the D-Cache are:
1-bit DCACHE.R signal. If this signal is asserted it means that the access is complete.
16-bit output data. This is the data read from the D-Cache. Note that you need to design the logic to shift and sign extend the appropriate byte if the access is a byte access.
In the simulator, you are given a dcache access function, which you need to call if the 1-bit enable signal to the D-Cache is set. This function takes as input the two WE signals, the 16-bit address, and the 16-bit input data. This function outputs the DCACHE.R bit and the 16-bit output data. You must use this interface when accessing the D-Cache. For further explanation, see the shell code.
One signal you need to generate in this stage is the MEM.STALL signal. This signal is used to stall the pipeline and insert a bubble into the SR latches if a valid memory access is not complete (i.e. DCACHE.R is 0). This signal is also required by all previous stages to correctly stall the pipeline.
Control instructions update the PC when they reach the MEM stage. The BR LOGIC block shown in Figure 5 generates the 2-bit PCMUX signal required by the F stage of the pipeline. You must implement this logic. There are six inputs to this logic:
1-bit valid bit from the MEM.V latch. This bit is set if the instruction in the MEM stage is valid.
1-bit BR.OP signal from the MEM.CS latch. This bit is set if the instruction is a branch.
1-bit UNCON.OP signal from the MEM.CS latch. This bit is set if the instruction is a JSR/JSRR or a JMP.
1-bit TRAP.OP signal from the MEM.CS latch. This bit is set if the instruction is a TRAP.
Values of the condition codes from the MEM.CC latch.
MEM.IR[11:9]: bits [11:9] of the instruction in the MEM stage.
Another logic block that you need to implement is the block that generates the input to the SR.V latch. The last block you need to implement in this stage is the one whose outputs are V.MEM.LD.CC, V.MEM.LD.REG, and V.MEM.BR.STALL. The V.MEM.LD.CC and V.MEM.LD.REG signals are inputs to the Dependency Check Logic. The V.MEM.BR.STALL signal is needed in the F stage to insert a bubble into the pipeline. Note that although control instructions are resolved in the MEM stage, we still need to insert a bubble while a control instruction is being processed in the MEM stage. Think about why it should be this way. Also think about how the LD.PC signal should be generated if a control instruction is being processed in the MEM stage.
1.2.5 Store Result Stage (SR)
This stage is the stage where the instruction, if it is valid (as indicated by SR.V), writes into the register file and sets the condition codes. A 4-input mux, whose control signals come from the SR.CS latch, selects the 16-bit data to be written into the register file. This data can come from one of four places:
SR.ADDRESS latch, which contains the address that was generated in the AGEX stage and propagated through the pipeline.
SR.DATA latch, which contains the data read from memory (and shifted and sign-extended, if it was a byte access) in the MEM stage and stored in SR.DATA at the end of the previous clock cycle.
4
SR.NPC latch, which contains the address of the next instruction. Think about why this would possibly be written into the destination register.
SR.ALU.RESULT latch, which contains the result generated by the shifter or the ALU in the AGEX stage and propagated through the pipeline.
For each instruction that writes to a destination register, you must determine which of the four values above should be loaded into the destination register. You should set the value of the 2-bit DR.VALUEMUX signal in the control store accordingly.
The SR stage also contains the logic that determines the values of the condition codes. The last logic block contained in this stage outputs the V.SR.LD.REG and V.SR.LD.CC signals indicating that a valid instruction is writing into the register file and condition codes, respectively. These signals are used as write-enable signals in the register file and condition codes; they are also inputs to the Dependency Check Logic.
Note that the latches for the SR stage do not have a load-enable signal (LD.SR) associated with them. Why is a load-enable signal not required for the SR latches?
The structures in this stage are implemented in the simulator for you to help get you started.
1.2.6 Control Signals and Their Propagation in the Pipeline
The control signals used in each stage are listed in Table 1. Note that some of these control signals come from the control store (see Table 2). There are 23 signals stored in each entry of the control store. Three of these signals are only needed in the DE stage. Therefore, only 20 of the control store signals need to be propagated to the next stage (AGEX). These signals are latched into the AGEX.CS latch shown on the datapath. Nine of the 20 signals are only needed in the AGEX stage, so only 11 signals are propagated into the next stage (MEM). As shown in Table 2, 7 of the control signals from the MEM.CS latch are not needed beyond the MEM stage, so only 4 signals are latched into the SR.CS latch. In the simulator, you are given the code that propagates the necessary control signals from one stage to the next.
1.2.7 Stall Signals
As mentioned in the previous sections, you will need to implement the logic to generate the following stall signals. These signals are also shown in Table 3.
ICACHE.R: Asserted if the I-Cache provides a useful instruction in this cycle. If 0, the F stage should insert bubbles into the pipeline.
DEP.STALL: Asserted if the instruction in the DE stage is valid and at least one of its input values has not been written into the register file yet.
MEM.STALL: Asserted if the instruction in the MEM stage is valid and needs to access memory, and the DCACHE.R signal is 0.
V.DE.BR.STALL: Asserted if the instruction in the DE stage is a valid control instruction.
V.AGEX.BR.STALL: Asserted if the instruction in the AGEX stage is a valid control instruction.
V.MEM.BR.STALL: Asserted if the instruction in the MEM stage is a valid control instruction.
These signals are already declared and initialized in the simulator shell code for you. You will need to generate their values and use them to implement some of the logic blocks in the pipeline.
5
1.3 What You Need to Do
In Lab 3, you need to complete the simulator to implement the pipeline described in this document. To do this, you need to figure out what the unimplemented logic blocks are supposed to do. You also need to fill out the new microcontrol store at the end of this document.
You are NOT required to implement the RTI instruction or interrupt/exception handling.
6