Laboratory Exercise 2 An Enhanced Processor Solution

Starting from:

~~$35~~

$29

Home

In Laboratory Exercise 1 we described a simple processor. In Part I of that exercise the processor itself was designed, and in Part II the processor was connected to an external counter and a memory unit. This exercise de-scribes subsequent parts of the processor design. The numbering of figures and tables in this exercise are continued from those in Parts I and II of the preceding lab exercise.

In this exercise we will extend the capability of the processor so that the external counter is no longer needed, and so that the processor can perform read and write operations using memory or other devices. A schematic of the enhanced processor is given in Figure 12. In the figure registers r0 to r6 are the same as in Figure 1 of Lab 1, but register r7 has been changed to a counter. This counter is used to provide the addresses in the memory from which the processor’s instructions are read; in the preceding lab exercise, a counter external to the processor was used for this purpose. We will refer to r7 as the processor’s program counter (pc), because this terminology is common for real processors available in the industry. When the processor is reset, pc is set to address 0. At the start of each instruction (in time step T0) the value of pc is used as an address to read an instruction from the memory. The instruction returned from the memory is stored into the IR register and the pc is automatically incremented to point to the next instruction.

The processor’s control unit increments pc by using the incr_pc signal, which is just an enable on this counter. It is also possible to load an arbitrary address into pc by having the processor execute an instruction in which the destination register is specified as pc. In this case the control unit uses r7in to perform a load of the counter. Thus, the processor can execute instructions at any address in the memory, as opposed to only being able to execute instructions that are stored at successive addresses. The current content of pc, which always has the address of the next instruction to be executed, can be copied into another register if needed by using a mv instruction.

The enhanced processor will have four new instructions, which are listed in Table 3. The ld (load) instruction reads data into register rX from the external memory address specified in register rY. Thus, the syntax [rY] means that the content of register rY is used as an external address. The st (store) instruction writes the data contained in register rX into the memory address found in rY. The and instruction is similar to the add and sub instructions that were introduced in Lab 1. This instruction extends the adder/subtracter unit in the processor into an arithmetic logic unit. Besides performing addition and subtraction, it has the ability to generate a bit-wise logical AND (&) of the destination register rX with the second operand Op2. As discussed in Lab 1, the operand Op2 can be either another register rY, or immediate data #D.

The b{cond} instruction is used to cause a processor branch, which means to change the program counter (pc) to the address of a specific instruction. The cond part of the branch instruction is optional and represents a condition. The instruction loads the constant #Label into pc only if the specified condition evaluates to true. An example of a condition is eq, which stands for equal (to zero). The instruction beq #Label will load the constant Label into pc if the last result produced by the arithmetic logic unit, which is stored in register G, was 0. The b{cond} instruction is discussed in more detail in Part V of this exercise.

Operation
Function performed

ld rX, [rY]
rX
[rY]
st rX, [rY]
[rY]
rX
and rX, Op2
rX rX & Op2
b{cond} #Label
if (cond), pc
#Label

Table 3: New instructions in the enhanced processor.

1

16

16

16
incr_pc
E

16
16

Counter
Ain

r 0in

r 7in

L

r0
(r7)

A

Clock

ADDR

AddSub
Adder/Subracter
A DDR

16
Gin

16

Multiplexers

G

4

16

Buswires

DOUT
DOUT

Select

IR in

ADDRin

DIN
IR

DOUTin

Control FSM

W_D

W
Run

Resetn

Done

Figure 12: An enhanced version of the processor.

Recall from Lab 1 that instructions are encoded using a 16-bit format. For instructions that specify Op2 as a register the encoding is III0XXX000000YYY, and if Op2 is an immediate constant the format is III1XXXDDDDDDDDD. You should use these same encodings for this exercise. Assume that III = 100 for the ld instruction, 101 for st, 110 for and, and 111 for b{cond}.

Figure 12 shows two registers in the processor that are used for data transfers. The ADDR register is used to send addresses to an external device, such as a memory module, and the DOUT register is used by the processor to provide data that is to be stored outside of the processor. One use of the ADDR register is for reading, or fetching, instructions from memory; when the processor wants to fetch an instruction, the content of pc is transferred across the bus and loaded into ADDR. This address is provided to the memory.

In addition to fetching instructions, the processor can read data at any address by using the ADDR register. Both data and instructions are read into the processor on the DIN input port. The processor can write data for storage at an external address by placing this address into the ADDR register, placing the data to be stored into its DOUT register, and asserting the output of the W (Write) flip-flop to 1.

Connecting the Processor to External Devices

Figure 13 illustrates how the enhanced processor can be connected to memory and other devices. The memory unit in the figure is 16-bits wide and 256-words deep. A diagram of this memory is given in Figure 14. It supports both read and write operations and therefore has both address and data inputs, as well as a write-enable input. As

2
depicted in Figure 14, the memory has a clock input that is used to store the address, data, and write enable inputs into registers. This type of memory unit is called a synchronous static random access memory (SSRAM).

Figure 13 also includes a 9-bit output port (register) that can be used to store data from the processor. In the figure this output port is connected to a set of LEDs, like the ones available on the DE1-SoC board. To allow the processor to select either the memory unit or output port when performing a write operation, the circuit in-cludes address decoding, which is done using NOR gates and AND gates. If the processor’s upper address lines A15A14A13A12 = 0000, then the memory unit can be written. Figure 13 shows n lower address lines connected from the processor to the memory; since the memory has 256 words, then n = 8 and the memory’s address port is driven by the processor address lines A7 : : : A0. For addresses in which A15A14A13A12 = 0001, the data written by the processor is loaded into the output port connected to LEDs in Figure 13.

A15

A14

Processor A13 Memory

A12

ADDR
16
4

n
address

16

16

DOUT

data
q

DIN

W

wren

4

ResetnRun

A

A13

Done

A1415

A12

Clock

E

Resetn
9

9

Run
D
Q
LEDs

Figure 13: Connecting the enhanced processor to a memory unit and output register.

address

data

wren

clock

8
8

16
16
256 x 16 RAM
16
q

Figure 14: The synchronous SRAM unit.

3
Part III

Figure 15 gives Verilog code for a top-level file that you can use for this part of the exercise. The input and output ports for this module are chosen so that it can be implemented on a DE1-SoC board. The Verilog code corresponds to the circuit in Figure 13, plus an additional input port that is connected to switches SW8 : : : SW0. This input port can be read by the processor at addresses in which A15 : : : A12 = 0011. (Switch SW9 is not a part of the input port, because it is dedicated for use as the processor’s Run input.) To support reading from both the SW input port and the memory unit, the top-level circuit includes a multiplexer that feeds the processor’s DIN input. This multiplexer is described by using an if-else statement inside the always block in Figure 15.

The code in Figure 15 is provided with this exercise, along with a few other source-code files: flipflop.v, inst_mem.v, inst_mem.mif, and (part of) proc.v. The inst_mem.v source-code file was created by using the Quartus IP Catalog to instantiate a RAM:1-PORT memory module. It has a 16-bit wide read/write data port and is 256-words deep, corresponding to Figure 14.

The Verilog code in the proc.v file implements register r7 as a program counter, as discussed above, and includes a number of changes that are needed to support the new ld, st, and, and b{cond} instructions. In this part you are to augment this Verilog code to complete the implementation of the ld and st instructions, as well as the and instruction. You do not need to work on the b{cond} instruction for this part.

module part3 (KEY, SW, CLOCK_50, LEDR);

input [0:0] KEY;

input [9:0] SW;

input CLOCK_50;

output [9:0] LEDR;

wire [15:0] DOUT, ADDR;

wire Done, W;

reg [15:0] DIN;

wire inst_mem_cs, SW_cs, LED_reg_cs;

wire [15:0] inst_mem_q;

wire [8:0] LED_reg, SW_reg; // LED[9] and SW[9] are used for Run

proc U3 (DIN, KEY[0], CLOCK_50, SW[9], DOUT, ADDR, W, Done);

assign inst_mem_cs = (ADDR[15:12] == 4’h0);

assign LED_reg_cs = (ADDR[15:12] == 4’h1);

assign SW_cs = (ADDR[15:12] == 4’h3);

inst_mem U4 (ADDR[7:0], CLOCK_50, DOUT, inst_mem_cs & W, inst_mem_q);

always @ (*) // input multiplexer

if (inst_mem_cs == 1’b1)

DIN = inst_mem_q;

else if (SW_cs == 1’b1)

DIN = {7’b0000000, SW_reg};

else

DIN = 16’bxxxxxxxxxxxxxxxx;

regn #(.n(9)) U5 (DOUT[8:0], LED_reg_cs & W, CLOCK_50, LED_reg);

assign LEDR[8:0] = LED_reg;

assign LEDR[9] = SW[9];

regn #(.n(9)) U7 (SW[8:0], 1’b1, CLOCK_50, SW_reg); // SW[9] is used for Run endmodule

Figure 15: Verilog code for the top-level file.

4
Perform the following:

Extend the code in proc.v so that the enhanced processor fully implements the ld, st, and and instructions. Test your Verilog code by using the ModelSim simulator. Sample setup files for ModelSim, including a testbench, are provided along with the other files for this exercise. The sample testbench first resets the processor system and then asserts the Run switch, SW9, to 1. A sample program to test your processor is also provided, in a file called inst_mem.mif. This file represents the assembly-language program shown in Figure 16, which tests the ld and st instructions by reading the values of the SW switches and writing these values to the LEDs, in an endless loop. At the beginning of a simulation, ModelSim loads the contents of the file inst_mem.mif into the inst_mem memory module, so that the program can be executed by the processor. Examine the signals inside your processor, as well as the external LEDR values, as the program executes within the ModelSim simulation.

An assembler software tool, called sbasm.py, is provided for use with your processor. The Assembler is written in Python and is available at https://github.com/profbrown/sbasm.git. To use this Assembler you have to first install Python (version 3) on your computer. The Assembler includes a README file that explains how to install and use it. The sbasm.py Assembler can generate machine code for all of the processor’s instructions. The provided file inst_mem.mif was created by using sbasm.py to assemble the program in Figure 16. As the figure indicates, you can define symbolic constants in your code by using the .define directive, and you can use labels to refers to lines of code, such as MAIN. Comments are specified in the code by using //. The assembler ignores anything on a line following //.

.define LED_ADDRESS 0x1000

.define SW_ADDRESS
0x3000

// Read SW switches and display on LEDs

mvt
r3,
#LED_ADDRESS
// point to LED port

mvt
r4,
#SW_ADDRESS
// point to SW port
MAIN:
ld
r0,
[r4]
// read SW values

st
r0,
[r3]
// light up LEDs

mv
pc,
#MAIN

Figure 16: Assembly-language program that uses ld and st instructions.

An example result produced by using ModelSim for a correctly-designed circuit is given in Figure 17. It shows the execution of the first four instructions in Figure 16.

Once your simulation results are correct, use the Quartus Prime software to implement your Verilog code on a DE1-SoC board. A sample Quartus project file, part3.qpf, and Quartus settings file, part3.qsf, are provided with the exercise. Compile your code using the Quartus software, and download the resulting circuit into the DE1-SoC board. Toggle the SW switches and observe the LEDs to test your circuit.

5

Figure 17: Simulation results for the processor.

Part IV

In this part you are to create a new Verilog module that represents an output port called seg7. It will allow your processor to write data to each of the six 7-segment displays on a DE1-SoC board. The seg7 module will include six write-only seven-bit registers, one for each display. Each register should directly drive the segment lights for one seven-segment display, so that the processor can write characters onto the displays.

Perform the following:

A top-level file is provided for this part called part4.v. The top-level module has output ports for connecting to each of the 7-segment displays. Pin assignments for these ports, which are called HEX0[6:0], HEX1[6:0],
: : :, HEX6[6:0], are included in the Quartus settings file part4.qsf. For each display, segment 0 is on the top of the display, and then segments 1 to 5 are assigned in a clockwise fashion, with segment 6 being in the middle of the display.

The part4.v Verilog code includes address decoding for the new seg7 module, so that processor addresses in which A15A14A13A12 = 0010 select this module. The intent is that address 0x2000 should write to the register that controls display HEX0, 0x2001 should select the register for HEX1, and so on. For example, if your processor writes 0 to address 0x2000, then the seg7 module should turn off all of the segment-lights in the HEX0 display; writing 0x7f should turn on all of the lights in this display.

You are to complete the partially-written Verilog code in the file seg7.v, so that it contains the required six registers—one for each 7-segment display.

You can compile and test your Verilog code by using the ModelSim setup files that are provided for this part of the exercise. An inst_mem.mif file is also provided that corresponds to the assembly-language program shown in Figure 18. This program works as follows: it reads the SW switch port and lights up a seven-segment display corresponding to the value read on SW2 0. For example, if SW2 0 = 000, then the digit 0 is shown on HEX0. If SW2 0 = 001, then the digit 1 is displayed on HEX1, and so on, up to the digit 5 which would be shown on HEX5 if SW2 0 = 101.

Once your simulation results look correct, you should compile the provided Quartus project, and then down-load and test the circuit on a DE1-SoC board.

6

.define HEX_ADDRESS 0x2000

.define SW_ADDRESS 0x3000

This program shows the digits 543210 on the HEX displays. Each digit has to

be selected by using the SW switches.

mv
r5, pc
// return address for subroutine

mv
pc, #BLANK
// call subroutine to blank the HEX displays
MAIN:
mvt
r2, #HEX_ADDRESS
// point to HEX port

mv
r3, #DATA
// used to get 7-segment display pattern

mvt
r4, #SW_ADDRESS
// point to SW port

ld
r0, [r4]
// read switches

and
r0, #0x7
// use only SW2-0

add
r2, r0
// point to correct HEX display

add
r3, r0
// point to correct 7-segment pattern

ld
r0, [r3]
// load the 7-segment pattern

st
r0, [r2]
// light up HEX display

mv
pc, #MAIN

subroutine BLANK

This subroutine clears all of the HEX displays

input: none

returns: nothing

changes: r0 and r1. Register r5 provides the return address

BLANK:
mv
r0, #0
// used for clearing

mvt
r1, #HEX_ADDRESS
// point to HEX displays

st
r0, [r1]
// clear HEX0

add
r1, #1

st
r0, [r1]
// clear HEX1

add
r1, #1

st
r0, [r1]
// clear HEX2

add
r1, #1

st
r0, [r1]
// clear HEX3

add
r1, #1

st
r0, [r1]
// clear HEX4

add
r1, #1

st
r0, [r1]
// clear HEX5

add
r5, #1

mv
pc, r5
// return from subroutine
DATA:
.word
0b00111111
// ’0’

.word
0b00000110
// ’1’

.word
0b01011011
// ’2’

.word
0b01001111
// ’3’

.word
0b01100110
// ’4’

.word
0b01101101
// ’5’

Figure 18: Assembly-language program that tests the seven-segment displays.

7
Part V

In this part you are to enhance your processor so that it implements the b{cond} instruction. The conditions supported by the processor are called eq, ne, cc, and cs, which means that the variations of the branch instruction are b, beq, bne, bcc, and bcs. The b instruction always branches. For example, b #MAIN loads the address MAIN into the program counter. The meanings of the conditional versions are explained below.

The instruction beq means branch if equal (to zero). This instruction performs a branch operation (i.e., loads the provided #LABEL into the program counter) if the most recent result of an instruction executed using the arithmetic logic unit (ALU), which is stored in register G, was 0. Similarly, bne means branch if not equal (to zero). It performs a branch only if the contents of G are not equal to 0. The instruction bcc stands for branch if carry clear. It branches if the last add/subtract operation did not produce a carry-out. The opposite branch condition, bcs, branch if carry set, performs a branch if the most recent add/sub generated a carry-out. To support the conditional branch instructions, you should create two condition-code flags in your processor. One flag, z, should have the value 1 when the ALU generates a result of zero; otherwise z should be 0. The other flag, c, should be 1 when the adder/subtracter in the ALU produces a carry-out; otherwise c should be 0. Thus, c should be 1 when an add instruction generates a carry-out, or when a sub operation requires a borrow for the most-significant bit. Your FSM controller should examine these flags in the appropriate clock cycles when executing the b{cond} instructions.

The branch instructions are encoded using the format III1XXXDDDDDDDDD, where DDDDDDDDD is the branch address and XXX is the branch condition. Assume that conditions are encoded as none (always branch) = 000, eq = 001, ne = 010, cc = 011, and cs = 100.

Perform the following:

Enhance your processor so that it implements the condition-code flags z and c, and supports the b{cond} instruction. To help with testing and debugging of your processor, setup files for ModelSim are provided, including a testbench. It simulates your processor instantiated in the top-level file part5.v, which is the same as the one from Part IV. An example inst_mem.mif file is also provided, which corresponds to the program in Figure 19. This program is quite short, which makes it suitable for visual inspection of the waveforms produced by a ModelSim simulation. The program uses a sequence of instructions that test the various conditional branches. If the program reaches the line of code labelled DEAD, then at least one instruction has not worked properly. An example of ModelSim output for a correctly-working processor is given in Figure 20. It shows the processor executing instructions near the end of the code in Figure 19. The instruction that is completed at simulation time 1030 ns is add r0, #1 (0x5001). As shown in the figure, this instruction causes the carry flag, c, to become 1. The next instruction loaded into IR, at time 1090 ns, is bcs #0xC (0xF80C). Finally, the instruction loaded at 1170 ns is b #0 (0xF000).

MAIN:
mv
r0, #2

LOOP:
sub
r0, #1
// subtract to test bne

bne
#LOOP

beq
#T1
// r0 == 0, test beq

mv
pc, #DEAD

T1:
mvt
r0, #0xFF00

add
r0, #0xFF
// r0 = 0xFFFF

bcc
#T2
// carry = 0, test bcc

mv
pc, #DEAD

T2:
add
r0, #1

bcs
#T3
// carry = 1, test bcs

mv
pc, #DEAD

T3:
b
#MAIN

DEAD:
mv
pc, #DEAD

Figure 19: Assembly-language program that uses various branches.

8

Figure 20: Simulation results for the processor.

Once your ModelSim simulation indicates a correctly-functioning processor you should implement it on a DE1-SoC board. A Quartus project file part5.qpf and settings file part5.qsf are provided for this purpose. To test your processor, you can use the assembly-language program displayed in Figure 21. It provides code that tests for the correct operation of instructions supported by the enhanced processor. If all of the tests pass, then the program shows the word PASSEd on the seven-segment displays. It also shows a binary value on the LEDs that represents the number of successful tests performed. If any test fails, then the program shows the word FAILEd on the seven-segment displays and places on the LEDs the address in the memory of the instruction that caused the failure. Assemble the program, which is provided in a file called sitbooboosit.s, by using the sbasm.py assembler.

Use the output produced by sbasm.py to overwrite the file inst_mem.mif that you used in the beginning of this part of the exercise to simulate your processor system with ModelSim. Open the Quartus software, compile the part5 project, and download it onto your DE1-SoC board. If the Run signal is asserted, then your processor should execute the sitbooboosit program. If a failure is encountered, then the offending instruction can be identified by cross-referencing the LED pattern with the addresses in the file inst_mem.mif.

.define LED_ADDRESS 0x1000

.define HEX_ADDRESS 0x2000

mv
r2, #0
// used to count number of successful tests

mv
r6, #T1
// save address of next test

sub
r0, r0
// set the z flag
T1:
bne
#FAIL
// test bne; should not take the branch!

mv
r6, #C1
// save address of next test
C1:
beq
#C2
// test beq; should take the branch

b
#FAIL
// Argh!
C2:
add
r2, #2
// count the last two successful tests

Figure 21: Assembly-language program that tests various instructions. (Part a)

9

mv
r6, #T2
// save address of next test
T2:
bne
#S1
// test bne; should take the branch!

mv
pc, #FAIL

S1:
mv
r6, #C3
// save address of next test
C3:
beq
#FAIL
// test beq; should not take the branch

add
r2, #2
// count the last two successful tests

mv
r6, #T3
// save address of next test

mv
r3, #ALLONES

ld
r3, [r3]

add
r3, #1
// set the c flag
T3:
bcc
#FAIL
// test bcc; should not take the branch!

mv
r6, #C4
// save address of next test
C4:
bcs
#C5
// test bcs; should take the branch

b
#FAIL
// Argh!
C5:
add
r2, #2
// count the last two successful tests

mv
r6, #T4

mv
r3, #0

add
r3, r3
// clear carry flag
T4:
bcc
#S2
// test bcc; should take the branch!

mv
pc, #FAIL

S2:
mv
r6, #C6
// save address of next test
C6:
bcs
#FAIL
// test bcs; should bot take the branch!

add
r2, #2
// count the last two successes
// finally, test ld and st from/to memory

mv
r6, #T5
// save address of next test

mv
r4, #_LDTEST

ld
r4, [r4]

mv
r3, #0x1A5

sub
r3, r4

T5:
bne
#FAIL
// should not take the branch!

add
r2, #1
// increment success count

mv
r6, #T6
// save address of next test

mv
r3, #0x1A5

mv
r4, #_STTEST

st
r3, [r4]

ld
r4, [r4]

sub
r3, r4

T6:
bne
#FAIL
// should not take the branch!

add
r2, #1
// increment success count

mv
pc, #PASS

Loop over the six HEX displays

FAIL:mvt r3, #LED_ADDRESS

st
r6, [r3]
// show address of failed test on LEDs

mv
r5, #_FAIL

mv
pc, #PRINT

PASS:
mvt
r3, #LED_ADDRESS

st
r2, [r3]
// show success count on LEDs

mv
r5, #_PASS

Figure 21: Assembly-language program that tests various instructions. (Part b)

10

PRINT: mvt r4, #HEX_ADDRESS // address of HEX0

We would normally use a loop counting down from 6

with bne to display the six letters. But in this

testing code we can’t assume that bne even works!

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display

ld
r3, [r5]
// get letter

st
r3, [r4]
// send to HEX display

add
r5, #1
// ++increment character pointer

add
r4, #1
// point
to next HEX display
HERE:
mv
pc, #HERE

_PASS:
.word
0b0000000001011110
// d

.word
0b0000000001111001
// E

.word
0b0000000001101101
// S

.word
0b0000000001101101
// S

.word
0b0000000001110111
// A

.word
0b0000000001110011
// P
_FAIL:
.word
0b0000000001011110
// d

.word
0b0000000001111001
// E

.word
0b0000000000111000
// L

.word
0b0000000000110000
// I

.word
0b0000000001110111
// A

.word
0b0000000001110001
// F
ALLONES: .word
0xFFFF

_LDTEST: .word
0x1A5

_STTEST: .word
0x15A

Figure 21: Assembly-language program that tests various instructions. (Part c)

Part VI

Write an assembly-language program that displays a binary counter on the LED port. Initialize the counter to 0, and then increment the counter by one in an endless loop. You should be able to control the speed at which the counter is incremented by using nested delay loops, along with the SW switches. If the SW switches are set to

11
their maximum value, 0b111111111, then the delay loops should cause the counter to increment slowly enough so that each change in the counter can be visually observed on the LEDs. Lowering the value of the SW switches should make the counter increment more quickly up to some maximum speed.

You can assemble your program by using the sbasm.py assembler, and then run it on your processor system from Part V. To do this, use the output produced by sbasm.py to overwrite the file inst_mem.mif in the folder that holds your Quartus project for Part V. To make use of the new inst_mem.mif file you do not need to completely recompile your Verilog code from Part V. Instead, execute the Quartus command Processing Update Memory Initialization File, to include the new inst_mem.mif file in your Quartus project. Next, select the Quartus command Processing Start Start Assembler to produce a new programming bitstream for your DE1-SoC board. Finally, use the Quartus Programmer to download the new bitstream onto your board. If the Run signal is asserted, your processor should execute the new program.

Part VII

Augment your assembly-language program from Part VI so that counter values are displayed on the seven-segment display port rather than on the LED port. You should display the counter values as decimal numbers from 0 to 65535. The speed of counting should be controllable using the SW switches in the same way as for Part VI. As part of your solution you may want to make use of the code shown in Figure 22. This code provides a subroutine that divides the number in register r0 by 10, returning the quotient in r1 and the remainder in r0. Dividing by 10 is a useful operation when performing binary-to-decimal conversion. The DIV10 subroutine assumes that r6 is set up to be used as a stack pointer. Register r2 is saved on the stack at the beginning of the subroutine, and then restored before returning. This is done so that r2 is not unnecessarily changed by the subroutine. A skeleton of the required code for this part is shown in Figure 23.

subroutine DIV10

This subroutine divides the number in r0 by 10

The algorithm subtracts 10 from r0 until r0 < 10, and keeps count in r1

This subroutine assumes that r6 can be used as a stack pointer

input: r0

returns: quotient Q in r1, remainder R in r0

DIV10:

sub
r6, #1
// save registers that are modified

st
r2, [r6]
// save on the stack

mv
r1, #0
// init Q
DLOOP:
mv
r2, #9
// check if r0 is < 10 yet

sub
r2, r0

bcc
#RETDIV
// if so, then return
INC:
add
r1, #1
// but if not, then increment Q

sub
r0, #10
// r0 -= 10

b
#DLOOP
// continue loop
RETDIV:

ld
r2, [r6]
// restore from the stack

add
r6, #1

add
r5, #1
// adjust the return address

mv
pc, r5
// return results

Figure 22: A subroutine that divides by 10

As described previously, assemble your program with sbasm.py, update your MIF file in the Quartus software, generate a new bitstream file by using the Quartus Assembler, and then download the new bitstream onto your DE1-SoC board to run your new program.

12

.define HEX_ADDRESS 0x2000

.define SW_ADDRESS
0x3000

.define STACK 256

// bottom of memory
// This program shows a decimal counter on the HEX displays

mv
r6,
#STACK
// stack pointer

mv
r5,
pc
// return address for subroutine

mv
pc,
#BLANK
// call subroutine to blank the HEX displays
MAIN:
mv
r0,
#0
// initialize counter
LOOP:
mvt
r1,
#HEX_ADDRESS
// point to HEX port

...

... use a
loop to extract and display each digit

...

Delay loop for controlling the rate at which the HEX displays are updated

...

... read from SW switches, and use a nested delay loop

...

add
r0, #1
// counter += 1

bcc
#LOOP
// continue until
counter overflows
mv
r5,
pc
//
return address
for
subroutine
mv
pc,
#BLANK
//
call subroutine to
blank the HEX displays

b#MAIN

subroutine DIV

...

... code not shown here

...

add
r5,
#1
//
adjust
the return address
mv
pc,
r5
//
return
results

subroutine BLANK

...

... code not shown here

...

add
r5,
#1

mv
pc,
r5
// return from subroutine
DATA:
.word
0b00111111
// ’0’

....

Figure 23: Skeleton code for displaying decimal digits.

13