$24
2.8 (10%)
Translate the following RISC-V code to C. Assume that the variables f, g, h, i, and j are assigned to registers x5, x6, x7, x28, and x29, respectively. Assume that the base address of the arrays A and B are in registers x10 and x11, respectively.
addi x30, x10, 8
addi x31, x10, 0
sd x31, 0(x30)
ld x30, 0(x30)
add x5, x30, x31
2.9 (10%)
For each RISC-V instruction in Exercise 2.8, show the value of the opcode (op), source register (rs1), and destination register (rd) fields. For the I-type instructions, show the value of the immediate field, and for the R-type instructions, show the value of the second source register (rs2). For non U- and UJ-type instructions, show the funct3 field, and for R-type and S-type instructions, also show the funct7 field.
2.16 (15%)
Assume that we would like to expand the RISC-V register file to 128 registers and expand the instruction set to contain four times as many instructions.
2.16.1 (5%)
How would this affect the size of each of the bit fields in the R-type instructions?
2.16.2 (5%)
How would this affect the size of each of the bit fields in the I-type instructions?
2.16.3 (5%)
How could each of the two proposed changes decrease the size of a RISC-V assembly program? On the other hand, how could the proposed change increase the size of an RISC-V assembly program?
2 Programming (70%)
We will test the following problems on RISC-V software stack. The packages we use are spike, proxy kernel with newlib and gcc. And the riscv-isa we will use to test the program is RV64IMAFDC.
1
Before we start programming, we will use docker to set up our environment (Refer to the supplementary.pdf to see how to install docker and use it).
docker pull ntuca2020/hw2 // (4G)
docker run --name=test -it ntuca2020/hw2
cd /root
ls
The folder structure in the docker image looks like the following:
/root
|-- Examples
• |-- Example1
• |-- Example2
• ‘-- Example3 ‘-- Problems
|-- fibonacci
| |-- Makefile
| |-- fibonacci.c | ‘-- fibonacci.s |-- convert
| |-- Makefile | |-- convert.c | ‘-- convert.s ‘-- matrix
|-- Makefile |-- matrix.c ‘-- matrix.s
• inline assembly test
• link with .s file test
• setup debug environment
• fibonacci number
• string to int
• matrix multiplcation
make and make test to try it out.
You only need to submit *.s file of each problem.
Fibonacci (20%)
Implement Fibonacci number in assembly. (F0 = 0; F1 = 1, output Fn, no overflow)
unsigned long long int fibonacci(int);
Input: Output:
70 190392490709135
Convert (20%)
• Convert an ASCII string containing a positive or negative integer decimal string to an integer. Input length is at most 15 bytes.
• ‘+’ and ‘-’ will appear optionally. And once they appear, they will only appear once in the first byte.
• If a non-digit character appears anywhere in the string, your program should stop and return -1.
• The return value will be printed out in 32bit-int format.
int convert(char *);
Input:
Output:
+123
123
+0000000123
123
-123
-123
-0000000321
-321
2147483647
2147483647
2147483648
-2147483648
-2147483648
-2147483648
-123123AAA
-1
+123123AAA
-1
123123AAA
-1
2
Matrix multiplication (15%)
Do matrix multiplication of size 128x128 with some additional operations.
for (int i = 0; i < SIZE; i++)
for (int j = 0; j < SIZE; j++)
for (int k = 0; k < SIZE; k++)
C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % 1024; Elements in A and B are unsigned 16 bits numbers of range [0,1023]
We will score based on the cycle counts. You can use C as an initial attempt.
asm volatile ("rdcycle \%0" : "=r" (start));
// matrix multiplication
asm volatile ("rdcycle \%0" : "=r" (end));
Grading:
• Below 20,000,000 cycles (2%)
• Below 18,000,000 cycles (2%)
• Below 16,000,000 cycles (2%)
• Below 14,000,000 cycles (2%)
• Below 12,000,000 cycles (2%)
• Below 10,000,000 cycles (1%)
• Below 9,000,000 cycles (1%)
• Below 8,000,000 cycles (1%)
• Below 7,000,000 cycles (1%)
• Below 6,000,000 cycles (1%)
Report on matrix multiplication (15%)
• Briefly explain how you get below 6,000,000 cycles.
• Or you can answer the following questions:
– How many cycles does it take by just doing the naive matrix multiplication?
– How many load and store does it need (roughly) during the whole computation? (Considering the registers it use)
– Is there any way to keep registers being used as much as possible before they’re replaced? (Hint: blocking)
– How many loop controls does it need (roughly) during the whole computation?
Submission
• All *.s file should be written assembly.
• Zip and upload your file to NTU cool in the following format, the file name should be 〈Student ID〉.zip:
b08922000 <-- zip this folder
|-- fibonacci.s
|-- convert.s
|-- matrix.s
‘-- report.pdf // including handwritten part and report on matrix multiplication part
• If there’s any question, please send email to 110fall.ca@gmail.com.
3