Starting from:
$30

$24

Homework Assignment 2 Solution

2.8 (10%)

Translate the following RISC-V code to C. Assume that the variables f, g, h, i, and j are assigned to registers x5, x6, x7, x28, and x29, respectively. Assume that the base address of the arrays A and B are in registers x10 and x11, respectively.
addi x30, x10, 8

addi x31, x10, 0

sd x31, 0(x30)

ld x30, 0(x30)

add x5, x30, x31

2.9 (10%)

For each RISC-V instruction in Exercise 2.8, show the value of the opcode (op), source register (rs1), and destination register (rd) fields. For the I-type instructions, show the value of the immediate field, and for the R-type instructions, show the value of the second source register (rs2). For non U- and UJ-type instructions, show the funct3 field, and for R-type and S-type instructions, also show the funct7 field.

2.16 (15%)

Assume that we would like to expand the RISC-V register file to 128 registers and expand the instruction set to contain four times as many instructions.

2.16.1 (5%)

How would this affect the size of each of the bit fields in the R-type instructions?

2.16.2 (5%)

How would this affect the size of each of the bit fields in the I-type instructions?

2.16.3 (5%)

How could each of the two proposed changes decrease the size of a RISC-V assembly program? On the other hand, how could the proposed change increase the size of an RISC-V assembly program?


2    Programming (70%)

We will test the following problems on RISC-V software stack. The packages we use are spike, proxy kernel with newlib and gcc. And the riscv-isa we will use to test the program is RV64IMAFDC.



















1
Before we start programming, we will use docker to set up our environment (Refer to the supplementary.pdf to see how to install docker and use it).

docker pull ntuca2020/hw2 // (4G)

docker run --name=test -it ntuca2020/hw2

cd /root

ls

The folder structure in the docker image looks like the following:

/root

|-- Examples

    • |-- Example1

    • |-- Example2

    • ‘-- Example3 ‘-- Problems

|-- fibonacci

|   |-- Makefile

| |-- fibonacci.c | ‘-- fibonacci.s |-- convert

| |-- Makefile | |-- convert.c | ‘-- convert.s ‘-- matrix

|-- Makefile |-- matrix.c ‘-- matrix.s



    • inline assembly test

    • link with .s file test

    • setup debug environment

    • fibonacci number



    • string to int



    • matrix multiplcation

make and make test to try it out.

You only need to submit *.s file of each problem.

Fibonacci (20%)

Implement Fibonacci number in assembly. (F0 = 0; F1 = 1, output Fn, no overflow)

unsigned long long int fibonacci(int);

Input:    Output:

70    190392490709135

Convert (20%)

    • Convert an ASCII string containing a positive or negative integer decimal string to an integer. Input length is at most 15 bytes.

    • ‘+’ and ‘-’ will appear optionally. And once they appear, they will only appear once in the first byte.

    • If a non-digit character appears anywhere in the string, your program should stop and return -1.

    • The return value will be printed out in 32bit-int format.


int convert(char *);
Input:
Output:
+123
123
+0000000123
123
-123
-123
-0000000321
-321
2147483647
2147483647
2147483648
-2147483648
-2147483648
-2147483648
-123123AAA
-1
+123123AAA
-1
123123AAA
-1


2
Matrix multiplication (15%)

Do matrix multiplication of size 128x128 with some additional operations.

for (int i = 0; i < SIZE; i++)

for (int j = 0; j < SIZE; j++)

for (int k = 0; k < SIZE; k++)

C[i][j] = (C[i][j] + A[i][k] * B[k][j]) % 1024; Elements in A and B are unsigned 16 bits numbers of range [0,1023]

We will score based on the cycle counts. You can use C as an initial attempt.

asm volatile ("rdcycle \%0" : "=r" (start));

// matrix multiplication

asm volatile ("rdcycle \%0" : "=r" (end));

Grading:

    • Below 20,000,000 cycles (2%)

    • Below 18,000,000 cycles (2%)

    • Below 16,000,000 cycles (2%)

    • Below 14,000,000 cycles (2%)

    • Below 12,000,000 cycles (2%)

    • Below 10,000,000 cycles (1%)

    • Below 9,000,000 cycles (1%)

    • Below 8,000,000 cycles (1%)

    • Below 7,000,000 cycles (1%)

    • Below 6,000,000 cycles (1%)

Report on matrix multiplication (15%)

    • Briefly explain how you get below 6,000,000 cycles.

    • Or you can answer the following questions:

– How many cycles does it take by just doing the naive matrix multiplication?

– How many load and store does it need (roughly) during the whole computation? (Considering the registers it use)

– Is there any way to keep registers being used as much as possible before they’re replaced? (Hint: blocking)

– How many loop controls does it need (roughly) during the whole computation?



Submission

    • All *.s file should be written assembly.

    • Zip and upload your file to NTU cool in the following format, the file name should be ⟨Student ID⟩.zip:

b08922000    <-- zip this folder

|-- fibonacci.s

|-- convert.s

|-- matrix.s

‘-- report.pdf    // including handwritten part and report on matrix multiplication part

• If there’s any question, please send email to 110fall.ca@gmail.com.


3

More products