
CENG THE3: Architecture Lab Solution

    1 Introduction

In this lab, you will learn about the design and implementation of a pipelined Y86-64 processor, optimizing both it and a benchmark program to maximize performance. You are allowed to make any semantics-preserving transformation to the benchmark program, or to make enhancements to the pipelined processor, or both. When you have completed the lab, you will have a keen appreciation for the interactions between code and hardware that affect the performance of your programs.

The lab is organized into three parts, each with its own handin. In Part A you will write some simple Y86-64 programs and become familiar with the Y86-64 tools. In Part B, you will extend the SEQ simulator with a new instruction. These two parts will prepare you for Part C, the heart of the lab, where you will optimize the Y86-64 benchmark program and the processor design.


    2 Logistics

You will work on this lab alone.

Any clarifications and revisions to the assignment will be posted on ODTUClass.


    3 Handout Instructions

        1. Start by copying the file archlab-handout.tar to a (protected) directory in which you plan to do your work.






    2. Then give the command: tar xvf archlab-handout.tar. This will cause the following files to be unpacked into the directory: README, sim.tar, archlab.pdf, and simguide.pdf.

    3. Next, give the command tar xvf sim.tar. This will create the directory sim, which contains your personal copy of the Y86-64 tools. You will be doing all of your work inside this directory.

    4. Finally, change to the sim directory and build the Y86-64 tools:

unix>  cd sim

unix>  make clean && make

Note that this should work directly on the ineks, but you’ll need to do some extra work if you want to compile the lab on your own system. Check the final section, Installation & Usage Hints.

    4 Part A

You will be working in directory sim/misc in this part.

Your task is to write and simulate the following three Y86-64 programs. The required behavior of these programs is defined by the example C functions in examples.c. Be sure to put your name and ID in a comment at the beginning of each program. You can test your programs by first assembling them with the program YAS and then running them with the instruction set simulator YIS.

In all of your Y86-64 functions, you should follow the x86-64 conventions for passing function arguments, using registers, and using the stack.

max_bst.ys: Find the maximum of a binary search tree


Write a Y86-64 program max_bst.ys to iteratively find the maximum of a binary search tree. Your program should consist of some code that sets up the stack structure, invokes a function, and then halts. In this case, the function should be Y86-64 code for a function (max_bst) that is functionally equivalent to the C max_bst function in Figure 1. Test your program using the following eleven-element binary search tree:


    # A sample eleven-element BST. Absolutely positioned
    # to avoid confusion when debugging.
    .pos 0x200
    root:
        .quad 17
        .quad node6
        .quad node24
    node6:
        .quad 6
        .quad node4
        .quad node11
    node4:
        .quad 4
        .quad node3
        .quad node5
    node3:
        .quad 3
        .quad 0
        .quad 0
    node5:
        .quad 5
        .quad 0     # Remember that 0 is null.
        .quad 0
    node11:
        .quad 11
        .quad node8
        .quad 0
    node8:
        .quad 8
        .quad 0
        .quad 0
    node24:
        .quad 24
        .quad node19
        .quad node40
    node19:
        .quad 19
        .quad 0
        .quad 0
    node40:
        .quad 40
        .quad 0
        .quad node52
    node52:
        .quad 52
        .quad 0
        .quad 0
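As a cross-check before writing max_bst.ys, the C sketch below mirrors the data above on a host machine. It assumes an LP64 system such as x86-64 Linux, where `long` and pointers are 8 bytes, so each node is exactly three `.quad` words: `value` at offset 0, `left` at 8 and `right` at 16. The node variable names follow the labels above (`root_node` is a hypothetical name for `root`), and `max_bst` is copied from Figure 1. Called on the root, it should return 52 (0x34), the value expected in `%rax`.

```c
#include <stddef.h>

/* Same declaration as Figure 1. On an LP64 host this is three 8-byte
   fields (value at offset 0, left at 8, right at 16), one per .quad. */
struct btree {
    long value;
    struct btree *left, *right;
};

/* max_bst from Figure 1: walk right children until there are none. */
long max_bst(const struct btree *root)
{
    long max = 0;
    if (root) {
        while (root->right)
            root = root->right;
        max = root->value;
    }
    return max;
}

/* The eleven-element sample BST; NULL plays the role of the 0 entries. */
static struct btree node3  = {3,  NULL, NULL};
static struct btree node5  = {5,  NULL, NULL};
static struct btree node4  = {4,  &node3, &node5};
static struct btree node8  = {8,  NULL, NULL};
static struct btree node11 = {11, &node8, NULL};
static struct btree node6  = {6,  &node4, &node11};
static struct btree node19 = {19, NULL, NULL};
static struct btree node52 = {52, NULL, NULL};
static struct btree node40 = {40, NULL, &node52};
static struct btree node24 = {24, &node19, &node40};
static struct btree root_node = {17, &node6, &node24};
```

The offsets are the displacements a Y86-64 version would use with mrmovq; for example, reading root->right is a load from displacement 16 off the node pointer.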

max_btree.ys: Recursively find the maximum of a binary tree


Write a Y86-64 program max_btree.ys that recursively finds the maximum of an arbitrary binary tree (not a binary search tree). This code will have to use recursion, as shown with the C function max_btree in Figure 1. Test your program using the following 9-element binary tree.


    # A binary (not search!) tree,
    # absolutely positioned again.
    .pos 0x200
    root:
        .quad 5
        .quad node7
        .quad node12
    node7:
        .quad 7
        .quad node25
        .quad node905
    node25:
        .quad 25
        .quad 0
        .quad 0
    node905:
        .quad 905
        .quad nodem1
        .quad 0
    nodem1:
        .quad -1
        .quad 0
        .quad 0
    node12:
        .quad 12
        .quad node219
        .quad nodem10
    node219:
        .quad 219
        .quad 0
        .quad 0
    nodem10:
        .quad -10
        .quad 0
        .quad node331
    node331:
        .quad 331
        .quad 0
        .quad 0
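A matching host-side check for max_btree.ys, under the same LP64 assumption. Here `LONG_MIN` stands in for Figure 1's `1L << 63`: both denote the smallest 64-bit value on such a host, but shifting a 1 into the sign bit is formally undefined in C, so the sketch spells out the intent. Because an empty subtree yields that minimum, negative keys such as -1 and -10 are handled correctly. The node names follow the labels above (`root_node` is a hypothetical name for `root`); the call on the root should return 905 (0x389).

```c
#include <limits.h>
#include <stddef.h>

struct btree {
    long value;
    struct btree *left, *right;
};

/* max_btree from Figure 1, with LONG_MIN in place of 1L << 63. */
long max_btree(const struct btree *root)
{
    long max = LONG_MIN;            /* result for an empty (sub)tree */
    if (root) {
        long candidate;
        max = root->value;
        candidate = max_btree(root->left);
        if (candidate > max)
            max = candidate;
        candidate = max_btree(root->right);
        if (candidate > max)
            max = candidate;
    }
    return max;
}

/* The nine-element sample binary tree (not a BST). */
static struct btree node25    = {25,  NULL, NULL};
static struct btree nodem1    = {-1,  NULL, NULL};
static struct btree node905   = {905, &nodem1, NULL};
static struct btree node7     = {7,   &node25, &node905};
static struct btree node219   = {219, NULL, NULL};
static struct btree node331   = {331, NULL, NULL};
static struct btree nodem10   = {-10, NULL, &node331};
static struct btree node12    = {12,  &node219, &nodem10};
static struct btree root_node = {5,   &node7, &node12};
```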

collect_into.ys: Collect binary tree values into an array


Write a program collect_into.ys that collects the values inside a binary tree in-order into an array, and returns the number of elements collected. Note that the function also takes the size of the array as an argument: if the tree contains more values than there are slots for, only enough values to fill the array will be collected, and the size of the array will be returned. Everything is fine if the array has more space than the binary tree.


Your program should consist of code that sets up a stack frame, invokes a function collect_into, and then halts.


The function should be functionally equivalent to the C function collect_into shown in Figure 1.

Test your program using the same BST as in max_bst, and store values in the following empty array having just 8 slots:


    # An array with size of 8 to put elements in:
    # Make sure your code works correctly. Do not
    # fill beyond the bounds of the array. You should
    # see values in sorted order starting from the minimum
    # of the BST, since the traversal is in-order.
    .pos 0x400
    array:
        .quad 0
        .quad 0
        .quad 0
        .quad 0
        .quad 0
        .quad 0
        .quad 0
        .quad 0
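The in-order traversal with an 8-slot limit can be checked the same way. This sketch assumes an LP64 host, copies `collect_into` from Figure 1, and uses `slots`, a hypothetical stand-in for `array` with one extra guard word at the end. It should return 8, fill the slots with 3, 4, 5, 6, 8, 11, 17, 19, and leave the guard word untouched; the remaining values 24, 40 and 52 are never written.

```c
#include <stddef.h>

struct btree {
    long value;
    struct btree *left, *right;
};

/* collect_into from Figure 1: in-order traversal that never writes more
   than array_len elements and returns how many it wrote. */
long collect_into(const struct btree *root, long *array, long array_len)
{
    if (!root || array_len <= 0)
        return 0;
    long left_len = collect_into(root->left, array, array_len);
    if (left_len == array_len)
        return left_len;                      /* array already full */
    array[left_len] = root->value;
    long right_len = collect_into(root->right, array + left_len + 1,
                                  array_len - left_len - 1);
    return left_len + 1 + right_len;
}

/* The eleven-element sample BST from max_bst.ys. */
static struct btree node3  = {3,  NULL, NULL};
static struct btree node5  = {5,  NULL, NULL};
static struct btree node4  = {4,  &node3, &node5};
static struct btree node8  = {8,  NULL, NULL};
static struct btree node11 = {11, &node8, NULL};
static struct btree node6  = {6,  &node4, &node11};
static struct btree node19 = {19, NULL, NULL};
static struct btree node52 = {52, NULL, NULL};
static struct btree node40 = {40, NULL, &node52};
static struct btree node24 = {24, &node19, &node40};
static struct btree root_node = {17, &node6, &node24};

/* 8 usable slots plus a guard word (index 8) that must stay -1. */
static long slots[9] = {[8] = -1};
```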



    5 Part B

You will be working in directory sim/seq in this part.

Your task in Part B is to extend the SEQ processor to support a new instruction, leaq, a ten-byte instruction having the form leaq C(rB), rA. This will be similar to, but weaker than, its x86-64 counterpart: it will simply add the offset C to register rB and store the result in register rA. Just like in x86-64, leaq should not set condition codes. There is no fancy array indexing, since the Y86-64 processor only has one arithmetic unit and cannot shift bits. The encoding of leaq is shown in Figure 2. The main takeaway is that you should be careful with the ordering of the registers: unlike most other instructions, the result is stored in rA!
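The intended architectural effect of leaq fits in a few lines of C. This is an ISA-level model only and says nothing about how the SEQ stages compute it; `struct y86_state` and `exec_leaq` are hypothetical names, and register indices follow the Y86-64 register IDs (0 = %rax, 3 = %rbx, ..., 14 = %r14). Note that the destination is rA and that the condition codes are left alone.

```c
#include <stdint.h>

/* Hypothetical, simplified architectural state: the 15 Y86-64 program
   registers plus the three condition codes. */
struct y86_state {
    int64_t reg[15];
    int zf, sf, of;
};

/* leaq C(rB), rA: rA <- rB + C. Condition codes are deliberately not
   touched. The add is done on unsigned values so it wraps like the
   hardware adder instead of invoking C signed-overflow behavior. */
void exec_leaq(struct y86_state *s, int64_t c, int rb, int ra)
{
    s->reg[ra] = (int64_t)((uint64_t)s->reg[rb] + (uint64_t)c);
}
```

Because rB is read before rA is written, the model also behaves sensibly when rA and rB name the same register.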


To add this instruction, you will modify the file seq-full.hcl, which implements the version of SEQ described in the CS:APP3e textbook. In addition, it contains declarations of some constants that you will need for your solution.

Your HCL file must begin with a header comment containing the following information:

    • Your name and ID.

    • A description of the computations required for the leaq instruction. Use the descriptions of irmovq and OPq in Figure 4.18 in the CS:APP3e text as a guide.

Building and Testing Your Solution

Once you have finished modifying the seq-full.hcl file, then you will need to build a new instance of the SEQ simulator (ssim) based on this HCL file, and then test it:

    • Building a new simulator. You can use make to build a new SEQ simulator:

unix>    make VERSION=full

This builds a version of ssim that uses the control logic you specified in seq-full.hcl. To save typing, you can assign VERSION=full in the Makefile.

    • Testing your solution on a simple Y86-64 program. For your initial testing, we recommend running simple test programs such as leamany.yo (testing leaq) in TTY mode, comparing the results against the ISA simulation:

unix>  ./ssim -t ../y86-code/leamany.yo

If the ISA test fails, then you should debug your implementation by single stepping the simulator in GUI mode:

unix>    ./ssim -g ../y86-code/leamany.yo

    • Retesting your solution using the benchmark programs. Once your simulator is able to correctly execute small programs, then you can automatically test it on the Y86-64 benchmark programs in ../y86-code:






    struct btree {
        long value;
        struct btree *left, *right;
    };

    long max_bst(const struct btree *root)
    {
        long max = 0;
        if (root) {
            while (root->right)
                root = root->right;
            max = root->value;
        }
        return max;
    }

    long max_btree(const struct btree *root)
    {
        long max = 1L << 63;
        if (root) {
            long candidate;
            max = root->value;
            candidate = max_btree(root->left);
            if (candidate > max)
                max = candidate;
            candidate = max_btree(root->right);
            if (candidate > max)
                max = candidate;
        }
        return max;
    }

    long collect_into(const struct btree *root, long *array, long array_len)
    {
        if (!root || array_len <= 0) {
            return 0;
        } else {
            long left_len, right_len;
            left_len = collect_into(root->left, array, array_len);
            if (left_len == array_len)
                return left_len;
            array[left_len] = root->value;
            right_len = collect_into(root->right, array + left_len + 1,
                                     array_len - left_len - 1);
            return left_len + 1 + right_len;
        }
    }


Figure 1: C versions of the Y86-64 solution functions. See sim/misc/examples.c
















Figure 2: Encoding of the leaq instruction


unix>    (cd ../y86-code && make testssim)

This will run ssim on the benchmark programs and check for correctness by comparing the resulting processor state with the state from a high-level ISA simulation. Note that none of these programs test the added instructions. You are simply making sure that your solution did not inject errors for the original instructions. See the file ../y86-code/README for more details.

    • Performing regression tests. Once you can execute the benchmark programs correctly, then you should run the extensive set of regression tests in ../ptest. To test everything except leaq:

unix>    (cd ../ptest && make SIM=../seq/ssim)

To test your implementation of leaq:

unix>  (cd ../ptest && make SIM=../seq/ssim TFLAGS=-l)

For more information on the SEQ simulator refer to the handout CS:APP3e Guide to Y86-64 Processor Simulators (simguide.pdf).


    6 Part C

You will be working in directory sim/pipe in this part.

The absrev function in Figure 3 copies a len-element integer array src to a non-overlapping dst array in reverse, returning the sum of the absolute values of the numbers contained in src. Figure 4 shows the baseline Y86-64 version of absrev. The file pipe-full.hcl contains a copy of the HCL code for PIPE, along with constant declarations for instruction codes.

Your task in Part C is to modify absrev.ys and pipe-full.hcl with the goal of making absrev.ys run as fast as possible.

You will be handing in two files: pipe-full.hcl and absrev.ys. Each file should begin with a header comment with the following information:

    • Your name and ID.

    • A high-level description of your code. For absrev.ys, describe how and why you modified your code. A step-by-step approach is recommended for clarity, e.g. "I did X, reducing my CPE from A to B; then I did Y, reducing my CPE from B to C", etc. For pipe-full.hcl, describe how and why you modified the control logic and how this was helpful in speeding up your code. Note that you can also choose not to modify pipe-full.hcl at all and keep your optimizations restricted to the code.







    /*
     * absrev - copy src to dst in reverse, returning the sum of the absolute
     * value of numbers contained in src array.
     */
    word_t absrev(word_t *src, word_t *dst, word_t len)
    {
        word_t *dst_rev = dst + len - 1;
        word_t sum = 0;
        word_t val;
        word_t absval;

        while (len > 0) {
            val = *src++;
            *dst_rev-- = val;
            absval = val > 0 ? val : -val;
            sum += absval;
            len--;
        }

        return sum;
    }


Figure 3: C version of the absrev function. See sim/pipe/absrev.c.


To add more flexibility, the PIPE processor description provided in pipe-full.hcl comes extended with an extra ten-byte instruction icmpq C, rB, that compares register rB with constant value C. This is the same as subtracting C from rB, but the result is thrown away after being used to set the condition flags, and rB does not change.
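The flag behavior of icmpq can likewise be modeled as a subq whose result is thrown away. The sketch assumes a two's-complement host and uses hypothetical names (`struct cc`, `exec_icmpq`). ZF, SF and OF are computed from rB - C the way CS:APP3e describes for subtraction, so a following conditional jump such as jg or jle sees the same flags it would after subtracting C from rB with subq; rB itself is passed by value and never modified.

```c
#include <stdint.h>

struct cc { int zf, sf, of; };

/* icmpq C, rB: compute rB - C, set ZF/SF/OF from it as subq would,
   then discard the difference. rB is unchanged. */
struct cc exec_icmpq(int64_t rb, int64_t c)
{
    /* Wrapping subtraction, done on unsigned values. */
    int64_t t = (int64_t)((uint64_t)rb - (uint64_t)c);
    struct cc f;
    f.zf = (t == 0);
    f.sf = (t < 0);
    /* Signed overflow: the operands have different signs and the
       result's sign differs from rB's sign. */
    f.of = ((rb < 0) != (c < 0)) && ((t < 0) != (rb < 0));
    return f;
}
```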

Coding Rules

You are free to make any modifications you wish, with the following constraints:

    • Your absrev.ys function must work for arbitrary array sizes. You might be tempted to hardwire your solution for 64-element arrays by simply coding 64 copy instructions, but this would be a bad idea because we will be grading your solution based on its performance on arbitrary arrays.

    • Your absrev.ys function must run correctly with YIS. By correctly, we mean that it must correctly copy the src block in reverse and return (in %rax) the correct sum of absolute values.

    • The assembled version of your absrev file must not be more than 1000 bytes long. You can check the length of any program with the absrev function embedded using the provided script check-len.pl:

unix>  ./check-len.pl < absrev.yo

    • Your pipe-full.hcl implementation must pass the regression tests in ../y86-code and ../ptest (without the -l flag that tests leaq).

Other than that, you are free to implement the leaq instruction and apply changes to the control logic if you think that will help, as long as your pipe-full.hcl implementation passes the regression tests (i.e. the base Y86-64 instruction set remains functional). Once again, you can also choose not to modify pipe-full.hcl at all with no grade penalty.

    ##################################################################
    # absrev.ys - Reverse a src block of len words to dst.
    # Return the sum of absolute values of words contained in src.
    #
    # Include your name and ID here.
    # Describe how and why you modified the baseline code.
    ##################################################################
    # Do not modify this portion
    # Function prologue.
    # %rdi = src, %rsi = dst, %rdx = len
    absrev:
    ##################################################################
    # You can modify this portion
        # Loop header
        xorq %rax,%rax          # sum = 0;

        # all this for dst_rev = dst + len - 1
        xorq %rcx, %rcx         # zero rcx
        addq %rdx, %rcx         # add len eight times
        addq %rdx, %rcx
        addq %rdx, %rcx
        addq %rdx, %rcx
        addq %rdx, %rcx
        addq %rdx, %rcx
        addq %rdx, %rcx
        addq %rdx, %rcx
        irmovq $8, %r8          # for subtracting 8
        addq %rsi, %rcx         # add dst
        subq %r8, %rcx          # finally, rcx holds dst_rev

        andq %rdx,%rdx          # len <= 0?
        jle Done                # if so, goto Done:

    Loop:
        mrmovq (%rdi), %r10     # read val from src...
        rmmovq %r10, (%rcx)     # ...and store it to dst
        andq %r10, %r10         # val >= 0?
        jge Positive            # if so, skip negating
        rrmovq %r10, %r9        # temporary move
        xorq %r10, %r10         # zero r10
        subq %r9, %r10          # negation achieved!
    Positive:
        addq %r10, %rax         # sum += absval
        irmovq $1, %r10
        subq %r10, %rdx         # len--
        irmovq $8, %r10
        addq %r10, %rdi         # src++
        subq %r10, %rcx         # dst_rev--
        andq %rdx,%rdx          # len > 0?
        jg Loop                 # if so, goto Loop:
    ##################################################################
    # Do not modify the following section of code
    # Function epilogue.
    Done:
        ret
    ##################################################################
    # Keep the following label at the end of your function
    End:

Figure 4: Baseline Y86-64 version of the absrev function. See sim/pipe/absrev.ys.

You may make any semantics-preserving transformations to the absrev.ys function, such as reordering instructions, replacing groups of instructions with single instructions, deleting some instructions, and adding other instructions. You may find it useful to read about loop unrolling in Section 5.8 of CS:APP3e. You are allowed to add constant data (such as arrays, as you did in Part A) to your program using the directives .align, .quad and .pos.
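To make the loop-unrolling suggestion concrete, here is the transformation applied to Figure 3 at the C level, with an unroll factor of 4 picked purely for illustration. It is a sketch of the idea, not a tuned Y86-64 solution: the loop test and the counter and pointer bookkeeping are paid once per four elements, while a remainder loop keeps the function correct for every length, including 0 and lengths that are not multiples of 4. `absrev_unrolled4` is a hypothetical name; which factor and instruction schedule work best on PIPE is left open here.

```c
typedef long word_t;   /* assumed 64-bit, as in absrev.c */

/* Same result as Figure 3: copy src into dst in reverse and return the
   sum of absolute values, but handle four elements per main-loop pass. */
word_t absrev_unrolled4(const word_t *src, word_t *dst, word_t len)
{
    word_t sum = 0;
    word_t i = 0;

    /* Main loop: four independent elements per iteration. */
    for (; i + 4 <= len; i += 4) {
        word_t v0 = src[i], v1 = src[i + 1], v2 = src[i + 2], v3 = src[i + 3];
        dst[len - 1 - i] = v0;
        dst[len - 2 - i] = v1;
        dst[len - 3 - i] = v2;
        dst[len - 4 - i] = v3;
        sum += (v0 > 0 ? v0 : -v0) + (v1 > 0 ? v1 : -v1)
             + (v2 > 0 ? v2 : -v2) + (v3 > 0 ? v3 : -v3);
    }

    /* Remainder loop: the last len mod 4 elements, one at a time. */
    for (; i < len; i++) {
        word_t v = src[i];
        dst[len - 1 - i] = v;
        sum += v > 0 ? v : -v;
    }
    return sum;
}
```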

Building and Running Your Solution

In order to test your solution, you will need to build a driver program that calls your absrev function. We have provided you with the gen-driver.pl program that generates a driver program for arbitrary sized input arrays. For example, typing

unix>    make drivers

will construct the following two useful driver programs:

    • sdriver.yo: A small driver program that tests an absrev function on small arrays with 4 elements. If your solution is correct, then this program will halt with a value of 10 (0xa) in register %rax after copying the src array in reverse.

    • ldriver.yo: A large driver program that tests an absrev function on larger arrays with 63 elements. If your solution is correct, then this program will halt with a value of 2016 (0x7e0) in register %rax after copying the src array in reverse.
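The two expected results are 1 + 2 + 3 + 4 = 10 and 1 + 2 + ... + 63 = 63 · 64 / 2 = 2016. The sketch below assumes, without confirmation from the driver source, that the driver's src words have absolute values 1 through N with arbitrary signs, which is consistent with both numbers. `fill_alternating` is a hypothetical stand-in for the driver's data, and `absrev` is Figure 3 rewritten with indices so that `len <= 0` never forms a pointer before `dst`.

```c
typedef long word_t;   /* assumed 64-bit, as in absrev.c */

/* Figure 3's absrev, using indices instead of pointer arithmetic. */
word_t absrev(const word_t *src, word_t *dst, word_t len)
{
    word_t sum = 0;
    for (word_t i = 0; i < len; i++) {
        word_t val = src[i];
        dst[len - 1 - i] = val;          /* reversed copy */
        sum += val > 0 ? val : -val;     /* absolute value */
    }
    return sum;
}

/* Hypothetical driver-like input: 1, -2, 3, -4, ... up to n. */
void fill_alternating(word_t *src, word_t n)
{
    for (word_t i = 0; i < n; i++)
        src[i] = (i % 2 == 0) ? i + 1 : -(i + 1);
}
```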

Each time you modify your absrev.ys program, you can rebuild the driver programs by typing

unix>    make drivers

Each time you modify your pipe-full.hcl file, you can rebuild the simulator by typing

unix>    make psim VERSION=full

If you want to rebuild the simulator and the driver programs, type

unix>    make VERSION=full

To test your solution in GUI mode on a small 4-element array, type

unix>    ./psim -g sdriver.yo

To test your solution on a larger 63-element array, type

unix>    ./psim -g ldriver.yo

Once your simulator correctly runs your version of absrev.ys on these two block lengths, you will want to perform the following additional tests:






    • Testing your driver files on the ISA simulator. Make sure that your absrev.ys function works properly with YIS:

unix>    make drivers

unix>    ../misc/yis sdriver.yo

    • Testing your code on a range of block lengths with the ISA simulator. The Perl script correctness.pl generates driver files with block lengths from 0 up to some limit (default 65), plus some larger sizes. It simulates them (by default with YIS), and checks the results. It generates a report showing the status for each block length:

unix>  ./correctness.pl

This script generates test programs where the result count varies randomly from one run to another, and so it provides a more stringent test than the standard drivers.

If you get incorrect results for some length K, you can generate a driver file for that length that includes checking code, and where the result varies randomly:

unix>    ./gen-driver.pl -f absrev.ys -n K -rc > driver.ys

unix>    make driver.yo

unix>  ../misc/yis driver.yo

The program will end with register %rax having the following value:

0xaaaa : All tests pass.

0xbbbb : Incorrect sum.

0xcccc : Function absrev is more than 1000 bytes long.

0xdddd : Some of the source data was not copied to its destination.

0xeeee : Some word just before or just after the destination region was corrupted.

    • Testing your pipeline simulator on the benchmark programs. Once your simulator is able to correctly execute sdriver.ys and ldriver.ys, you should test it against the Y86-64 benchmark programs in ../y86-code:

unix>  (cd ../y86-code && make testpsim)

This will run psim on the benchmark programs and compare results with YIS.

    • Testing your pipeline simulator with extensive regression tests. Once you can execute the benchmark programs correctly, then you should check it with the regression tests in ../ptest. For example, if your solution implements the leaq instruction, then to test leaq along with all the standard instructions:

unix>  (cd ../ptest && make SIM=../pipe/psim TFLAGS=-l)

    • Testing your code on a range of block lengths with the pipeline simulator. Finally, you can run the same code tests on the pipeline simulator that you did earlier with the ISA simulator:

unix>  ./correctness.pl -p


    7 Evaluation

The lab is worth 165 points: 45 points for Part A, 30 points for Part B, and 90 points for Part C.

Since this homework is more conventional than the Bomb and Attack Labs, your handins will be checked for plagiarism, as usual. Please remember that we have a zero tolerance policy for cheating. This includes any work that is not your own, including using sources from the internet.






Part A

Part A is worth 45 points, 15 points for each Y86-64 solution program. Each solution program will be evaluated for correctness, including proper handling of the stack and registers, as well as functional equivalence with the example C functions in examples.c.

The program max_bst.ys will be considered correct if the graders do not spot any errors in it, memory is not corrupted and the function returns 0x34 in register %rax.


Similarly, max_btree.ys will be considered correct if the graders do not spot any errors in it, memory is not corrupted and the function returns 0x389 in register %rax.


Finally, collect_into.ys will be considered correct if the graders do not spot any errors in it, and the function returns 0x8 in %rax, correctly copies the 8 smallest values in the BST to the array in sorted order, and does not corrupt other memory locations or go over the bounds of the array.


Part B

This part of the lab is worth 30 points:

    • 10 points for your description of the computations required for the leaq instruction.

    • 5 points for passing the benchmark regression tests in y86-code, to verify that your simulator still correctly executes the benchmark suite.

    • 15 points for passing the regression tests in ptest for leaq.

Part C

This part of the Lab is worth 90 points: You will not receive any credit if either your code for absrev.ys or your modified simulator fails any of the tests described earlier.

    • 15 points each for your descriptions in the headers of absrev.ys and pipe-full.hcl and the quality of these implementations. To be extra clear: you do not have to modify pipe-full.hcl to get a full grade; if you do not, your absrev.ys will be graded out of 30. However, if you do modify it, the explanations must be present.

    • 60 points for performance. To receive credit here, your solution must be correct, as defined earlier. That is, absrev runs correctly with YIS (unless you modify instructions, in which case running on psim is enough), and pipe-full.hcl passes all tests in y86-code and ptest (remember that leaq and the already existing icmpq will not be tested during grading).

We will express the performance of your function in units of cycles per element (CPE). That is, if the simulated code requires C cycles to copy a block of N elements, then the CPE is C/N. The PIPE simulator displays the total number of cycles required to complete the program. The baseline version of the absrev function running on the standard PIPE simulator with a large 63-element array requires 1009 cycles to copy 63 elements, for a CPE of 1009/63 = 16.02.

Since some cycles are used to set up the call to absrev and to set up the loop within absrev, you will find that you will get different values of the CPE for different block lengths (generally the CPE will drop as N increases). We will therefore evaluate the performance of your function by computing the average of the CPEs for blocks ranging from 1 to 64 elements. You can use the Perl script benchmark.pl in the pipe directory to run simulations of your absrev.ys code over a range of block lengths and compute the average CPE. Simply run the command






unix>    ./benchmark.pl

to see what happens. For example, the baseline version of the absrev function has CPE values ranging between 48.00 and 15.97, with an average of 17.83. Note that this Perl script does not check for the correctness of the answer. Use the script correctness.pl for this.
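Both quantities are plain arithmetic, and a short C sketch makes the definitions explicit. `cpe` and `average_cpe` are hypothetical helper names, and the cycle counts are inputs you would read off the simulator or benchmark.pl; this code does not measure anything. With the baseline numbers, `cpe(1009, 63)` is about 16.02.

```c
/* CPE for one block length: total simulated cycles divided by elements. */
double cpe(long cycles, long n)
{
    return (double)cycles / (double)n;
}

/* Unweighted average of the per-length CPEs for N = 1..max_n, matching the
   grading description. cycles[n] holds the cycle count for length n;
   cycles[0] is unused. */
double average_cpe(const long *cycles, long max_n)
{
    double total = 0.0;
    for (long n = 1; n <= max_n; n++)
        total += cpe(cycles[n], n);
    return total / (double)max_n;
}
```

Because short blocks carry a fixed setup cost, their CPE is high (48.00 for the baseline at the short end), and each of them still counts as much as a 64-element block in this average.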

You should be able to achieve an average CPE of less than 11.00. If your average CPE is c, then your score S for this portion of the lab will be:

$$
S = \begin{cases}
0, & c > 12.50 \\
20\,(12.50 - c), & 9.50 \le c \le 12.50 \\
60, & c < 9.50
\end{cases}
$$
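The formula is continuous at both breakpoints (S = 60 at c = 9.50 and S = 0 at c = 12.50), so inside that band every 0.05 reduction in average CPE is worth one point. A direct transcription into C, with `perf_score` a hypothetical name:

```c
/* Performance score S (out of 60) for an average CPE c, per the formula. */
double perf_score(double c)
{
    if (c > 12.5)
        return 0.0;              /* too slow: no performance credit */
    if (c < 9.5)
        return 60.0;             /* at or beyond the full-credit threshold */
    return 20.0 * (12.5 - c);    /* linear in between */
}
```

For example, the baseline average of 17.83 scores 0, and an average CPE of exactly 11.00 scores 30.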



By default, benchmark.pl and correctness.pl compile and test absrev.ys. Use the -f argument to specify a different file name. The -h flag gives a complete list of the command line arguments.


    8 Bonus Opportunities

Since the homework is not exactly trivial, there are three opportunities to get extra points from the homework for those who get really into it, none of them being mutually exclusive. Each will grant you one extra point, up to a total of three. Since the homework constitutes six points of the course grade, you can go up to nine with the bonuses: possibly 150% of the maximum grade. Essentially, each bonus is equivalent to 27.5 points from the lab: one sixth extra, up to a total of half.

Performance

If your code achieves an average CPE below the maximum grade threshold of 9.50, i.e. benchmark.pl shows an average CPE of at most 9.49, and you have properly explained how you’ve achieved this performance, you get one extra point.

Performance++

If your code further achieves an average CPE below 9.40, i.e. benchmark.pl shows an average CPE of at most 9.39, and the modifications that led to this are explained clearly, you get one more point!

Certified (!) Y86-64 Professional

If you can achieve the Performance++ bonus without modifying the behavior of the untested instructions leaq and icmpq, you get yet another point for being cool. Make sure to add a leaq implementation to your pipe-full.hcl for this.

This means that your pipe-full.hcl implementation has to pass the following tests on top of having an average CPE less than or equal to 9.39 using your absrev.ys:

unix>    (cd ../ptest && make SIM=../pipe/psim 'TFLAGS=-l -c')










    9 Tips and Tricks

Part A

    • The examples.c file shown in the PDF is stripped of all comments to fit in the PDF. Check the examples.c file under misc to see the real, commented version, if you want to understand what is going on.

    • You do not necessarily have to think about the algorithms, simply reproducing the C versions of the given functions using Y86-64 is enough. Of course, you are free to write your own if you want to challenge yourself. This is fine as long as your functions behave in the expected way, as explained in the evaluation section.

    • Be careful with the placement of the absolutely positioned data, and make sure to place the stack far enough from your code since it grows downwards (towards zero). The simulator will not think twice about overwriting your code if the stack grows too large, which might be hard to debug. An example layout that should work is shown in Figure 5.

You can check examples under the y86-code directory (such as asum.ys) if you want to see how the initial set-up code is written. Remember that having a main function is optional.

    • Examine the value of %rax and the Changes to memory section from the output of the ISA simulator YIS to make sure that your functions work.

    • In the CS:APP3e book, Figure 4.1 (around page 383) shows the Y86-64 registers while Figure 4.2 (around page 385) shows the Y86-64 instruction set.

Part B

    • You do not have full control over the circuit design of the processor; instead, you can modify the existing control logic, which makes your job simpler. This part is much easier than it seems initially!

    • Figure 4.23 (around page 427) in the CS:APP3e textbook illustrates the design of SEQ, which might be helpful.


Part C

    • Even though your code needs to work for all block sizes, the benchmark is the average CPE for block sizes from 1 to 64 only. Larger sizes are the majority!

    • Be careful with leaq and icmpq! Even though adding values to registers and comparing with values directly is very convenient, each of these instructions is 10 bytes long (because the constant is stored in the instruction itself) and might push you past your 1000-byte program size limit rather quickly.

    • Remember that PIPE does not re-order instructions. You have to consider possible hazards that may delay the pipeline using your own knowledge. Think hard about the program and do your best to write correct code that is as fast as possible.

    • pipe-full.hcl may seem impenetrable at first. It is not! There are different modifications you can perform that could help with performance, depending on the structure of your program. As a bonus, tinkering with pipe-full.hcl will help you understand PIPE much better. This knowledge may be useful in the written exams. However, you can still choose not to modify it at all with no extra penalty.

    • You cannot add new instructions to the ISA. However, since icmpq and leaq will not be tested (unless you’re going for the final bonus), you can change pipe-full.hcl to make icmpq or leaq do something else entirely, if you have an idea that would help performance. Obviously, in this case the execution of programs containing icmpq or leaq will not match the ISA simulator YIS, and you should perform the regression tests without the -l and -c flags. This is fine, since these instructions will not be tested during grading. You definitely do not need to do this kind of thing to achieve maximum performance (below 9.40 CPE), but it makes it significantly easier. This is why the ultimate bonus is for not doing this and still achieving great performance! Make sure to always explain any changes you make in the comments, though.

    • Figure 4.52 (around page 468) in the CS:APP3e textbook illustrates the design of PIPE, which will be helpful.

    .pos 0
    # initial code for setting
    # up the stack and calling main or your function
    # and stopping after your function returns

    # the example data, starting at
    # byte 512 to be far enough from
    # your initial code to not have problems
    .pos 0x200
    # .. data ..
    # .. data ..
    # .. data ..

    main:
        # Optionally, you can have a main function
        # setting up the arguments to your function
        # and calling it, but it’s optional. Feel
        # free to call func directly from the initial code.

    func:
        # code for your function...
        # .. code ..
        # .. code ..
        # .. code ..
        # .. code ..

    # stack starting at byte 2048,
    # far away from the code, your code
    # should not be long enough to get here anyway!
    .pos 0x800
    stack:

Figure 5: An example layout for the functions in Part A. Check y86-code/asum.ys for an example.
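Relating to the Part C tip on 10-byte instructions: a rough byte budget helps when deciding how far to unroll under the 1000-byte limit. The table below uses the standard Y86-64 instruction lengths from CS:APP3e plus the 10-byte leaq and icmpq described in this handout; `enum y86_insn`, `insn_len` and `code_bytes` are hypothetical helpers. The sketch ignores alignment padding and any `.quad` data, so check-len.pl on the assembled file remains the authoritative check.

```c
#include <stddef.h>

/* Instruction classes; cmovXX has the same length as rrmovq. */
enum y86_insn { HALT, NOP, RRMOVQ, IRMOVQ, RMMOVQ, MRMOVQ, OPQ, JXX,
                CALL, RET, PUSHQ, POPQ, LEAQ, ICMPQ };

/* Encoded length in bytes of each instruction class. */
static const int insn_len[] = {
    [HALT] = 1,   [NOP] = 1,     [RRMOVQ] = 2, [IRMOVQ] = 10,
    [RMMOVQ] = 10, [MRMOVQ] = 10, [OPQ] = 2,    [JXX] = 9,
    [CALL] = 9,   [RET] = 1,     [PUSHQ] = 2,  [POPQ] = 2,
    [LEAQ] = 10,  [ICMPQ] = 10,
};

/* Rough byte count for a planned sequence of n instructions. */
int code_bytes(const enum y86_insn *seq, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += insn_len[seq[i]];
    return total;
}
```

For instance, one mrmovq, one rmmovq, one icmpq and one conditional jump already cost 39 bytes, whereas an OPq is only 2.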


    10 Handin Instructions

        ◦ You will submit your solutions as a single compressed archive file named eXXXXXXX.tar.gz to ODTUClass, where XXXXXXX is your 7-digit student ID. Please name your file correctly. Remember that you can create .tar.gz (gzipped tarball) files as follows:

unix>    tar -czf eXXXXXXX.tar.gz <files>

    • Your archive should contain three sets of files (for a total of six):

– Part A: max_bst.ys, max_btree.ys, and collect_into.ys.

– Part B: seq-full.hcl.

– Part C: absrev.ys and pipe-full.hcl.


These files should all be directly under the archive; your archive should not contain any directories.

• Make sure you have included your name and ID in a comment at the top of each of your handin files.


    11 Installation & Usage Hints

        ◦ Experimental syntax highlighting files are provided for vim under vim-y86-highlighting.tar.gz. Extract this into your ~/.vim folder and it should work directly. I recommend adapting these files to your own favorite editor to increase the amount of fun you have while doing the homework. Writing Y86-64 as plain text is plain suffering. This will also work on the ineks, of course.

        ◦ By design, both sdriver.yo and ldriver.yo are small enough to debug with in GUI mode. We find it easiest to debug in GUI mode, and suggest that you use it.

        ◦ In order to compile the simulator with GUI mode enabled, Tcl/Tk libraries are necessary. The TKLIBS and TKINC variables in the Makefiles are configured for 64-bit Linux and Tcl/Tk8.6, with the ineks in mind. Thus:

– If you want to compile without GUI support, comment the GUIMODE, TKLIBS and TKINC variables out in the Makefiles under sim, sim/seq and sim/pipe.

– If you have a 64-bit Linux system running Ubuntu, the following package installation commands should set you up:

ubuntu>    sudo apt update

ubuntu>    sudo apt install flex bison tcl-dev tk-dev

– If you’re on Mac (some of you probably are!), try out the experimental instructions in macOS-compilation-instructions.pdf.

– Otherwise, or if these do not work, you should still be able to use the GUI remotely through the ineks. Read on.






    • It is possible to connect to the ineks remotely and use the GUI of the simulator. First, connect to the login server (which allows X11 forwarding for now, unlike external):

unix>    ssh -X -p 8085 eXXXXXXX@login.ceng.metu.edu.tr

And then connect to an inek by using the -X parameter again:

unix>    ssh -X inek42

Afterwards you should be able to run the simulator in GUI mode over the connection. Since drawing commands are sent over the network with X11 forwarding, the GUI will take more time to get initialized than on your local machine.

    • X11 Forwarding should work by default on Linux. For Mac and Windows, you will need to install an X Server. Examples that should work:

– If you’re using a Mac, install XQuartz, restart your computer (or log out and back in) and you should be good to go (XQuartz should start running in the background on its own, or you can start it manually). If it does not work, check that your SSH configuration allows X11 forwarding.

– For Windows, assuming you already have PuTTY, you again need to install an X server like Xming. Once this is done, make sure that Xming is running in the background. Afterwards, enable X11 forwarding when connecting from PuTTY through Connection -> SSH -> X11. That should do it.

    • With some X servers, the “Program Code” window begins life as a closed icon when you run psim or ssim in GUI mode. Simply click on the icon to expand the window.

    • With some Microsoft Windows-based X servers, the “Memory Contents” window will not automatically resize itself. You’ll need to resize the window by hand.

    • The psim and ssim simulators terminate with a segmentation fault if you ask them to execute a file that is not a valid Y86-64 object file.


























