$29
Purpose: In this lab you will study the effect of various cache design parameters. The first part includes problem solving and writing a program. The second part, the lab part, involves execution of the program (with possible extensions etc. if suggested by your TA) and preparing a report.
In your solutions and report make sure that you have proper tables, numbering etc. All tables must have subtitle and table number furthermore columns must have column names, etc. In your report make sure that you have a nice presentation. Make sure that you number the pages. Try to do everything before coming to the lab but make sure that you demonstrate your work to your TA.
Summary
Part 1 (50 points): Problem solving and writing a matrix addition program with a simple user interface. Part 2 (50 points): Experimental observations using a program that finds the summation of the elements of a matrix with different cache conditions..
TRY TO FINISH PART 2 BEFORE COMING TO THE LAB. MAKE SURE THAT YOU DO THE DEMO TO TA.
DUE DATE/TIME OF PART 1 --SAME FOR ALL SECTIONS
a. Please drop your written Preliminary Design Report into the box provided in front of the lab by 13:40 on Tuesday April 24. No late submissions will be accepted! Your paper submission for preliminary work must match MOSS submission.
• Note that April 23 Monday is a holiday and there is no lab, April 30 Monday there are no classes and there is no lab. The May 7 week is planned for the I/O lab. This explains why we have a lab on May 14.
1
Note that preliminary work involves problem solving and a program. On paper submit both the solutions of the problems and the program. Please upload your Preliminary Problem Solution Part and the Program Part separately to the Unilica Assignment by 13:40 on Tuesday April 24.
Use filename name_surname_SecNo_ Lab6_PRELIMproblem.doc [A DOC FILE as its extension suggests, which contains all the work including the program for the Preliminary Part].
For the program part Use filename name_surname_SecNo_PRELIMcode.txt [A NOTEPAD FILE as its extension suggests, which contains only the program part
DUE TIME OF PART 2—DIFFERENT FOR EACH SECTION:
a. You have to demonstrate your Part 2 lab work to the TA for grade by 12:15 in the morning lab and by 17:15 in the afternoon lab. Your TAs may give further instructions on this. If you wait idly and show your work last minute your TA may not accept it. Make sure that you follow your TAs' instructions.
b. At the conclusion of the demo for getting your grade, you will upload your lab work to the Unilica Assignment, for similarity testing by MOSS. Please see the related section below for further instructions on MOSS submission.
c. Use filename name_surname_SecNo_ Lab6_report.doc [A DOC FILE as its extension suggests, which contains all the work done for the Lab Experiment Report Part].
For the program part Use filename name_surname_SecNo_ Lab6_code.txt [A NOTEPAD FILE as its extension suggests, which contains the Experiment Code Part]: this part may involve deviation from the preliminary work program submitted as suggested by your TA.
Part 1. Preliminary Work / Preliminary Design Report (50 points)
You have to provide a neat presentation prepared by Word or a word processor with similar output quality. Handwritten answers will not be accepted. At the top of the paper on left provide the following information and staple all papers. Please make sure that this info is there for proper grading of your work, otherwise some points will be taken off.
CS224
Section No.: ...
Spring 2018
Lab No.:
Your Full Name/Bilkent ID:
2
Solve the problems of this part and bring your solutions to the lab. Number the pages, staple your submission and use a word processor for its preparation.
1. ( 5 points: With 2 or more errors you get 0 points. Otherwise full point.) Fill in the empty cells of the following table. Assume that main memory size is 4GB. Index Size: No. of bits needed to express the set number in an address, Block Offset: No. of bits needed to indicate the word offset in a block, Byte Offset: No. of bits needed to indicate the byte offset in a word. Block Replacement Policy Needed: Indicate if a block replacement policy such as FIFO, LRU etc. is needed (yes) or not (no). If some combinations are not possible mark them.
Block
No.
Tag
Index
Word
Byte
Block
Cache
N
of
Size
Size
Block
Offset
Replacement
Word
size
No.
Size
way
Sets
in
(Set
Offset
Size in
Policy Needed
Size
(no. of
KB
cache
bits
No.) in
Size in
bits2
(Yes/No)
words)
bits
bits1
1
256
1
32
4
bits
2
256
2
32
4
bits
3
256
4
32
8
bits
4
256
Full
32
8
bits
9
512
1
16
4
bits
10
512
2
16
4
bits
11
512
4
16
16
bits
12
512
Full
16
16
bits
1 Word Block Offset Size in bits: Log2(No. of words in a block)
2 Byte Offset Size in bits: Log2(No. of bytes in a word)
2. (5 points: With 2 or more errors you get 0 points. Otherwise full point.) Consider the following memory configuration: Assume that main memory size is 4GB. N= 1 (direct mapped), Block size= 8 bytes, No. of sets= 4.
Consider the following memory accesses and indicate which set is selected and indicate if it is a hit or miss.
Memory
Set
Hit
Address Accessed (hex)
No.
(yes/no)
00 00 00 28
00 00 00 49
00 00 00 6C
00 00 00 0C
00 00 00 0B
00 00 00 0D
3
3. (5 points: With 2 or more errors you get 0 points. Otherwise full point.) Consider the following memory configuration: The same as Section 2 above only difference is N= 2. Block replacement policy is FIFO.
Consider the following memory accesses and indicate which set is selected and indicate if it is a hit or miss. (With 2 or more errors you get 0 points. Otherwise full point.)
Memory
Set
Hit
Address Accessed (hex)
No.
(yes/no)
00 00 00 28
00 00 00 49
00 00 00 4C
00 00 00 0C
00 00 00 0B
00 00 00 0D
4. (5 points, With 1 or more errors you get 0 points. Otherwise full point.) Consider a three level memory: L1 and L2 are for cache memory and the third level is for the main memory. Access time for L1 is 1 clock cycle, the access time for L2 is two times more than L1 and main memory access time is 10 times more than L2. The miss rate for L1 is 20% and the miss rate for L2 is 5%. What is the effective clock cycle for memory access (AMAT in number of clock cycles)?
With 2 GHz clock rate how much time is needed for a program with 1012 instructions to execute?
5. (30 points) Write a program to find the summation of the elements of a square matrix. Provide a user interface for user interaction to demonstrate that your program is working properly. Assume that in the main memory matrix elements are placed row by row. Create an array for the matrix elements and initialize them row by row with consecutive values. For example a 3 by 3 (N= 3) matrix would have the following values.
1
2
3
4
5
6
7
8
9
The row by row placement means that you will have the values of the above 3 x 3 matrix are stored as follows in the memory.
Matrix Index
(1, 1)
(1, 2)
(1, 3)
(2, 1)
(2, 2)
(2, 3)
(3, 1)
(3, 2)
(3, 3)
(Row No., Col. No.)
Displacement
With respect the beginning
0
4
8
12
16
20
24
28
32
of the array containing the matrix
Value stored
1
2
3
4
5
6
7
8
9
4
In this configuration accessing the matrix element (i, j) simply involves computation of its displacement from the beginning of the array that stores the matrix elements. For example, the displacement of the matrix element with the index (i, j) with respect to the beginning of the array is (i - 1) x N x 4 + (j - 1) x 4, for a matrix of size N x N.
Your user interface must provide at least the following functionalities,
1. Ask the user the matrix size in terms of its dimensions (N),
2. Allocate an array with proper size using syscall code 9,
3. Ask user the matrix element to be accessed and display the content,
4. Obtain summation of matrix elements row-major (row by row) summation,
5. Obtain summation of matrix elements column-major (column by column) summation,
6. Display desired elements of the matrix by specifying its row and column member
2. [50 pts] Experiments with Data Cache Parameters
Run your program with two reasonably large different matrix sizes that would provide meaningful observations.
Report for Matrix Size 1: 25 Points
Report for Matrix Size 2: 25 Points
As described above make sure that you have a easy to follow presentation with numbered tables having proper heading etc.
Make sure that you find the summation of matrix elements by performing row-major and column-major addition. Note that the row-major addition is a simple array addition from the beginning to the end; however, the column-major addition is somewhat tricky.
a) Direct Mapped Caches: For the matrix sizes you have chosen, conduct tests with various cache sizes and block sizes, to determine the hit rate, miss rate and number of misses. Use at least 5 different cache sizes and 5 different block sizes (make sure your values are reasonable) in order to obtain curves like those of Figure 8.18 in the textbook. Make a 5 x 5 table with your values, with miss rate and # of misses as the data at each row-column location. Make a graph of miss rate versus block size, parameterized by cache size, like Figure 8.18.
b) Fully Associative Caches: Pick 3 of your parameter points obtained in part for column-major addition a), one with good hit rate, one with medium hit rate, and one with poor hit rate. For these 3 results, there were 3 configuration pairs of cache size and block size that resulted in the data. Take the same 3 configuration pairs, but this time run the simulation with a fully associate cache, using LRU replacement policy. Compare the results obtained: the Direct Mapped good result versus the Fully Associative good result, the Direct Mapped medium result versus the Fully Associative medium result, and the Direct Mapped poor result versus the Fully Associative poor result. How much difference did the change to fully associative architecture make? Now change the replacement policy to Random replacement, and run the 3 tests again (using the same 3 configuration pairs). Does replacement policy make a significant difference? Record these 9 values
5
in a new table, with 3 lines: for Direct Mapped, for Fully Associative-LRU and for Fully Associative-Random.
c) N-way Set Associative Caches: to save on hardware costs, fully set-associative caches are rarely used. Instead, most of the benefit can be obtained with an N-way set associative cache. Pick the medium hit rate configuration that you found in a) and used again in b), and change the architecture to N-way set associative. For several different set sizes (at least 4) and LRU replacement policy, run the program and record the hit rate, miss rate and number of misses. What set size gives the best result? How much improvement is gained as N (the number of blocks in a set) increases each step? Now repeat the tests, but for the good hit rate configuration from a) and b). Record these data and answer the same question again. Finally, repeat for the poor hit rate configuration.
Oral Interview with TA and Submission of your Data
Get ready for the interview with your TA, by gathering and analyzing your data, having it ready to submit in a clean understandable format. Be sure that you understand what you have done, and can interpret your data to the TA. Then call him over, and answer the questions he asks you.
Part 3. Submit Experiment Report and Your Code for MOSS similarity testing
Submit your MIPS codes for similarity testing to the Unilica > Assignment specific for your section.
As described above you have two files to upload.
Use filename name_surname_SecNo_ Lab6_report.doc [A DOC FILE as its extension suggests, which contains all the work done for the Lab Experiment Report Part].
For the program part Use filename name_surname_SecNo_ Lab6_code.txt [A NOTEPAD FILE as its extension suggests, which contains the Experiment Code Part]: this part may involve deviation from the preliminary work program submitted as suggested by your TA.
Your codes will be compared against all the other codes in the class, by the MOSS program, to determine how similar it is (as an indication of plagiarism). So be sure that the code you submit is code that you actually wrote yourself ! All students must upload their code to Unilica > Assignment while the TA watches. Submissions made without the TA observing will be deleted, resulting in a lab score of 0. The same type of comparison is also planned for the reports.
Part 4. Cleanup
a) After saving any files that you might want to have in the future to your own storage device, erase all the files you created from the computer in the lab.
b) When applicable put back all the hardware, boards, wires, tools, etc where they came from.
c) Clean up your lab desk, to leave it completely clean and ready for the next group who will come.
---------------------------------------------------------------------------------------------------------------------------
6
LAB POLICIES
1. You can do the lab only in your section. Missing your section time and doing in another day is not allowed.
2. Students will earn their own individual lab grade. The questions asked by the TA will have an effect on your individual lab score.
3. Lab score will be reduced to 0 if the code is not submitted for similarity testing, or if it is plagiarized. MOSS-testing will be done, to determine similarity rates. Trivial changes to code will not hide plagiarism from MOSS—the algorithm is quite sophisticated and powerful. Please also note that obviously you should not use any program available on the web, or in a book, etc. since MOSS will find it. The use of the ideas we discussed in the classroom is not a problem.
4. You must be in lab, working on the lab, from the time lab starts until your work is finished and you leave.
5. No cell phone usage during lab.
6. Internet usage is permitted only to lab-related technical sites.
7. For labs that involve hardware for design you will always use the same board provided to you by the lab engineer.
7