$24
General Instructions: This assignment consists of 5 pages, 6 exercises, and is marked out of 100. For any question involving calculations you must provide your workings. Cor-rect final answers without workings will be marked as incorrect. You may collaborate with other students in the class in the sense of general strategies to solve the problems. But each assignment and the answers within are to be solely individual work and completed indepen-dently. Any plagiarism found will be taken seriously and may result in a mark of 0 on this assignment, removal from the course, or more serious consequences.
Submission Instructions: The answers to this assignment are to be submitted on OWL as a single PDF file. Ideally, the answers are to be typed. At the very least, clearly scanned copies (no photographs) of hand-written work. If the person correcting your assignment is unable to easily read or interpret your answer then it may be marked as incorrect without the possibility of remarking.
Useful Facts:
=1×109 1 = 8
1 ( )=1024
1
Exercise 1. [6 Marks] Define the processor-memory gap. How have computer architec-tures changed to combat this gap? What metric looks to be minimized by these changes? Be specific with this metric; of course minimizing one metric might also decrease another, higher-level metric.
Exercise 2. [8 Marks] Consider the following function fib, which computes a Fibonacci number of a given index. Discuss the instruction locality of the function. Is there good locality or bad locality? Is there spatial locality or temporal locality? Explain why. For ease of discussion, you may consider a particular execution of the function, say, fib(4).
int fib(int n) {
if (n < 2) {
return n;
}
int n1 = fib(n-1);
int n2 = fib(n-2);
return n1 + n2;
}
Exercise 3. [6 Marks] Discern a single formula for the average memory access time for a computer with three levels of cache. The hit rates of those levels are 90%, 85%, and 80%, respectively. The latency of main memory is 500 clock cycles. Leave anything in the AMAT formula which has not been given a specific value as a parameter.
2
Exercise 4. [30 Marks] The following tables summarizes the instructions present in some program (Table 1) as well as two different processors (Table 2).
ALU
2000
Load
1000
Store
600
Branch
400
Table 1: Program Instructions
1
2
ALU
3
1
Load
4
6
Store
3
3
Branch
2
2
Table 2: Processor Instruction Cycles
[5 Marks] Calculate the ideal CPI of this program running on Processor 1.
[5 Marks] Calculate the ideal CPI of this program running on Processor 2.
[5 Marks] Assuming that Processor 1 operates at 2.8 GHz and Processor 2 operates at 2.4 GHz calculate the CPU time for the program when run on each processor. On which processor is the program faster? (Hint: for easy calculations, 1 = 1 × 109 )
[5 Marks] In terms of the three factors calculating CPU time (instruction count, CPI, clock frequency) explain why the processor from part (c) is the faster one. How could a programmer change the program described in Table 1 so that it would run faster on the other processor?
[10 Marks] Calculate the CPU time of the program described in Table 1 running on Processor 1 (at 2.8 GHz) when also considering memory stall cycles (and thus have CPU
time calculated using CPIstall). Assume this processor has one level of cache, an instruction miss rate of 2%, a data miss rate of 5% and a miss penalty of 100 cycles.
3
Exercise 5. [30 Marks] Consider a 32-bit computer with a simplified memory hierarchy. This hierarchy contains a single cache and an unbounded backing memory. The cache has the following characteristics:
Direct-Mapped, Write-through, Write allocate.
Cache blocks are 16 words each.
The cache has 128 sets.
[4 Marks] Calculate the cache’s size in bytes.
[12 Marks] Consider the following code fragment in the C programming language to be run on the described computer. Assume that: program instructions are not stored in cache, arrays are cache-aligned (the beginning of the array aligns with the beginning of a cache line), ints are 32 bits, and all other variables are stored only in registers.
int N = 16384;
int A[N];
for (int i = 1; i < N; i += 2) {
A[i-1] = A[i];
}
Determine the following:
The number of cache misses.
The cache miss rate.
The type of cache misses which occur.
[14 Marks] Consider the following code fragment in the C programming language to be run on the described computer. In addition to the assumptions of part (b), assume that that the array B immediately follows the array A in memory.
int N = 16384;
int A[N];
int B[N];
for (int i = 0; i < N; ++i) {
A[i] = B[i];
}
Determine the following:
The number of cache misses.
The cache miss rate.
The type of cache misses which occur.
4
Exercise 6. [20 Marks] Below is a sequence of references to 16-bit memory word ad-dresses.
1, 18, 14, 16, 2, 3, 17, 18, 14, 15, 20, 21
Consider an initially empty, direct-mapped cache with 2-word cache lines and a capacity of 32 bytes. Using the sequence of address references above, determine if each address referenced results in a hit or a miss. If the reference results in a cache miss, say which type of cache miss (cold, conflict, capacity) You may use the table below to help answering this question, but for the purposes of marking, we do not care about the tag, index, or block offset.
Consider an initially empty, 2-way set associative cache with 2-word cache lines, a ca-pacity of 32 bytes, and an LRU (least recently used) eviction policy. Using the sequence of address references above, determine if each address referenced results in a hit or a miss. If the reference results in a cache miss, say which type of cache miss (cold, conflict, capacity) You may use the table below to help answering this question, but for the purposes of marking, we do not care about the tag, index, or block offset.
Address
Tag
Index
Block Offset
Hit/Miss
Type of Miss
Word
1
18
14
16
2
3
17
18
14
15
20
21
5