$24
Consider a program that can execute with no stalls and a CPI of 1 if the underlying processor can somehow magically service every load instruction with a 1-cycle L1 cache hit. In practice, 5% of all load instructions suffer from an L1 cache miss, 2% of all load instructions suffer from an L2 cache miss, and 1% of all load instructions suffer from an L3 cache miss (and are serviced by the memory system). An L1 cache miss stalls the processor for 10 cycles while the L2 is looked up. An L2 cache miss stalls the processor for 20 cycles while the L3 is looked up. An L3 cache miss stalls the processor for an additional 300 cycles while data is fetched from memory. What is the CPI for this program if 30% of the program's instructions are load instructions? (40 points)
Consider an L1 cache that has 8 sets, is direct-mapped (1-way), and supports a block size of 64 bytes. For the following memory access pattern (shown as byte addresses), show which accesses are hits and misses. For each hit, indicate the set that yields the hit. (30 points)
0, 48, 84, 32, 96, 360, 560, 48, 84, 600, 84, 48.
A 128 KB L1 cache has a 64 byte block size and is 4-way set-associative. How many sets does the cache have? How many bits are used for the offset, index, and tag, assuming that the CPU provides 32-bit addresses? How large is the tag array? Please show your steps. (30 points)