CS251 - Computer Organization and Design - Spring 2008
Lecture 32 - Cache Misses
Practical Details
- Assignment 7
- Finished pipelined execution
Cache Concepts
How the big picture is implemented
Break the address into pieces
- High order bits determine which block
- low order bits determine where in the block
- Where the break is done determines the number of blocks and the size of
the blocks
- Here the block is the size of a cache line
The effect is that each value of the low order bits is shared by many locations in memory, one in every block.
The cache normally contains an integral number of lines, usually a power
of 2.
- Break the block number into two pieces
- low order bits adequate to address each line in the cache
- high order bits indicate which block in memory is in that line.
- Store the high order bits with the line
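The address split described above can be sketched in a few lines of Python. This is only an illustration: the function name split_address and the geometry in the example call (64-byte lines, 256 lines) are assumptions, not something fixed by these notes.

```python
def split_address(addr, offset_bits, index_bits):
    """Split a byte address into (high order bits, line in cache, address in line)."""
    offset = addr & ((1 << offset_bits) - 1)                 # low order bits: where in the block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # low bits of the block number
    tag = addr >> (offset_bits + index_bits)                 # high bits of the block number
    return tag, index, offset


# Hypothetical geometry: 6 offset bits (64-byte lines), 8 index bits (256 lines).
tag, index, offset = split_address(0x12345678, offset_bits=6, index_bits=8)
print(hex(tag), index, offset)  # prints: 0x48d1 89 56
```

Moving the break between the fields changes the number of lines and their size, but the three pieces always reassemble into the original address.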
When a memory reference occurs
- Break the address into three pieces
- address in the line
- address of the line
- high order bits of the block number
- Using the line address, retrieve the high order bits of the block number stored with that line (the tag).
- Compare them to the high order bits of the block number in the address
- If they match
- read from or write to the address
- on write hardware usually also writes the main memory
asynchronously using a write buffer. This is called
write-through.
- It is also possible to re-write memory only when the cache line is
replaced. This is called write-back.
- Otherwise (a miss)
- Stall the processor
- Use the block number to get the relevant block from memory
- When it is installed, rerun the instruction
This way of doing things is called direct mapping.
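A minimal software model of the read path may make the procedure concrete. It is a sketch under stated assumptions: memory is a Python list indexed by word address, the class name DirectMappedCache is invented for illustration, and the stall is represented only by the miss branch (no timing is modelled).

```python
class DirectMappedCache:
    """Toy direct-mapped cache over a word-addressed memory (illustrative only)."""

    def __init__(self, num_lines, words_per_line):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.valid = [False] * num_lines
        self.tags = [0] * num_lines
        self.data = [[0] * words_per_line for _ in range(num_lines)]

    def read(self, word_addr, memory):
        """Return (word, hit) for a read of word_addr."""
        offset = word_addr % self.words_per_line   # address in the line
        block = word_addr // self.words_per_line   # block number in memory
        index = block % self.num_lines             # address of the line
        tag = block // self.num_lines              # high order bits of the block number
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            # Miss: the processor would stall here while the block is fetched.
            base = block * self.words_per_line
            self.data[index] = memory[base:base + self.words_per_line]
            self.tags[index] = tag
            self.valid[index] = True
        # On a miss this models the rerun of the instruction, which now hits.
        return self.data[index][offset], hit


memory = list(range(64))            # word i holds the value i
cache = DirectMappedCache(num_lines=4, words_per_line=4)
print(cache.read(5, memory))        # (5, False): first touch of block 1
print(cache.read(6, memory))        # (6, True): same block
print(cache.read(21, memory))       # (21, False): block 5 maps to the same line
print(cache.read(5, memory))        # (5, False): block 1 was replaced
```

The last two reads show the cost of direct mapping: two blocks that share a line number keep evicting each other even when the rest of the cache is empty.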
Examples of Cache
1. One word per cache line, 16 Cache lines
Cache
Line number | Valid | Tag     | Data
0000        | 1     | 26 bits | 1 word
0001        | 0     |         |
0010        | 1     |         |
0011        | 1     |         |
0100        | 1     |         |
0101        | 1     |         |
0110        | 1     |         |
0111        | 1     |         |
1000        | 0     |         |
1001        | 1     |         |
1010        | 1     |         |
1011        | 0     |         |
1100        | 0     |         |
1101        | 1     |         |
1110        | 1     |         |
1111        | 1     |         |
Address
Line number in memory | Line number in cache | Access size
31-6                  | 5-2                  | 1-0
Circuit (p. 476)
- Ignore access size bits
- Use 5-2 to choose the line in the cache
- Test 31-6 against the tag in the line
- AND with valid to get HIT
- Return data in line plus HIT
Comments
A very basic cache.
Easy to see how to extend to a bigger cache.
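To see how the field widths and the total storage scale with cache size, here is a small calculation. It is a sketch that assumes 32-bit byte addresses, 32-bit words, one valid bit per line, and power-of-two line counts; the function name cache_storage_bits and the 1024-line example are hypothetical.

```python
def cache_storage_bits(num_lines, words_per_line, address_bits=32, word_bits=32):
    """Return (tag_bits, total_bits) for a direct-mapped cache.

    Assumes num_lines and words_per_line are powers of two.
    """
    index_bits = (num_lines - 1).bit_length()            # selects the line in the cache
    word_select_bits = (words_per_line - 1).bit_length() # selects the word in the line
    byte_bits = 2                                        # access size bits 1-0
    tag_bits = address_bits - index_bits - word_select_bits - byte_bits
    bits_per_line = 1 + tag_bits + words_per_line * word_bits  # valid + tag + data
    return tag_bits, bits_per_line * num_lines


print(cache_storage_bits(16, 1))    # (26, 944): the cache of Example 1
print(cache_storage_bits(1024, 1))  # (20, 54272): a hypothetical 1024-line version
```

Each doubling of the number of lines moves one bit from the tag to the line number, so the tag shrinks as the cache grows.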
2. Longer cache lines: 16 word line, 256 lines
Cache
Line number | Valid | Tag     | Word 0  | Word 1  | ... | Word 15
00000000    | 1     | 18 bits | 32 bits | 32 bits |     | 32 bits
00000001    | 1     |         |         |         |     |
00000010    | 0     |         |         |         |     |
...         |       |         |         |         |     |
11111101    | 0     |         |         |         |     |
11111110    | 1     |         |         |         |     |
11111111    | 0     |         |         |         |     |
Address
Line number in memory | Line number in cache | Word number in line | Access size
31-14                 | 13-6                 | 5-2                 | 1-0
Circuit (p. 486)
- Ignore bits 0-1
- Use bits 13-6 to choose the line in the cache
- Compare bits 31-14 to the tag
- AND the result with Valid to get HIT
- Multiplexer selects word to return based on bits 5-2
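The circuit steps above translate almost directly into shifts and masks. The following is a sketch only: the function name lookup and the list-based storage (valid, tags, lines) are assumptions made for illustration, not the textbook's circuit.

```python
def lookup(addr, valid, tags, lines):
    """Model the hit logic for 256 lines of 16 words each.

    Returns (hit, word). As in the circuit, a word is always produced;
    it is meaningful only when hit is True.
    """
    index = (addr >> 6) & 0xFF    # bits 13-6 choose the line in the cache
    word = (addr >> 2) & 0xF      # bits 5-2 drive the multiplexer
    tag = addr >> 14              # bits 31-14 are compared to the stored tag
    hit = valid[index] and tags[index] == tag   # AND with Valid gives HIT
    return hit, lines[index][word]


valid = [False] * 256
tags = [0] * 256
lines = [[0] * 16 for _ in range(256)]

# Install one line by hand: tag 5 in line 3, with a marker value in word 7.
valid[3], tags[3] = True, 5
lines[3][7] = 0xABCD

addr = (5 << 14) | (3 << 6) | (7 << 2)
print(lookup(addr, valid, tags, lines))  # (True, 43981)
```

Bits 1-0 never appear in the function, which matches the first step of the circuit description.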
Comments
Typical early cache technology
Cache Misses
On Read
Send memory line number to main memory (SDRAM).
SDRAM returns whole line in one transaction
- Address to DRAM, Data 0 to Cache, Data 1 to cache, etc.
- Wait for word 0: typically 40 nsec
- After that, each further word arrives much faster: 2-4 nsec.
- Reason is slow row addressing, fast column addressing
Load instruction, or instruction fetch is rerun.
- Pipeline is frozen while this takes place
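A rough estimate of the read miss time follows from these figures. The sketch assumes that the 40 nsec covers everything up to the delivery of word 0, that each later word costs a fixed 2 to 4 nsec, and that the line holds 16 words as in Example 2; the function name line_fill_time_ns is invented here.

```python
def line_fill_time_ns(words_per_line, first_word_ns=40.0, next_word_ns=2.0):
    """Time to transfer one whole line: slow first word, fast remaining words."""
    return first_word_ns + (words_per_line - 1) * next_word_ns


print(line_fill_time_ns(16, next_word_ns=2.0))  # 70.0
print(line_fill_time_ns(16, next_word_ns=4.0))  # 100.0
print(16 * 40.0)                                # 640.0 if every word paid the full 40 nsec
```

Under these assumptions a 16-word line costs 70 to 100 nsec, far less than sixteen independent 40 nsec accesses, which is why the whole line is fetched in one transaction.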
On Write
Send memory line number to main memory (SDRAM).
SDRAM returns whole line in one transaction
- Address to DRAM, Data 0 to Cache, Data 1 to cache, etc.
- Wait for word 0: typically 40 nsec
- After that, each further word arrives much faster: 2-4 nsec.
Store instruction is rerun.
- Pipeline is frozen while this takes place.
Main memory is synchronized
- Write-through: two writes happen at once, one to the cache and one to main memory
- but what if another write occurs immediately after?
- possibly stall
- assisted by a hardware write queue
- Write-back: don't synchronize immediately, but when the cache line
leaves the cache
- some misses are even more expensive: a modified line must be written to memory before it is replaced
- multi-processing is difficult
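The difference between the two policies shows up when counting writes to main memory. The model below is a deliberately simple sketch: stores are given as block numbers, the cache is direct mapped, a write miss fetches the block before the store is rerun (as described above), and the write queue and all timing are ignored. The function name count_memory_writes is hypothetical.

```python
def count_memory_writes(store_blocks, policy, num_lines=4):
    """Count writes to main memory caused by a sequence of stores.

    store_blocks: block number touched by each store.
    policy: "write-through" or "write-back".
    Lines still modified in the cache at the end are not counted.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)
    resident = [None] * num_lines   # block currently held by each line
    dirty = [False] * num_lines     # line modified since it was loaded?
    writes = 0
    for block in store_blocks:
        index = block % num_lines
        if resident[index] != block:
            # Write miss: a modified line must be written back before replacement.
            if policy == "write-back" and dirty[index]:
                writes += 1
            resident[index] = block
            dirty[index] = False
        if policy == "write-through":
            writes += 1             # every store also goes to main memory
        else:
            dirty[index] = True     # synchronize later, when the line leaves
    return writes


stores = [0, 0, 0, 4, 4, 1]
print(count_memory_writes(stores, "write-through"))  # 6
print(count_memory_writes(stores, "write-back"))     # 1
```

Write-back saves memory traffic when the same line is stored to repeatedly, at the price of the extra write on some misses.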
Possible Improvements
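Obvious trade-off: the longer the lines, the fewer the misses when references are local, but the longer each miss takes and the fewer lines fit in the cache.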
The best performance depends on locality in the code.
Typically, cache misses slow the processor by a factor of 2.
Widen and speed up bus between main memory and cache.
Send the desired word first, then fill the rest of the line.
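Sending the desired word first only changes the order in which the words of the line are transferred. One possible order, assumed here for illustration, starts at the requested word and wraps around to the beginning of the line; the function name fill_order is invented.

```python
def fill_order(requested_word, words_per_line=16):
    """Order of word transfers when the requested word is sent first (wrap-around)."""
    return [(requested_word + i) % words_per_line for i in range(words_per_line)]


print(fill_order(5, words_per_line=8))  # [5, 6, 7, 0, 1, 2, 3, 4]
```

The processor can restart as soon as the first word in this order arrives, while the rest of the line fills behind it.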