CS251 - Computer Organization and Design - Spring 2008
Lecture 33 - Other Cache Types
Practical Details
- Assignment 8
- History of computing
- Reminder: what we are doing is called direct mapping.
Examples of Cache Memories
1. One word per cache line, 16 Cache lines
Cache
| Line number | Valid | Tag     | Data   |
| 0000        | 1     | 26 bits | 1 word |
| 0001        | 0     |         |        |
| 0010        | 1     |         |        |
| 0011        | 1     |         |        |
| 0100        | 1     |         |        |
| 0101        | 1     |         |        |
| 0110        | 1     |         |        |
| 0111        | 1     |         |        |
| 1000        | 0     |         |        |
| 1001        | 1     |         |        |
| 1010        | 1     |         |        |
| 1011        | 0     |         |        |
| 1100        | 0     |         |        |
| 1101        | 1     |         |        |
| 1110        | 1     |         |        |
| 1111        | 1     |         |        |
Address
| Line number in memory | Line number in cache | Access size |
| 31-6                  | 5-2                  | 1-0         |
Circuit (p. 476)
- Ignore access size bits
- Use 5-2 to choose the line in the cache
- Test 31-6 against the tag in the line
- AND with valid to get HIT
- Return data in line plus HIT
Comments
A very basic cache.
Easy to see how to extend to a bigger cache.
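A small software sketch of this circuit may help; the C below is an illustration with made-up structure and function names, not something taken from the lecture or the textbook.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES 16

    struct cache_line {
        bool     valid;
        uint32_t tag;    /* bits 31-6 of the address (26 bits) */
        uint32_t data;   /* one 32-bit word                    */
    };

    static struct cache_line cache[NUM_LINES];

    /* Returns true on a hit and places the word in *word_out. */
    bool cache_read(uint32_t addr, uint32_t *word_out)
    {
        uint32_t index = (addr >> 2) & 0xF;  /* bits 5-2 choose the line          */
        uint32_t tag   = addr >> 6;          /* bits 31-6 are tested against tag  */
        /* bits 1-0 (access size) are ignored                                     */

        struct cache_line *line = &cache[index];
        if (line->valid && line->tag == tag) {  /* tag match ANDed with Valid */
            *word_out = line->data;
            return true;                        /* HIT: return data plus HIT  */
        }
        return false;                           /* MISS                       */
    }

Extending to a bigger cache only changes NUM_LINES and the index/tag split.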
2. Longer cache lines: 16 word line, 256 lines
Cache
| Line number | Valid | Tag     | Word 0  | Word 1  | ... | Word 15 |
| 00000000    | 1     | 18 bits | 32 bits | 32 bits | ... | 32 bits |
| 00000001    | 1     |         |         |         |     |         |
| 00000010    | 0     |         |         |         |     |         |
| ...         |       |         |         |         |     |         |
| 11111101    | 0     |         |         |         |     |         |
| 11111110    | 1     |         |         |         |     |         |
| 11111111    | 0     |         |         |         |     |         |
Address
| Line number in memory | Line number in cache | Word number in line | Access size |
| 31-14                 | 13-6                 | 5-2                 | 1-0         |
Circuit (p. 486)
- Ignore bits 0-1
- Use bits 13-6 to choose the line in the cache
- Compare bits 31-14 to the tag
- AND the result with Valid to get HIT
- Multiplexer selects word to return based on bits 5-2
Comments
Typical early cache technology
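The same kind of sketch for this cache (again an illustration with made-up names) shows bits 13-6 indexing the line and bits 5-2 driving the word multiplexer.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES      256
    #define WORDS_PER_LINE 16

    struct cache_line {
        bool     valid;
        uint32_t tag;                    /* bits 31-14 (18 bits) */
        uint32_t words[WORDS_PER_LINE];  /* 16 x 32-bit words    */
    };

    static struct cache_line cache[NUM_LINES];

    bool cache_read(uint32_t addr, uint32_t *word_out)
    {
        uint32_t word_sel = (addr >> 2) & 0xF;   /* bits 5-2: word in line  */
        uint32_t index    = (addr >> 6) & 0xFF;  /* bits 13-6: cache line   */
        uint32_t tag      =  addr >> 14;         /* bits 31-14: tag         */

        struct cache_line *line = &cache[index];
        if (line->valid && line->tag == tag) {
            *word_out = line->words[word_sel];   /* the 16-to-1 multiplexer */
            return true;
        }
        return false;
    }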
Cache Misses
On Read
Send the memory line number to main memory (SDRAM).
SDRAM returns the whole line in one transaction:
- Address to DRAM, data 0 to cache, data 1 to cache, etc.
- Wait until word 0 arrives: typically 40 nsec.
- After that, each word is much faster: 2-4 nsec.
- The reason is slow row addressing but fast column addressing.
The load instruction or instruction fetch is then rerun.
- Pipeline is frozen while this takes place.
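A rough sketch of the refill on a read miss; the SDRAM interface functions below are invented for the illustration, and the timing figures in the comment are the ones quoted above.

    #include <stdint.h>

    #define WORDS_PER_LINE 16

    /* Hypothetical SDRAM burst interface: one address out,
     * WORDS_PER_LINE data words back in a single transaction. */
    extern void     sdram_start_burst(uint32_t line_addr);
    extern uint32_t sdram_next_word(void);

    /* On a read miss: fetch the whole line, then rerun the load/fetch.
     * With the figures above, the fill takes roughly
     *   40 nsec (word 0) + 15 * 3 nsec (words 1-15)  ~  85 nsec,
     * during which the pipeline is frozen.                             */
    void fill_line_on_miss(uint32_t addr, uint32_t *line_words,
                           uint32_t *tag, int *valid)
    {
        /* Align to the 64-byte line boundary. */
        uint32_t line_addr = addr & ~(uint32_t)(WORDS_PER_LINE * 4 - 1);

        sdram_start_burst(line_addr);             /* slow row access      */
        for (int i = 0; i < WORDS_PER_LINE; i++)  /* fast column accesses */
            line_words[i] = sdram_next_word();

        *tag   = addr >> 14;   /* using the 256-line, 16-word layout above */
        *valid = 1;
    }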
On Write
Send the memory line number to main memory (SDRAM).
SDRAM returns the whole line in one transaction, exactly as on a read miss:
- Address to DRAM, data 0 to cache, data 1 to cache, etc.
- Wait until word 0 arrives: typically 40 nsec.
- After that, each word is much faster: 2-4 nsec.
The store instruction is then rerun.
- Pipeline is frozen while this takes place.
Main memory is synchronized
- Write-through: two writes happen at once
- but what if another write occurs immediately after?
- possibly stall
- assisted by a hardware write queue
- Write-back: don't synchronize immediately, but when the cache line
leaves the cache
- some misses are even more expensive
- multi-processing is difficult
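A minimal sketch of the two write policies; the structures and memory-interface functions here are assumptions made for the illustration, not part of the lecture.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical memory-side interface. */
    extern void memory_write_word(uint32_t addr, uint32_t word);
    extern void memory_write_line(uint32_t line_addr, const uint32_t *words, int n);

    struct line {
        bool     valid;
        bool     dirty;     /* only used by write-back */
        uint32_t tag;
        uint32_t words[16];
    };

    /* Write-through: update the cache word and main memory together. */
    void write_through(struct line *l, int word_sel, uint32_t addr, uint32_t data)
    {
        l->words[word_sel] = data;
        memory_write_word(addr, data);
    }

    /* Write-back: update only the cache and mark the line dirty;
     * main memory is synchronized when the line leaves the cache.  */
    void write_back(struct line *l, int word_sel, uint32_t data)
    {
        l->words[word_sel] = data;
        l->dirty = true;
    }

    /* Called when a dirty line is evicted: this is the extra-expensive miss. */
    void evict(struct line *l, uint32_t line_addr)
    {
        if (l->valid && l->dirty)
            memory_write_line(line_addr, l->words, 16);
        l->valid = false;
        l->dirty = false;
    }

In hardware, the write queue mentioned above would sit behind memory_write_word, so the processor only stalls when the queue is full.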
Possible Improvements
Obvious trade-off: longer lines exploit more spatial locality, but each miss takes longer to fill and fewer lines fit in the cache.
The best line length depends on locality in the code.
Typically, cache misses slow the processor by a factor of 2 (a rough calculation follows below).
Widen and speed up bus between main memory and cache.
Send the desired word first, then fill the rest of the line.
More than one line per set of address bits.
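A rough calculation behind the factor-of-2 figure, using an assumed 5% miss rate and 20-cycle miss penalty (illustrative numbers, not from the lecture), with a hit costing 1 cycle:

    average cycles per access = 1 + (miss rate) * (miss penalty)
                              = 1 + 0.05 * 20
                              = 2

That is, memory accesses, and with them the processor, run at roughly half speed.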
Other Cache Organizations
Fully Associative
Cache lines are not addressed by bits of the memory address; any memory line can be placed in any cache line.
Memory address
| Line number in memory | Word number in line       | Access size |
| 31 to n2+1            | n2 to n1+1                | n1 to 0     |
| goes into the tag     | addresses within the line | ignored     |
- n1: normally 1 (bits 1-0 are the access size, as in the examples above)
- n2-n1: log_2 (number of words in a line)
Circuit
- All tags compared in parallel.
- Results ANDed with the corresponding Valid bit to give LineHit
- All words activated in parallel
- All LineHits ORed to give Hit
- If ( Hit )
- Multiplexer addressed by location of LineHit chooses correct
word
- else
- Processor is stalled
- Line is chosen to remove from cache
- Line is retrieved from next level of memory
- Instruction is rerun
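A software stand-in for this circuit, with sizes and names chosen for illustration (4-byte words and 16-word lines, so n1 = 1 and n2 = 5); the parallel tag comparators become a loop.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_LINES      64   /* number of lines: arbitrary for the sketch */
    #define WORDS_PER_LINE 16

    struct line {
        bool     valid;
        uint32_t tag;                    /* whole line number in memory */
        uint32_t words[WORDS_PER_LINE];
    };

    static struct line cache[NUM_LINES];

    /* In hardware all tags are compared at once; the loop stands in
     * for those parallel comparators.                                */
    bool fully_assoc_read(uint32_t addr, uint32_t *word_out)
    {
        uint32_t tag      = addr >> 6;          /* bits 31 to n2+1 = 31-6 */
        uint32_t word_sel = (addr >> 2) & 0xF;  /* bits n2 to n1+1 = 5-2  */

        for (int i = 0; i < NUM_LINES; i++) {
            bool line_hit = cache[i].valid && cache[i].tag == tag;
            if (line_hit) {                           /* Hit = OR of LineHits */
                *word_out = cache[i].words[word_sel]; /* word multiplexer     */
                return true;
            }
        }
        /* Miss: stall, choose a line to remove, refill from the next
         * level of memory, rerun the instruction.                     */
        return false;
    }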
Set Associative
Effectively a set of direct-mapped caches
- associative mapping between the sets
Cache
| Cache address | Valid | Tag          | Data | Valid | Tag          | Data | Valid | Tag          | Data | Valid | Tag          | Data |
| 000...000     |       | 31 - n3 bits |      |       | 31 - n3 bits |      |       | 31 - n3 bits |      |       | 31 - n3 bits |      |
| 000...001     |       |              |      |       |              |      |       |              |      |       |              |      |
| 000...010     |       |              |      |       |              |      |       |              |      |       |              |      |
| ...           |       |              |      |       |              |      |       |              |      |       |              |      |
| 111...110     |       |              |      |       |              |      |       |              |      |       |              |      |
| 111...111     |       |              |      |       |              |      |       |              |      |       |              |      |
Address
| Line number in memory | Line number in set | Word number in line       | Access size |
| 31 to n3+1            | n3 to n2+1         | n2 to n1+1                | n1 to 0     |
| Tag                   | Cache address      | addresses within the line | Ignored     |
- n1: normally 1 (bits 1-0 are the access size)
- n2-n1: log_2 (number of words in a line)
- n3-n2: log_2 (number of lines in a set)
Circuit (Figure 7.17, page 503)
- Use line number in set to choose a horizontal set of lines
- In parallel match Line number in memory to each tag in the set
- AND with Valid to get SetHit
- In parallel use Word number in line to activate one word in each
set
- All SetHits ORed to get Hit
- If (Hit) then
- Multiplexer addressed by true SetHit selects word to return
- Else
- Processor is stalled
- Line is chosen to remove from cache
- Line is retrieved from next level of memory
- Instruction is rerun
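A sketch of the lookup using the sizes from example 2 for each set (four sets, 256 lines per set, 16 words per line); the names are made up for the illustration.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_SETS       4     /* four direct-mapped caches side by side */
    #define LINES_PER_SET  256
    #define WORDS_PER_LINE 16

    struct line {
        bool     valid;
        uint32_t tag;
        uint32_t words[WORDS_PER_LINE];
    };

    /* cache[set][cache address] */
    static struct line cache[NUM_SETS][LINES_PER_SET];

    bool set_assoc_read(uint32_t addr, uint32_t *word_out)
    {
        uint32_t word_sel = (addr >> 2) & 0xF;   /* word number in line   */
        uint32_t index    = (addr >> 6) & 0xFF;  /* line number in set    */
        uint32_t tag      =  addr >> 14;         /* line number in memory */

        /* Hardware matches the tag in every set in parallel;
         * the loop plays that role here.                      */
        for (int s = 0; s < NUM_SETS; s++) {
            struct line *l = &cache[s][index];
            bool set_hit = l->valid && l->tag == tag;  /* ANDed with Valid    */
            if (set_hit) {                             /* Hit = OR of SetHits */
                *word_out = l->words[word_sel];        /* mux on true SetHit  */
                return true;
            }
        }
        /* Miss: stall, choose a line to remove, refill from the next
         * level of memory, rerun the instruction.                     */
        return false;
    }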
New Issues
Replacement Algorithm
Usually least recently used (LRU), i.e. the most dusty line.
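One straightforward way to track "most dusty", sketched for the four-set example above with per-line timestamps; real hardware typically uses cheaper approximations, and the code below is only an illustration.

    #include <stdint.h>

    #define NUM_SETS      4
    #define LINES_PER_SET 256

    /* One timestamp per (set, cache address), bumped on every hit. */
    static uint32_t last_used[NUM_SETS][LINES_PER_SET];
    static uint32_t now;

    void touch(int set, uint32_t index)   /* call on every hit */
    {
        last_used[set][index] = ++now;
    }

    int choose_victim(uint32_t index)     /* call on a miss */
    {
        int      victim = 0;
        uint32_t oldest = last_used[0][index];

        for (int s = 1; s < NUM_SETS; s++) {
            if (last_used[s][index] < oldest) {  /* older = smaller timestamp */
                oldest = last_used[s][index];
                victim = s;
            }
        }
        return victim;   /* the least recently used (most dusty) line */
    }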