CS251 - Computer Organization and Design - Spring 2008
Lecture 31 - Cache Memory
Practical Details
- Assignment 7
- Finished pipelined execution
Memory Hardware
Hardware Performance
| Level | Access time (nsec) | Cost per Gbyte | Typical size | Notes |
|---|---|---|---|---|
| Registers | 0.1 | | 128 bytes | Can't separate cost from remainder of processor |
| Cache (SRAM) | 0.1 - 1.0 | $4000.00 | ~1 MByte | Big on-chip, off-chip differences |
| Main memory (DRAM) | 50 - 70 | $100.00 | ~2 Gbyte | SDRAM is faster in bursts |
| Ramdisk (based on NVRam) | read: 10.0; write: 10,000.0 | $5.00 | | Inadequate capacity for demand. Many competing technologies. Limited number of memory cycles |
| Disk | seek: 5,000,000; continuous: ~100 | $0.50 | ~100 Gbyte | |
| Internet | infinity | free | ~10 Tbyte | |
Memory Concepts
Locality
Block
- page when used for virtual memory
- segment when used for memory protection
- line when used for cache
Cache
Virtual
- memory mapping
- page table
- memory protection
- memory management unit
- address translation
The Big Picture
- We have a little fast, expensive memory, and
- we have a lot of slow, inexpensive memory
  - this applies at any boundary in the memory hierarchy
- We replicate parts of the inexpensive memory in the expensive memory
  - when the data is in the fast memory, use it
  - when it's not in the fast memory, fetch it from the slow memory into
    the fast memory
  - when it arrives, try the instruction again.
Moral of the story
Better not to miss very often, because every miss carries a huge performance penalty.
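To see how much the miss rate matters, here is a minimal sketch of the average time per memory reference. It assumes round figures in the range of the table above (1 nsec for a cache hit, 60 nsec extra for a trip to DRAM) and a simple one-level model; the function name and the chosen miss rates are made up for illustration.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Every reference pays the hit time; a fraction miss_rate of them
    additionally pays the miss penalty (assumed one-level model)."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Assumed figures: 1 nsec cache hit, 60 nsec penalty to fetch from DRAM.
for miss_rate in (0.01, 0.05, 0.20):
    t = average_access_time(1.0, miss_rate, 60.0)
    print(f"miss rate {miss_rate:.0%}: {t:.1f} nsec per access")
```

Under these assumptions a 5% miss rate already raises the average from 1 nsec to 4 nsec, and the penalty grows by many orders of magnitude when the slow level is a disk rather than DRAM.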
How the big picture is implemented
Break the address into pieces
- High order bits determine which block
- low order bits determine where in the block
- Where the break is done determines the number of blocks and the size of
the blocks
- Here the block is the size of a cache line
The effect is that the same low order bits occur at many locations in memory, one in each block.
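The split described above can be sketched with a shift and a mask. This is only an illustration: the function name, the example address, and the two block sizes are assumptions, not values from the course.

```python
def split_address(address, offset_bits):
    """Split an address into (block number, offset within the block)."""
    block_number = address >> offset_bits         # high order bits: which block
    offset = address & ((1 << offset_bits) - 1)   # low order bits: where in the block
    return block_number, offset

address = 0x1234  # hypothetical address
for offset_bits in (4, 6):  # 16-byte and 64-byte blocks (assumed sizes)
    block, offset = split_address(address, offset_bits)
    print(f"{1 << offset_bits}-byte blocks: block {block}, offset {offset}")
```

Moving the break two bits to the left makes each block four times larger and leaves a quarter as many blocks.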
The cache normally contains an integral number of lines, usually a power
of 2.
- Break the block number into two pieces
  - low order bits adequate to address each line in the cache
  - high order bits indicate which block in memory is in that line.
- Store the high order bits with the line
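A minimal sketch of this second split, assuming a cache of 8 lines (3 index bits). The names `INDEX_BITS` and `split_block_number`, and the name "tag" for the stored high order bits, are introduced here for illustration.

```python
INDEX_BITS = 3  # assumed: the cache holds 2**3 = 8 lines

def split_block_number(block_number, index_bits=INDEX_BITS):
    """Split a block number into (tag, line index)."""
    index = block_number & ((1 << index_bits) - 1)  # low order bits: which cache line
    tag = block_number >> index_bits                # high order bits: stored with the line
    return tag, index

# Blocks 5, 13 and 21 all land in the same cache line; only the tag differs.
for block_number in (5, 13, 21):
    tag, index = split_block_number(block_number)
    print(f"block {block_number}: line {index}, tag {tag}")
```

Because several memory blocks share one line, the stored high order bits are what tell the hardware which of them is currently in the cache.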
When a memory reference occurs
- Break the address into three pieces
  - address in the line
  - address of the line
  - high order bits of the block number
- Using the line address, retrieve the high order bits of the block
  number stored with that line.
- Compare them to the high order bits of the address
- If they match (a hit)
  - read from or write to the address
  - on a write, hardware usually also writes the main memory
    asynchronously using a write buffer. This is called
    write-through.
  - It is also possible to rewrite memory only when the cache line is
    replaced. This is called write-back.
- Otherwise (a miss)
  - Stall the processor
  - Use the block number to get the relevant block from memory
  - When it is installed, rerun the instruction
This way of doing things is called direct mapping.
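The steps above can be put together in a toy software model. This is a sketch under stated assumptions, not a description of any real hardware: the class name, the tiny sizes (4 lines of 4 bytes), and the hit/miss counters are all invented for illustration. It uses write-through with no write buffer, and a write miss fetches the block first, as in the miss path above.

```python
class DirectMappedCache:
    """Toy direct-mapped, write-through cache in front of a bytearray."""

    def __init__(self, memory, offset_bits=2, index_bits=2):
        self.memory = memory
        self.offset_bits = offset_bits
        self.index_bits = index_bits
        self.line_size = 1 << offset_bits
        num_lines = 1 << index_bits
        self.tags = [None] * num_lines  # None marks an empty line
        self.lines = [bytearray(self.line_size) for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def _split(self, address):
        # Three pieces: address in the line, address of the line, high order bits.
        offset = address & (self.line_size - 1)
        block = address >> self.offset_bits
        index = block & ((1 << self.index_bits) - 1)
        tag = block >> self.index_bits
        return tag, index, offset

    def _lookup(self, address):
        tag, index, offset = self._split(address)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: real hardware would stall while the block is fetched.
            self.misses += 1
            start = (address >> self.offset_bits) << self.offset_bits
            self.lines[index][:] = self.memory[start:start + self.line_size]
            self.tags[index] = tag
        return index, offset

    def read(self, address):
        index, offset = self._lookup(address)
        return self.lines[index][offset]

    def write(self, address, value):
        index, offset = self._lookup(address)
        self.lines[index][offset] = value
        self.memory[address] = value  # write-through: memory updated on every write


memory = bytearray(range(64))
cache = DirectMappedCache(memory)
cache.read(0)    # miss: block 0 is loaded into line 0
cache.read(1)    # hit: same block
cache.read(16)   # miss: block 4 also maps to line 0 and replaces block 0
cache.read(0)    # miss again: the two blocks compete for one line
print(cache.hits, cache.misses)
```

The last two reads show the main weakness of direct mapping: two blocks that share a line evict each other even when the rest of the cache is empty.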