CS251 - Computer Organization and Design - Spring 2008

Lecture 32 - Cache Misses


Practical Details

  1. Assignment 7
  2. Finished pipelined execution

Cache Concepts

How the big picture is implemented

Break the address into pieces

The effect is that the same low-order bits are shared by many locations in memory, so many memory lines compete for each cache line.

The cache normally contains an integral number of lines, usually a power of 2.

When a memory reference occurs

  1. Break the address into three pieces
    1. address in the line (low-order bits)
    2. address of the line in the cache
    3. high-order bits of the block number (the tag)
  2. Using the line address, retrieve the high-order bits of the block number stored in that cache line.
  3. Compare them to the high-order bits of the address.
  4. If they match
    1. Read from or write to the address in the cache line.
    2. On a write, the hardware usually also writes main memory asynchronously through a write buffer. This is called write-through.
    3. It is also possible to update memory only when the cache line is replaced. This is called write-back.

    otherwise

    1. Stall the processor
    2. Use the block number to get the relevant block from memory
    3. When it is installed, rerun the instruction

This way of doing things is called direct mapping.
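
The lookup steps above can be sketched as a small Python simulation. This is only an illustration: the class name `DirectMappedCache`, its parameters, and the dictionary standing in for main memory are assumptions, not part of the course material. It covers the read path only and models the stall by counting misses.

```python
class DirectMappedCache:
    """Toy direct-mapped cache, read path only (all names are hypothetical)."""

    def __init__(self, index_bits, offset_bits, memory):
        self.index_bits = index_bits      # bits that select the line in the cache
        self.offset_bits = offset_bits    # bits that select the byte in the line
        self.memory = memory              # assumed: dict of block number -> bytes
        n = 1 << index_bits
        self.valid = [False] * n
        self.tag = [0] * n
        self.data = [None] * n
        self.hits = 0
        self.misses = 0

    def split(self, addr):
        """Break the address into (tag, line index, offset in line)."""
        offset = addr & ((1 << self.offset_bits) - 1)
        block = addr >> self.offset_bits             # line number in memory
        index = block & ((1 << self.index_bits) - 1)
        tag = block >> self.index_bits               # high-order bits of block number
        return tag, index, offset

    def read(self, addr):
        tag, index, offset = self.split(addr)
        if self.valid[index] and self.tag[index] == tag:
            self.hits += 1
        else:
            self.misses += 1                         # the processor would stall here
            block = addr >> self.offset_bits
            empty = bytes(1 << self.offset_bits)     # zeros for untouched memory
            self.data[index] = self.memory.get(block, empty)
            self.tag[index] = tag
            self.valid[index] = True                 # the access is then rerun and hits
        return self.data[index][offset]


memory = {0: bytes([10, 11, 12, 13]), 16: bytes([90, 91, 92, 93])}
cache = DirectMappedCache(index_bits=4, offset_bits=2, memory=memory)
results = [cache.read(a) for a in (0x00, 0x01, 0x40, 0x00)]
```

Addresses 0x00 and 0x40 select the same cache line with different tags, so in this run each one evicts the other and only the read of 0x01 hits.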


Examples of Cache

1. One word per cache line, 16 Cache lines

Cache

Line number  Valid  Tag      Data
0000         1      26 bits  1 word
0001         0
0010         1
0011         1
0100         1
0101         1
0110         1
0111         1
1000         0
1001         1
1010         1
1011         0
1100         0
1101         1
1110         1
1111         1

Address

Line number in memory  Line number in cache  Access size
31-6                   5-2                   1-0

Circuit (p. 476)

  1. Ignore access size bits
  2. Use 5-2 to choose the line in the cache
  3. Test 31-6 against the tag in the line
  4. AND with valid to get HIT
  5. Return data in line plus HIT
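
As a rough illustration of the circuit steps above, the field extraction and HIT logic for this 16-line cache might be written as follows. The function names and the three parallel lists (`valid`, `tags`, `data`) are assumptions made for the sketch.

```python
def split_example1(addr):
    """Split a 32-bit byte address for the 16-line, one-word-per-line cache."""
    access = addr & 0x3               # bits 1-0: access size bits, ignored by lookup
    line = (addr >> 2) & 0xF          # bits 5-2: line number in cache
    tag = (addr >> 6) & 0x3FFFFFF     # bits 31-6: 26-bit tag
    return tag, line, access


def lookup_example1(valid, tags, data, addr):
    """Return (HIT, data in the selected line), as the circuit does."""
    tag, line, _ = split_example1(addr)
    hit = valid[line] and tags[line] == tag   # tag comparison ANDed with valid
    return hit, data[line]


a = split_example1(0x1234)   # (72, 13, 0)
b = split_example1(0x1274)   # (73, 13, 0)
```

The two addresses are 64 bytes apart, so they select the same cache line (13) and differ only in the tag.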

Comments

A very basic cache.

Easy to see how to extend to a bigger cache.

2. Longer cache lines: 16 word line, 256 lines

Cache

Line number  Valid  Tag      Word 0   Word 1   ...  Word 15
00000000     1      18 bits  32 bits  32 bits  ...  32 bits
00000001     1
00000010     0
...
11111101     0
11111110     1
11111111     0

Address

Line number in memory  Line number in cache  Word number in line  Access size
31-14                  13-6                  5-2                  1-0

Circuit (p. 486)

  1. Ignore bits 0-1
  2. Use bits 13-6 to choose the line in the cache
  3. Compare bits 31-14 to the tag
  4. AND the result with Valid to get HIT
  5. Multiplexer selects word to return based on bits 5-2
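
A minimal sketch of the same steps for the 256-line, 16-word cache, with list indexing standing in for the multiplexer. The function names and the `(valid, tag, words)` tuple layout are assumptions for the example.

```python
def split_example2(addr):
    """Split a 32-bit byte address for the 256-line, 16-words-per-line cache."""
    access = addr & 0x3             # bits 1-0: ignored
    word = (addr >> 2) & 0xF        # bits 5-2: word number in line
    line = (addr >> 6) & 0xFF       # bits 13-6: line number in cache
    tag = (addr >> 14) & 0x3FFFF    # bits 31-14: 18-bit tag
    return tag, line, word, access


def lookup_example2(cache, addr):
    """cache: list of 256 entries, each (valid, tag, list of 16 words)."""
    tag, line, word, _ = split_example2(addr)
    valid, stored_tag, words = cache[line]
    hit = valid and stored_tag == tag
    return hit, words[word]         # the multiplexer picks one of the 16 words


fields = split_example2(0x12348)    # (4, 141, 2, 0)
```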

Comments

Typical early cache technology


Cache Misses

On Read

Send the memory line number to main memory (SDRAM).

SDRAM returns the whole line in one transaction.

The load instruction, or instruction fetch, is rerun.

On Write

Send the memory line number to main memory (SDRAM).

SDRAM returns the whole line in one transaction.

The store instruction is rerun.

Main memory is then synchronized with the cache.
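
The write-miss sequence can be sketched as below, assuming a write-through policy and the 256-line, 16-word geometry of the second example. To keep it short the sketch uses word addresses rather than byte addresses, and the dictionaries used for the cache and for memory are illustrative assumptions.

```python
WORDS_PER_LINE = 16
NUM_LINES = 256


def store_word(cache, memory, word_addr, value):
    """Store one word. Returns True on a hit, False if a miss was serviced first.

    cache:  dict of line index -> [tag, list of words]; a missing key means invalid
    memory: dict of block number -> list of words
    """
    block, word = divmod(word_addr, WORDS_PER_LINE)   # line number in memory, word in line
    tag, index = divmod(block, NUM_LINES)
    entry = cache.get(index)
    was_hit = entry is not None and entry[0] == tag
    if not was_hit:
        # Miss: fetch the whole line from main memory in one transaction.
        line = list(memory.setdefault(block, [0] * WORDS_PER_LINE))
        cache[index] = [tag, line]
    # The store is (re)run and now hits in the cache.
    cache[index][1][word] = value
    # Write-through: bring main memory up to date as well.
    memory.setdefault(block, [0] * WORDS_PER_LINE)[word] = value
    return was_hit
```

Because every store also reaches memory in this sketch, replacing a line never loses data; a write-back design would instead have to write the old line out before replacing it.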

Possible Improvements

Obvious trade-off: the longer the lines, the fewer misses when locality is good, but the longer each miss takes to fill.

The best performance depends on locality in the code.
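
To make the dependence on locality concrete, here is a toy miss counter for a direct-mapped cache of fixed capacity. The function name, the 256-word capacity, and both access traces are assumptions chosen for illustration; they are not measurements.

```python
def count_misses(trace, capacity_words, words_per_line):
    """Count misses for a direct-mapped cache over a trace of word addresses."""
    num_lines = capacity_words // words_per_line
    tags = [None] * num_lines              # None marks an invalid line
    misses = 0
    for word_addr in trace:
        block = word_addr // words_per_line    # line number in memory
        index = block % num_lines              # line number in cache
        tag = block // num_lines
        if tags[index] != tag:
            misses += 1
            tags[index] = tag                  # install the fetched line
    return misses


sequential = list(range(1024))               # good spatial locality
strided = [16 * i for i in range(1024)]      # poor spatial locality

table = {}
for n in (1, 4, 16):
    seq = count_misses(sequential, 256, n)
    stri = count_misses(strided, 256, n)
    # (misses, words transferred) for each trace
    table[n] = ((seq, seq * n), (stri, stri * n))
```

With these two traces, the sequential run misses 1024, 256 and 64 times as the line grows from 1 to 4 to 16 words, always transferring 1024 words in total. The strided run misses on every access regardless of line length, and with 16-word lines it transfers 16 times as many words as with 1-word lines.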

Typically, cache misses slow the processor by a factor of 2.

Widen and speed up bus between main memory and cache.

Send the desired word first, then fill the rest of the line.
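
One possible transfer order for sending the desired word first is sketched below. The notes say only that the desired word goes first; the wrap-around order for the remaining words, and the name `fill_order`, are assumptions.

```python
def fill_order(requested_word, words_per_line=16):
    """Order in which words of a line are transferred, desired word first.

    Assumes the remaining words follow in wrap-around order.
    """
    return [(requested_word + i) % words_per_line for i in range(words_per_line)]


order = fill_order(5, words_per_line=8)   # [5, 6, 7, 0, 1, 2, 3, 4]
```

The processor can restart as soon as the first word arrives, while the rest of the line is still being filled.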

