CS251 - Computer Organization and Design - Spring 2008
Lecture 34 - Virtual Memory
Practical Details
- Assignment 8
- Assignment 9
Virtual Memory
Like cache, but between main memory and disk
Disk Hardware
Spinning disk & read/write head
Disk speed
- Seek time
- about 5 milliseconds
- Rotational delay: 9,000 revolutions per minute (RPM)
- 150 RPS
- about 7 milliseconds per revolution (1/150 second = 6.7 milliseconds)
- worst case is about 7 milliseconds = 7,000,000 nanoseconds of rotational delay
- average case is 3.5 milliseconds
- 125 Megabytes per second transfer rate
- Direct memory access (DMA)
- 30 microseconds = 30,000 nanoseconds to transfer a page
Seek time can be minimized by smart algorithms, rotational delay cannot.
But predictive caching helps a lot in some disks.
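The arithmetic above can be checked with a short sketch. It uses the figures from these notes; the 4 Kbyte page size (taken from the address translation example later in the lecture) and reading "Megabyte" as 10^6 bytes are assumptions, and the function names are made up for illustration.

```python
# Figures from the notes. PAGE_BYTES and the decimal megabyte are assumptions.
SEEK_MS = 5.0                      # typical seek time
RPM = 9000                         # rotational speed
TRANSFER_BYTES_PER_SEC = 125e6     # 125 Megabytes per second
PAGE_BYTES = 4096                  # assumed page size (4 Kbyte)


def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def transfer_us(nbytes, bytes_per_sec):
    """Time to stream nbytes once the head is in position, in microseconds."""
    return nbytes / bytes_per_sec * 1e6


def average_access_ms():
    """Seek + half a revolution + one page transfer, in milliseconds."""
    half_turn = rotation_ms(RPM) / 2
    transfer_ms = transfer_us(PAGE_BYTES, TRANSFER_BYTES_PER_SEC) / 1000
    return SEEK_MS + half_turn + transfer_ms


if __name__ == "__main__":
    print(f"one revolution: {rotation_ms(RPM):.2f} ms")
    print(f"page transfer:  {transfer_us(PAGE_BYTES, TRANSFER_BYTES_PER_SEC):.1f} us")
    print(f"average access: {average_access_ms():.2f} ms")
```

The point of the exercise: positioning the head (milliseconds) dominates the transfer itself (tens of microseconds).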
Sizes
- Disk: 100 Gbytes
- DRAM: 1-10 GBytes
Surely, virtual memory is an obsolete concept.
- It should be, but
- application and operating system sizes just keep getting bigger
- and so does the number of programs that users have open
simultaneously
Example
- 4 Gbyte real memory
- 32 bit addresses, which allow a 4 Gbyte virtual memory
Virtual memory is indeed obsolete, but
- 64 bit addresses allow a 16 exabyte virtual memory, so
- you have to learn it anyway.
Procedure
Terminology
- page = block
- page fault = cache miss
- copyback = write-back
Address translation (64 bit address, 4 Kbyte pages, 1 terabyte physical memory)
Virtual address
| 63 to 12 | 11 to 0 |
| Virtual page number | Offset within page |
maps to
Physical address
| 39 to 12 | 11 to 0 |
| Physical page number | Offset within page |
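The two address layouts can be expressed as shifts and masks. This is a minimal sketch assuming the sizes in the example (12 offset bits for 4 Kbyte pages, 40 physical address bits for 1 terabyte); the function names are hypothetical.

```python
PAGE_OFFSET_BITS = 12   # 4 Kbyte pages: bits 11 to 0
PHYS_ADDR_BITS = 40     # 1 terabyte physical memory: bits 39 to 0
OFFSET_MASK = (1 << PAGE_OFFSET_BITS) - 1


def split_virtual(vaddr):
    """Return (virtual page number, offset within page)."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & OFFSET_MASK


def join_physical(ppn, offset):
    """Concatenate a physical page number (bits 39 to 12) with the page offset."""
    if ppn >= 1 << (PHYS_ADDR_BITS - PAGE_OFFSET_BITS):
        raise ValueError("physical page number does not fit in 28 bits")
    if offset > OFFSET_MASK:
        raise ValueError("offset does not fit in 12 bits")
    return (ppn << PAGE_OFFSET_BITS) | offset


if __name__ == "__main__":
    vpn, offset = split_virtual(0x12345678)
    print(hex(vpn), hex(offset))
    print(hex(join_physical(0xABCDE, offset)))
```

Note that the offset passes through translation unchanged; only the page number is replaced.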
Address translation uses a page table
- table very large, various algorithmic techniques to lighten the
load
Physical pages can be
Page table
| Virtual page number | Valid | Physical page number |
| 63 to 12 | 0/1 | 39 to 12 |
Algorithm
- Look up virtual page number in page table
- if (Valid) //page is in memory
- form physical address by concatenating page offset to physical page
number
- return the requested data
- (What does this mean for alignment of cache lines with respect to
pages?)
- else
- raise a `page fault' interrupt and let the operating system handle
getting the page
This is pretty gross, which means `not pretty at all'.
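The lookup algorithm can be sketched in a few lines. Assumptions, none of which come from the lecture: the page table is modelled as a Python dict keyed by virtual page number, each entry is a (valid, physical page number) pair, and the page fault interrupt is modelled as an exception. A real page table is a hardware-defined structure in memory, not a dict.

```python
OFFSET_BITS = 12  # 4 Kbyte pages


class PageFault(Exception):
    """Stands in for the page fault interrupt handled by the operating system."""


def translate(page_table, vaddr):
    """Translate a virtual address, or raise PageFault if the page is not in memory."""
    vpn = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    entry = page_table.get(vpn)            # (valid, physical page number) or None
    if entry is None or not entry[0]:
        raise PageFault(hex(vpn))          # OS would now fetch the page from disk
    return (entry[1] << OFFSET_BITS) | offset


if __name__ == "__main__":
    # Hypothetical contents: one resident page, one page out on disk.
    table = {0x12345: (True, 0x00042), 0x12346: (False, 0)}
    print(hex(translate(table, 0x12345678)))
    try:
        translate(table, 0x12346000)
    except PageFault as fault:
        print("page fault on virtual page", fault)
```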
Integration of virtual memory with the cache
Translation Lookaside Buffer (TLB)
- subset of page table containing recently used pages
| Valid | Dirty | Ref | Read | Write | Tag (Virtual Page Number) | Physical Page Number |
| 0/1 | 0/1 | | 0/1 | 0/1 | 63 to 12 | 39 to 12 |
Algorithm
- Divide virtual address
| 63 -- Virtual page number -- 12 | 11 -- page offset -- 0 |
- If virtual page number matches tag AND Valid AND (Read OR Write) then
- Increment Ref
- If (access is Write) then
- Set Dirty
- Form physical address
| 39 -- Physical page number -- 12 | 11 -- page offset -- 0 |
- Divide physical address
| 39 -- Line number in memory -- 14 | 13 -- line number in cache -- 6 | 5 -- offset in line -- 2 | 1 -- access size -- 0 |
and access cache.
- Else if NOT (Read OR Write) then
- Raise memory protection exception
- Else if NOT(virtual page number matches tag AND Valid) then
- Raise page fault exception
- OS chooses page to replace.
- If (Dirty on that page) OS writes that page back to disk
- OS reads in new page from disk
Comment. OS usually swaps out any process that incurs a page fault so as
to use the processor for something else.
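A rough sketch of the TLB algorithm follows. Several things here are assumptions rather than part of the lecture: the TLB is searched with a linear loop (hardware compares all tags in parallel), the two exceptions are modelled as Python exceptions, and the permission check is slightly stricter than the pseudocode above in that it tests the bit matching the kind of access (Write for a store, Read for a load). The cache field widths are the ones given in the notes.

```python
from dataclasses import dataclass


class ProtectionFault(Exception):
    """Stands in for the memory protection exception."""


class PageFault(Exception):
    """Stands in for the page fault exception (here: no valid matching entry)."""


@dataclass
class TLBEntry:
    valid: bool
    dirty: bool
    ref: int
    read: bool
    write: bool
    tag: int      # virtual page number, bits 63 to 12
    ppn: int      # physical page number, bits 39 to 12


def tlb_access(tlb, vaddr, is_write):
    """Return the physical address, updating Ref and Dirty as in the notes."""
    vpn, offset = vaddr >> 12, vaddr & 0xFFF
    for entry in tlb:                       # assumption: linear search
        if entry.valid and entry.tag == vpn:
            allowed = entry.write if is_write else entry.read
            if not allowed:
                raise ProtectionFault(hex(vaddr))
            entry.ref += 1
            if is_write:
                entry.dirty = True
            return (entry.ppn << 12) | offset
    raise PageFault(hex(vaddr))


def split_for_cache(paddr):
    """Divide a 40 bit physical address into the cache fields from the notes."""
    return {
        "line_in_memory": paddr >> 14,          # bits 39 to 14
        "line_in_cache": (paddr >> 6) & 0xFF,   # bits 13 to 6
        "offset_in_line": (paddr >> 2) & 0xF,   # bits 5 to 2
        "access_size": paddr & 0x3,             # bits 1 to 0
    }
```

For example, a read-only entry mapping virtual page 0x12345 to physical page 0x42 translates a load from 0x12345678 to 0x42678, while a store to the same address raises the protection exception.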
Input/Output
The key concept is the `system bus', which is also known as the
though these two might be separate entities. Combined memory-I/O buses are
used in systems with
On the system bus you will find three types of devices
- Parts of the User interface, interact with humans
Note. The data rates that follow are nominal, and the nominal rate does not
always match the useful rate. E.g. keyboard
- 100 words per minute
- 10 keystrokes per second
- 100 possible keys means 7 bits per key
- Therefore 70 ~ 100 bits per second
But to get that rate most of us would need to type the same phrase
over and over, which is not very useful
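The keyboard estimate can be reproduced as follows. The figure of 6 keystrokes per word (five letters plus a space) is an assumption chosen to connect 100 words per minute to 10 keystrokes per second; the function name is made up.

```python
import math


def keyboard_bits_per_second(words_per_minute=100, keystrokes_per_word=6, keys=100):
    """Nominal keyboard data rate in bits per second."""
    keystrokes_per_second = words_per_minute * keystrokes_per_word / 60
    bits_per_key = math.ceil(math.log2(keys))   # 100 keys need 7 bits
    return keystrokes_per_second * bits_per_key


if __name__ == "__main__":
    rate = keyboard_bits_per_second()
    print(f"{rate:.0f} bits/sec = {rate / 1e6:.5f} Mbit/sec")
```

Rounded up to 100 bits per second, this is the 0.0001 Mbit/sec nominal rate for a keyboard.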
| Device | Input/Output | I/O data rate (Mbit/sec) |
| Keyboard | Input | 0.0001 |
| Mouse | Input/output | 0.0038 |
| Voice | Input/output | 0.264 |
| Printer | Output | 3.2 |
| Bit-mapped graphics | Output | 100 |
- Memory
| Device | Data rate (Mbit/sec) | Access Delay (microsec) |
| Magnetic tape | 30 | 1,000,000,000 (human limited) |
| Optical disk (CDROM) | 80 | 100,000,000 (human limited) |
| Magnetic disk | 1,000 | 10,000 |
| SDRAM | 20,000 | 0.05 |
Built into a `seamless' memory hierarchy, but if you don't know where
the seams are your programs won't run very well.
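To see where one of the seams lies, here is a small sketch comparing the time to fetch one page from magnetic disk and from SDRAM, using the data rates and access delays in the table. The 4 Kbyte page size and the simple model (access delay plus size divided by data rate) are assumptions.

```python
PAGE_BITS = 4096 * 8   # one 4 Kbyte page, in bits (assumed page size)

# (data rate in Mbit/sec, access delay in microseconds), from the table
DEVICES = {
    "Magnetic disk": (1_000, 10_000),
    "SDRAM": (20_000, 0.05),
}


def page_fetch_us(rate_mbit_per_sec, delay_us, bits=PAGE_BITS):
    """Access delay plus transfer time, in microseconds.

    1 Mbit/sec is 1 bit per microsecond, so bits / rate is already in microseconds.
    """
    return delay_us + bits / rate_mbit_per_sec


if __name__ == "__main__":
    for name, (rate, delay) in DEVICES.items():
        print(f"{name}: {page_fetch_us(rate, delay):.2f} microsec per page")
```

Under this model the disk is several thousand times slower per page than SDRAM, almost entirely because of access delay rather than data rate.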
- Network interfaces
| Device | Data rate (Mbit/sec) | Access Delay (microsec) |
| Modem | 0.06 | 15,000,000 |
| Wireless LAN | 50 | 1,000,000 |
| Wired LAN | 1,000 | < 1,000 |
Buses
Processor to Cache
Bandwidth: 1-5 Gwords/sec
Cache to High Bandwidth Devices (North Bridge)
Devices
- Main memory
- Bit-mapped graphics
- Network
Bandwidth: 200-500 Mwords/sec
High Bandwidth Devices to Low Bandwidth Devices (South Bridge)
Devices
- Disks
- Audio
- USB for keyboard, mouse, etc
- Slow ethernet
- CDROM
- Legacy devices
Bus Transactions
Concepts
- Master/slave
- Bus arbitration
- Synchronous/asynchronous
- Block transfer
- Multiplexed/non-Multiplexed
Typical Bus Transaction (Asynchronous, multiplexed)
1. Master requests bus
2. Master receives bus from bus arbitration hardware
3. Master asserts Address, then Read.
4. Slave sees Read, latches Address, then asserts Acknowledge
5. Master sees Acknowledge, releases Read and Address
6. Slave sees Read released, releases Acknowledge
7. Slave asserts Data, then DataReady
8. Master sees DataReady, latches Data, then asserts Acknowledge
9. Slave sees Acknowledge, releases Data and DataReady
10. Master sees DataReady released, releases Acknowledge
11. Master releases bus
On a block transfer steps 7 to 10 are repeated until all data is
transferred, then the master releases the bus.
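The handshake can be traced with a small sketch. It does not model signal timing; it only records the order of events for a read, repeating the four data steps once per word on a block transfer. Modelling the slave as a dict of words and reading consecutive addresses within a block are assumptions made for illustration.

```python
def bus_read(memory, address, count=1):
    """Return (data, trace) for an asynchronous, multiplexed read of `count` words.

    `memory` is a dict standing in for the slave device (an assumption).
    """
    trace = [
        "master requests bus",
        "arbiter grants bus to master",
        "master asserts Address, then Read",
    ]
    latched = address                      # slave latches Address when it sees Read
    trace += [
        "slave latches Address, asserts Acknowledge",
        "master releases Read and Address",
        "slave releases Acknowledge",
    ]
    data = []
    for i in range(count):                 # data steps repeat on a block transfer
        word = memory[latched + i]         # assumption: consecutive addresses
        trace.append("slave asserts Data, then DataReady")
        data.append(word)                  # master latches Data
        trace.append("master latches Data, asserts Acknowledge")
        trace.append("slave releases Data and DataReady")
        trace.append("master releases Acknowledge")
    trace.append("master releases bus")
    return data, trace


if __name__ == "__main__":
    words, events = bus_read({0x10: 7, 0x11: 9}, 0x10, count=2)
    print(words)
    for number, event in enumerate(events, start=1):
        print(number, event)
```

A single-word read produces eleven events; each additional word in a block adds four.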
On a synchronous bus, the assertions of Read, Acknowledge and DataReady are
timed by a bus clock.
On a non-Multiplexed bus there are separate address and data lines, and
separate acknowledge lines
DMA (direct memory access) is possible if a device can become bus
master.
Interrupts
Devices can assert interrupt lines on the bus to request service from the
processor.
Interrupt Processing
- Device asserts its interrupt output
- Interrupt control unit (ICU) receives interrupt signal on its input,
asserts its interrupt output
- Before each instruction fetch the processor checks its interrupt
input.
- If it sees the input asserted
- It reads a register of the ICU, during which the pipeline
drains
- The read returns a program counter (called an interrupt vector)
- The program counter is used to fetch the first instruction of the
interrupt service routine (ISR). We saw this briefly earlier when we
were discussing control signals in the processor.
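The sequence above can be sketched as a toy fetch loop. Everything here is a simplification under stated assumptions: the class and function names are hypothetical, the program is a dict from address to instruction name, device names map to vectors in a dict, and the sketch omits saving a return address, pipeline draining, and interrupt priorities.

```python
class ICU:
    """Toy interrupt control unit: maps device names to ISR start addresses."""

    def __init__(self, vectors):
        self.vectors = vectors      # device name -> interrupt vector (a program counter)
        self.pending = []

    def raise_interrupt(self, device):
        """Called when a device asserts its interrupt output."""
        self.pending.append(device)

    @property
    def asserted(self):
        """State of the ICU's interrupt output as seen by the processor."""
        return bool(self.pending)

    def read_vector(self):
        """Processor reads the ICU register; returns the interrupt vector."""
        return self.vectors[self.pending.pop(0)]


def run(program, icu, steps, arrivals=None):
    """Fetch `steps` instructions, checking the interrupt input before each fetch.

    `arrivals` maps a step number to the device that interrupts at that step.
    """
    arrivals = arrivals or {}
    pc, fetched = 0, []
    for step in range(steps):
        if step in arrivals:
            icu.raise_interrupt(arrivals[step])
        if icu.asserted:                 # checked before each instruction fetch
            pc = icu.read_vector()       # vector becomes the new program counter
        fetched.append(program[pc])
        pc += 1
    return fetched


if __name__ == "__main__":
    program = {0: "a", 1: "b", 2: "c", 100: "isr0", 101: "isr1"}
    print(run(program, ICU({"disk": 100}), 4, {2: "disk"}))
```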