CS251 - Computer Organization and Design - Spring 2008
Lecture 22 - Introduction to Multiple Cycle Designs
Practical Details
- Exam results
- mean 40.4 (67%)
- std dev 10.3 (17%)
- 25% point 33.5 (56%)
- median 44.3 (74%)
- 75% point 48 (80%)
- Assignment this week
- Add a shift instruction:
srl rd, rt, shamt -- | 000000 | rs | rt | rd | shamt (5 bits) | 000010 |
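The field layout above can be checked by packing an srl instruction into a 32-bit word. This is only a sketch: the function names are made up for illustration, and the rs field is assumed to be encoded as zero because srl does not read it.

```python
def encode_srl(rd: int, rt: int, shamt: int) -> int:
    """Pack `srl rd, rt, shamt` into a 32-bit R-format word."""
    for name, value in (("rd", rd), ("rt", rt), ("shamt", shamt)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits, got {value}")
    opcode = 0b000000  # R-format
    rs = 0             # assumption: unused by srl, encoded as zero
    funct = 0b000010   # srl
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def srl(value: int, shamt: int) -> int:
    """Logical right shift of a 32-bit value: zeros enter from the left."""
    return (value & 0xFFFFFFFF) >> shamt


word = encode_srl(rd=8, rt=9, shamt=4)  # srl $t0, $t1, 4
print(f"{word:032b}")                   # fields: 000000 00000 01001 01000 00100 000010
print(hex(srl(0x80000000, 4)))          # 0x8000000
```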
Single Cycle Weaknesses
Control Logic
Seems pretty ad hoc.
Extensibility
See above
Timing
Different instructions need different amounts of time
- Clock can be no faster than the slowest instruction allows
Pipelining
Phases of instruction execution
- Instruction fetch & decode
- Operand values
- Result calculation
- Result writeback
We would like to get these separated from one another,
- which means storing state between phases
And yet,
The most popular processor architecture in the world is a single cycle
one
Which brings up a MIPS oddity
No condition codes
- What are condition codes
- N - negative: bit 31 of most recent result
- Z - zero: NOR of all bits of result
- C - carry:
- addition: carry out of MSB (unsigned overflow)
- subtraction: borrow in MSB (unsigned underflow)
- shift: last bit shifted out
- otherwise unchanged
- V - overflow:
- addition/subtraction: overflow in two's complement integer (TCI) arithmetic
- otherwise unchanged
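As a rough illustration of the four flags, the sketch below computes N, Z, C and V for 32-bit addition and subtraction. The function names are hypothetical, and C on subtraction follows the borrow convention given in the list above (some architectures store the inverted borrow instead).

```python
MASK = 0xFFFFFFFF  # 32-bit word


def add_flags(a: int, b: int) -> dict:
    """Flags after a 32-bit addition a + b."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    return {
        "result": result,
        "N": result >> 31,        # bit 31 of the result
        "Z": int(result == 0),    # 1 only when every result bit is 0
        "C": full >> 32,          # carry out of the MSB
        # signed overflow: operands share a sign, result has the other sign
        "V": ((~(a ^ b) & (a ^ result)) >> 31) & 1,
    }


def sub_flags(a: int, b: int) -> dict:
    """Flags after a 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "result": result,
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a < b),          # borrow: unsigned underflow
        # signed overflow: operands differ in sign, result sign differs from a
        "V": (((a ^ b) & (a ^ result)) >> 31) & 1,
    }


print(add_flags(0x7FFFFFFF, 1))  # largest positive + 1 sets N and V
print(sub_flags(0, 1))           # 0 - 1 sets N and C (borrow)
```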
MIPS uses the set instruction followed by conditional branch
Multicycle Execution
Separating the phases
Instruction Fetch
Store instruction in register
- You can now put a different address on the address lines
- The only restriction is that you can't latch it into the instruction
register until you are finished with the contents of the instruction
register
- You can also change the program counter
- probably using the ALU, which is not yet being used by any of your
arguments
- Latch the new program counter into the PC-register at the same time
you latch the instruction into the instruction register.
Increment PC using ALU
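The fetch phase can be sketched as a single step that computes both next values from the current state and then latches them together. The dictionary-based state and the name fetch_step are illustrative assumptions, as is the increment of 4 (the usual MIPS instruction size in bytes).

```python
def fetch_step(state: dict, memory: dict) -> dict:
    """One instruction-fetch cycle of a multicycle datapath (sketch)."""
    pc = state["PC"]
    next_ir = memory[pc]             # memory read addressed by the current PC
    next_pc = (pc + 4) & 0xFFFFFFFF  # the ALU is free during fetch, so it does PC + 4
    # Both registers are latched on the same clock edge: the new state is
    # built only from values read out of the old state.
    return {**state, "IR": next_ir, "PC": next_pc}


memory = {0: 0x00094102, 4: 0x8C000000}  # two instruction words, chosen arbitrarily
state = {"PC": 0, "IR": 0}
state = fetch_step(state, memory)
print(hex(state["IR"]), state["PC"])     # 0x94102 4
```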
Evaluating Operands
Put operands into registers, which are input to the ALU
Control Logic
Highest six bits of the instruction are the opcode
- For opcode 0, lowest six bits select function
Control logic needs to accept opcode (26:31) and function (0:5)
- and output all the control signals
Split into two stages
- From opcode only generate all control signals except ALUctl, plus
ALUop
- From ALUop plus function generate ALUctl
Signals to generate
| Signal   | 0                       | 1                          |
| RegDst   | rt                      | rd                         |
| RegWrite | n/a                     | write register             |
| ALUSrc   | register                | instruction                |
| Branch   | no branch               | branch                     |
| MemRead  | n/a                     | read memory                |
| MemWrite | n/a                     | write memory               |
| MemToReg | write register from ALU | write register from memory |
| ALUOp0   | not branch              | branch                     |
| ALUOp1   | not R-type              | R-type                     |
Opcodes
| Opcode | Instruction | Assert                              |
| 100011 | lw          | ALUSrc, MemToReg, RegWrite, MemRead |
| 101011 | sw          | ALUSrc, MemWrite                    |
| 000100 | beq         | Branch, ALUOp0                      |
| 000000 | R-format    | RegDst, RegWrite, ALUOp1            |
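The first decode stage amounts to a lookup from opcode to the set of asserted signals. The sketch below transcribes the opcode table directly; the names SIGNALS, ASSERTED and main_control are illustrative, and every signal not listed for an opcode is assumed to be 0.

```python
SIGNALS = ("RegDst", "RegWrite", "ALUSrc", "Branch", "MemRead",
           "MemWrite", "MemToReg", "ALUOp1", "ALUOp0")

# Signals asserted (set to 1) for each opcode, copied from the table.
ASSERTED = {
    0b100011: {"ALUSrc", "MemToReg", "RegWrite", "MemRead"},  # lw
    0b101011: {"ALUSrc", "MemWrite"},                         # sw
    0b000100: {"Branch", "ALUOp0"},                           # beq
    0b000000: {"RegDst", "RegWrite", "ALUOp1"},               # R-format
}


def main_control(instruction: int) -> dict:
    """Stage 1: derive every control signal except ALUCtl from the opcode."""
    opcode = (instruction >> 26) & 0x3F  # highest six bits
    if opcode not in ASSERTED:
        raise ValueError(f"opcode {opcode:06b} is not in the table")
    asserted = ASSERTED[opcode]
    return {name: int(name in asserted) for name in SIGNALS}


print(main_control(0x8C000000))  # an lw word: ALUSrc, MemToReg, RegWrite, MemRead are 1
```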
ALUCtl
| Operation | ALUOp | Funct  | action           | ALUCtl |
| beq       | 01    | XXXXXX | subtract         | 110    |
| add       | 10    | 100000 | add              | 010    |
| sub       | 10    | 100010 | subtract         | 110    |
| and       | 10    | 100100 | AND              | 000    |
| or        | 10    | 100101 | OR               | 001    |
| slt       | 10    | 101010 | set on less than | 111    |
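The second decode stage can be sketched the same way: ALUOp picks between a fixed operation and a lookup on the function field. The names are illustrative. The ALUOp 00 case (lw/sw address calculation selecting add) does not appear in the table and is included here as an assumption based on the usual textbook design.

```python
# ALUCtl for R-type instructions, keyed by the function field (from the table).
FUNCT_TO_ALUCTL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Stage 2: derive the 3-bit ALUCtl from ALUOp and the function field."""
    if alu_op == 0b00:
        return 0b010                      # assumption: lw/sw add base + offset
    if alu_op == 0b01:
        return 0b110                      # beq subtracts; funct is don't-care
    if alu_op == 0b10:
        if funct not in FUNCT_TO_ALUCTL:
            raise ValueError(f"funct {funct:06b} is not in the table")
        return FUNCT_TO_ALUCTL[funct]
    raise ValueError(f"unexpected ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):03b}")  # slt -> 111
print(f"{alu_control(0b01, 0b111111):03b}")  # beq -> 110
```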
Timing
Suppose
- memory units 200 picoseconds
- ALUs 100 ps
- register files 50 ps
Then for R-type instructions
- Longest path (?)
- inst fetch : 200 ps (really?)
- argument values : 50 ps
- ALU : 100 ps
- result write : 50 ps
- total : 400 ps (200 ps ?)
- Other paths
But for loads
- Longest path (?)
- argument values : 50 ps
- ALU : 100 ps
- result read : 200 ps
- result write : 50 ps
- inst fetch : 200 ps (really ?)
- total : 600 ps (400 ps ?)
And for conditional branches
- Longest path
- argument values : 50 ps
- ALU: 100 ps
- inst fetch : 200 ps
- total : 350 ps
Possible competitor
- inst fetch : 200 ps
- everything else done before start of clock
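The totals above are plain sums of unit delays along each path, and the single-cycle clock period is the largest of them. The sketch below reproduces that arithmetic under the assumption that the units on a path operate strictly one after another; it computes the first total listed for each instruction class, not the parenthesized alternatives. The path lists and names are illustrative.

```python
DELAY_PS = {"mem": 200, "alu": 100, "reg": 50}  # unit delays from the notes

# Units used in sequence by each instruction class.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],         # fetch, read, ALU, write
    "load":   ["mem", "reg", "alu", "mem", "reg"],  # fetch, read, ALU, data read, write
    "branch": ["mem", "reg", "alu"],                # fetch, read, ALU
}


def path_delay(units: list) -> int:
    """Total delay in picoseconds when the units operate one after another."""
    return sum(DELAY_PS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
clock_period = max(delays.values())  # single cycle: set by the slowest instruction

print(delays)        # {'R-type': 400, 'load': 600, 'branch': 350}
print(clock_period)  # 600
```

With one clock for every instruction, an R-type instruction that needs 400 ps still occupies a 600 ps cycle, which is the timing weakness that motivates splitting execution into multiple cycles.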
Climax
The CPU plus memory is just a finite state machine,
albeit a complex one.