### **Chapter 4: Processor Design**

#### **Topics**

- 4.1 The Design Process
- 4.2 A 1-Bus Microarchitecture for the SRC
- 4.3 Data Path Implementation
- 4.4 Logic Design for the 1-Bus SRC
- 4.5 The Control Unit
- 4.6 The 2- and 3-Bus Processor Designs
- 4.7 The Machine Reset
- 4.8 Machine Exceptions

### Abstract and Concrete Register Transfer Descriptions

- The abstract RTN for SRC in Chapter 2 defines "what," not "how"
- A concrete RTN uses a specific set of real registers and buses to accomplish the effect of an abstract RTN statement
- Several concrete RTNs could implement the same ISA

### A Note on the Design Process

- This chapter presents several SRC designs
- We started in Chapter 2 with an informal description
- In this chapter we will propose several block diagram architectures to support the abstract RTN, then we will:
  - Write concrete RTN steps consistent with the architecture
  - Keep track of demands made by concrete RTN on the hardware
  - Design data path hardware and identify needed control signals
  - Design a control unit to generate control signals

### Fig 4.1 Block Diagram of 1-Bus SRC

### Fig 4.2 High-Level View of the 1-Bus SRC Design

Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan

### Constraints Imposed by the Microarchitecture

- One bus connecting most registers allows many different RTs, but only one at a time
- Memory address must be copied into MA by CPU
- Memory data written from or read into MD
- First ALU operand always in A, result goes to C
- Second ALU operand always comes from bus
- Information only goes into IR and MA from bus
- A decoder (not shown) interprets contents of IR
- MA supplies address to memory, not to CPU bus

### Abstract and Concrete RTN for SRC add Instruction

Abstract RTN:

(IR $\leftarrow$ M[PC]: PC $\leftarrow$ PC + 4; instruction\_execution);

instruction\_execution := ( • • • add (:= op = 12) $\rightarrow$ R[ra] $\leftarrow$ R[rb] + R[rc]:

Tbl 4.1 Concrete RTN for the add Instruction

| Step | RTN | Phase |
|------|-----|-------|
| T0 | MA ← PC: C ← PC + 4; | Instruction fetch |
| T1 | MD ← M[MA]: PC ← C; | Instruction fetch |
| T2 | IR ← MD; | Instruction fetch |
| T3 | A ← R[rb]; | Instruction execution |
| T4 | C ← A + R[rc]; | Instruction execution |
| T5 | R[ra] ← C; | Instruction execution |

- Parts of 2 RTs (IR ← M[PC]: PC ← PC + 4;) done in T0
- Single add RT takes 3 concrete RTs (T3, T4, T5)

### Concrete RTN Gives Information About Sub-units

- The ALU must be able to add two 32-bit values
- ALU must also be able to increment B input by 4
- Memory read must use address from MA and return data to MD
- Two RTs separated by `:` in the concrete RTN, as in T0 and T1, are operations at the same clock
- Steps T0, T1, and T2 constitute instruction fetch, and will be the same for all instructions
- With this implementation, fetch and execute of the add instruction take 6 clock cycles

### Concrete RTN for Arithmetic Instructions: addi

#### **Abstract RTN:**

```
addi (:= op = 13) → R[ra] ← R[rb] + c2⟨16..0⟩ {2's complement sign extend} :
```

#### **Concrete RTN for addi:**

| Step | RTN | Phase |
|------|-----|-------|
| T0 | MA ← PC: C ← PC + 4; | Instruction fetch |
| T1 | MD ← M[MA]: PC ← C; | Instruction fetch |
| T2 | IR ← MD; | Instruction fetch |
| T3 | A ← R[rb]; | Instruction execution |
| T4 | C ← A + c2⟨16..0⟩ {sign ext.}; | Instruction execution |
| T5 | R[ra] ← C; | Instruction execution |

- Differs from add only in step T4
- Establishes requirement for sign extend hardware

# Fig 4.3 More Complete View of Registers and Buses in the 1-Bus SRC Design, Including Some Control Signals

- Concrete RTN lets us add detail to the data path
  - Instruction register logic and new paths
  - Condition bit flip-flop
  - Shift count register

Keep this slide in mind as we discuss concrete RTN of instructions.
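The six-step concrete RTN for add and addi can be traced in software. The following is a minimal Python sketch, not part of the original slides. The dictionary-based `cpu` state and the helpers `encode`, `new_cpu`, and `run_one` are hypothetical names introduced for illustration. The field positions for ra, rb, and rc are an assumption taken from the usual SRC instruction format rather than from this section.

```python
MASK32 = 0xFFFFFFFF
OP_ADD, OP_ADDI = 12, 13  # opcodes from the abstract RTN above


def sign_extend_c2(ir):
    """Sign-extend the 17-bit c2 field, IR<16..0>, to 32 bits."""
    c2 = ir & 0x1FFFF
    return (c2 | 0xFFFE0000) if (c2 & 0x10000) else c2


def encode(op, ra, rb, rc=0, c2=0):
    """Hypothetical helper. Assumed layout: op<31..27>, ra<26..22>,
    rb<21..17>, then rc<16..12> (add) or c2<16..0> (addi)."""
    low = (c2 & 0x1FFFF) if op == OP_ADDI else (rc << 12)
    return (op << 27) | (ra << 22) | (rb << 17) | low


def new_cpu():
    """Registers of the 1-bus data path used by add and addi."""
    return {"PC": 0, "MA": 0, "MD": 0, "IR": 0, "A": 0, "C": 0, "R": [0] * 32}


def run_one(cpu, memory):
    """Fetch and execute one add or addi; return the number of steps used."""
    regs = cpu["R"]
    # T0: MA <- PC : C <- PC + 4  (two transfers in the same clock)
    cpu["MA"], cpu["C"] = cpu["PC"], (cpu["PC"] + 4) & MASK32
    # T1: MD <- M[MA] : PC <- C
    cpu["MD"], cpu["PC"] = memory[cpu["MA"]], cpu["C"]
    # T2: IR <- MD
    cpu["IR"] = cpu["MD"]
    ir = cpu["IR"]
    op, ra = ir >> 27, (ir >> 22) & 31
    rb, rc = (ir >> 17) & 31, (ir >> 12) & 31
    # T3: A <- R[rb]
    cpu["A"] = regs[rb]
    # T4: C <- A + R[rc] for add, or C <- A + sign-extended c2 for addi
    if op == OP_ADD:
        operand = regs[rc]
    elif op == OP_ADDI:
        operand = sign_extend_c2(ir)
    else:
        raise ValueError("only add and addi are modeled in this sketch")
    cpu["C"] = (cpu["A"] + operand) & MASK32
    # T5: R[ra] <- C
    regs[ra] = cpu["C"]
    return 6


if __name__ == "__main__":
    memory = {0: encode(OP_ADD, 1, 2, rc=3), 4: encode(OP_ADDI, 4, 1, c2=-5)}
    cpu = new_cpu()
    cpu["R"][2], cpu["R"][3] = 10, 32
    run_one(cpu, memory)
    run_one(cpu, memory)
    print(cpu["R"][1], cpu["R"][4], cpu["PC"])  # 42 37 8
```

The two instructions share every step except T4, which is where the sketch chooses between R[rc] and the sign-extended constant.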
#### Abstract and Concrete RTN for Load and Store

```
ld (:= op = 1) → R[ra] ← M[disp] :
st (:= op = 3) → M[disp] ← R[ra] :
where
disp⟨31..0⟩ := ((rb = 0) → c2⟨16..0⟩ {sign ext.} :
               (rb ≠ 0) → R[rb] + c2⟨16..0⟩ {sign extend, 2's comp.} ) :
```

Tbl 4.3 The ld and st (load/store register from memory) Instructions

```
Step    RTN for ld                              RTN for st
T0-T2   Instruction fetch                       Instruction fetch
T3      A ← (rb = 0 → 0: rb ≠ 0 → R[rb]);       (same as ld)
T4      C ← A + (16@IR⟨16⟩#IR⟨15..0⟩);          (same as ld)
T5      MA ← C;                                 (same as ld)
T6      MD ← M[MA];                             MD ← R[ra];
T7      R[ra] ← MD;                             M[MA] ← MD;
```

#### **Notes for Load and Store RTN**

- Steps T0 through T2 are the same as for add and addi, and for <u>all instructions</u>
- In addition, steps T3 through T5 are the same for ld and st, because they calculate disp
- A way is needed to use 0 for R[rb] when rb = 0
- 15-bit sign extension is needed for IR⟨16..0⟩
- Memory read into MD occurs at T6 of ld
- Write of MD into memory occurs at T7 of st

#### **Concrete RTN for Conditional Branch**

```
br (:= op = 8) → (cond → PC ← R[rb]):
cond := ( c3⟨2..0⟩ = 0 → 0:                never
          c3⟨2..0⟩ = 1 → 1:                always
          c3⟨2..0⟩ = 2 → R[rc] = 0:        if register is zero
          c3⟨2..0⟩ = 3 → R[rc] ≠ 0:        if register is nonzero
          c3⟨2..0⟩ = 4 → R[rc]⟨31⟩ = 0:    if positive or zero
          c3⟨2..0⟩ = 5 → R[rc]⟨31⟩ = 1):   if negative

Tbl 4.4 The Branch Instruction, br

Step    RTN
T0-T2   Instruction fetch
T3      CON ← cond(R[rc]);
T4      CON → PC ← R[rb];
```

#### **Notes on Conditional Branch RTN**

- c3⟨2..0⟩ are just the low-order 3 bits of IR
- cond() is evaluated by a combinational logic circuit having inputs from R[rc] and c3⟨2..0⟩
- The one-bit register CON is not accessible to the
programmer and only holds the output of the combinational logic for the condition
- If the branch succeeds, the program counter is replaced by the contents of a general register

# Abstract and Concrete RTN for SRC Shift Right

```
shr (:= op = 26) → R[ra]⟨31..0⟩ ← (n @ 0) # R[rb]⟨31..n⟩ :
n := ( (c3⟨4..0⟩ = 0) → R[rc]⟨4..0⟩ :    shift count in register
       (c3⟨4..0⟩ ≠ 0) → c3⟨4..0⟩ ):      or constant field of instruction
```

Tbl 4.5 The shr Instruction

```
Step    Concrete RTN
T0-T2   Instruction fetch
T3      n ← IR⟨4..0⟩;
T4      (n = 0) → (n ← R[rc]⟨4..0⟩);
T5      C ← R[rb];
T6      Shr (:= (n ≠ 0) → (C⟨31..0⟩ ← 0#C⟨31..1⟩: n ← n - 1; Shr));
T7      R[ra] ← C;
```

#### **Notes on SRC Shift RTN**

- In the abstract RTN, n is defined with :=
- In the concrete RTN, it is a physical register
- n not only holds the shift count but is used as a counter in step T6
- Step T6 is repeated n times as shown by the recursion in the RTN
- The control for such repeated steps will be treated later

#### **Data Path/Control Unit Separation**

- Interface between data path and control consists of gate and strobe signals
  - A gate selects one of several values to apply to a common point, say a bus
  - A strobe changes the values of the flip-flops in a register to match new inputs
- The type of flip-flop used in registers has much influence on control and some on data path
  - Latch: simpler hardware, but more complex timing
  - Edge triggering: simpler timing, but about twice the hardware

# Reminder on Latch- and Edge-Triggered Operation

- Latch output follows input while strobe is high
- Edge-triggering samples input at edge time

# Fig 4.5 Extracting c1, c2, and OP from the Instruction Register, IR⟨31..0⟩

- I⟨21⟩ is the sign bit of C1 that must be extended
- I⟨16⟩
is the sign bit of C2 that must be extended
- Sign bits are fanned out from one to several bits and gated to bus

# Fig 4.6 The CPU-Memory Interface: Memory Address and Memory Data Registers, MA⟨31..0⟩ and MD⟨31..0⟩

- MD is loaded from memory or from CPU bus
- MD can drive CPU bus or memory bus

# Fig 4.7 The ALU and Its Associated Registers

# From Concrete RTN to Control Signals: The Control Sequence

#### **Tbl 4.6 The Instruction Fetch**

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0 | MA ← PC: C ← PC + 4; | PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub> |
| T1 | MD ← M[MA]: PC ← C; | Read, C<sub>out</sub>, PC<sub>in</sub>, Wait |
| T2 | IR ← MD; | MD<sub>out</sub>, IR<sub>in</sub> |
| T3 | Instruction_execution | |

- The register transfers are the concrete RTN
- The control signals that cause the register transfers make up the control sequence
- Wait prevents the control from advancing to step T3 until the memory asserts Done

### Control Steps, Control Signals, and Timing

- Within a given time step, the order in which control signals are written is irrelevant
  - In step T0, C<sub>in</sub>, INC4, MA<sub>in</sub>, PC<sub>out</sub> == PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub>
- The only timing distinction within a step is between gates and strobes
- The memory read should be started as early as possible to reduce the wait
- MA must have the right value before being used for the read
- Depending on memory timing, Read could be in T0

### Control Sequence for the SRC add Instruction

add (:= op = 12) $\rightarrow$ R[ra] $\leftarrow$ R[rb] + R[rc]:

#### Tbl 4.7 The add Instruction

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0 | MA ← PC: C ← PC + 4; | PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub>, Read |
| T1 | MD ← M[MA]: PC ← C; | C<sub>out</sub>, PC<sub>in</sub>, Wait |
| T2 | IR ← MD; | MD<sub>out</sub>, IR<sub>in</sub> |
| T3 | A ← R[rb]; | Grb, R<sub>out</sub>, A<sub>in</sub> |
| T4 | C ← A + R[rc]; | Grc, R<sub>out</sub>, ADD, C<sub>in</sub> |
| T5 | R[ra] ← C; | C<sub>out</sub>, Gra, R<sub>in</sub>, End |

- Note the use of Gra, Grb, and Grc to gate the correct 5-bit register select code to the registers
- End signals the control to start over at step T0

#### Control Sequence for the SRC addi Instruction

```
addi (:= op = 13) → R[ra] ← R[rb] + c2⟨16..0⟩ {2's comp., sign ext.} :
```

#### Tbl 4.8 The addi Instruction

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0 | MA ← PC: C ← PC + 4; | PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub>, Read |
| T1 | MD ← M[MA]: PC ← C; | C<sub>out</sub>, PC<sub>in</sub>, Wait |
| T2 | IR ← MD; | MD<sub>out</sub>, IR<sub>in</sub> |
| T3 | A ← R[rb]; | Grb, R<sub>out</sub>, A<sub>in</sub> |
| T4 | C ← A + c2⟨16..0⟩ {sign ext.}; | c2<sub>out</sub>, ADD, C<sub>in</sub> |
| T5 | R[ra] ← C; | C<sub>out</sub>, Gra, R<sub>in</sub>, End |

- The c2<sub>out</sub> signal sign extends IR⟨16..0⟩ and gates it to the bus

### Control Sequence for the SRC st Instruction

```
st (:= op = 3) → M[disp] ← R[ra] :
disp⟨31..0⟩ := ((rb = 0) → c2⟨16..0⟩ {sign extend} :
               (rb ≠ 0) → R[rb] + c2⟨16..0⟩ {sign extend, 2's complement} ) :
```

#### The st Instruction

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0-T2 | Instruction fetch | Instruction fetch |
| T3 | A ← (rb = 0 → 0: rb ≠ 0 → R[rb]); | Grb, BA<sub>out</sub>, A<sub>in</sub> |
| T4 | C ← A + c2⟨16..0⟩ {sign-extend}; | c2<sub>out</sub>, ADD, C<sub>in</sub> |
| T5 | MA ← C; | C<sub>out</sub>, MA<sub>in</sub> |
| T6 | MD ← R[ra]; | Gra, R<sub>out</sub>, MD<sub>in</sub>, Write |
| T7 | M[MA] ← MD; | Wait, End |

- Note BA<sub>out</sub> in T3 compared to R<sub>out</sub> in T3 of addi

### Fig 4.8 The Shift Counter

- The concrete RTN for shr relies upon a 5-bit register to hold the shift count
- It must load, decrement, and have an = 0 test

# Tbl 4.10 Control Sequence for the SRC shr Instruction: Looping

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0-T2 | Instruction fetch | Instruction fetch |
| T3 | n ← IR⟨4..0⟩; | c1<sub>out</sub>, Ld |
| T4 | (n = 0) → (n ← R[rc]⟨4..0⟩); | n=0 → (Grc, R<sub>out</sub>, Ld) |
| T5 | C ← R[rb]; | Grb, R<sub>out</sub>, C=B, C<sub>in</sub> |
| T6 | Shr (:= (n ≠ 0) → (C⟨31..0⟩ ← 0#C⟨31..1⟩: n ← n - 1; Shr)); | n≠0 → (C<sub>out</sub>, SHR, C<sub>in</sub>, Decr, Goto6) |
| T7 | R[ra] ← C; | C<sub>out</sub>, Gra, R<sub>in</sub>, End |

Conditional
control signals and repeating a control step are new concepts

#### **Branching**

$$
\begin{aligned}
\text{cond} := (\; & c3\langle 2..0\rangle = 0 \rightarrow 0: \\
& c3\langle 2..0\rangle = 1 \rightarrow 1: \\
& c3\langle 2..0\rangle = 2 \rightarrow R[rc] = 0: \\
& c3\langle 2..0\rangle = 3 \rightarrow R[rc] \neq 0: \\
& c3\langle 2..0\rangle = 4 \rightarrow R[rc]\langle 31\rangle = 0: \\
& c3\langle 2..0\rangle = 5 \rightarrow R[rc]\langle 31\rangle = 1\;):
\end{aligned}
$$

This is equivalent to the logic expression

$$
\begin{aligned}
\text{cond} = {} & (c3\langle 2..0\rangle = 1) \\
& \lor (c3\langle 2..0\rangle = 2) \land (R[rc] = 0) \\
& \lor (c3\langle 2..0\rangle = 3) \land \neg (R[rc] = 0) \\
& \lor (c3\langle 2..0\rangle = 4) \land \neg R[rc]\langle 31\rangle \\
& \lor (c3\langle 2..0\rangle = 5) \land R[rc]\langle 31\rangle
\end{aligned}
$$

### Fig 4.9 Computation of the Conditional Value CON

- NOR gate does = 0 test of R[rc] on bus

### Tbl 4.11 Control Sequence for SRC Branch Instruction, br

br (:= op = 8) $\rightarrow$ (cond $\rightarrow$ PC $\leftarrow$ R[rb]):

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0-T2 | Instruction fetch | Instruction fetch |
| T3 | CON ← cond(R[rc]); | Grc, R<sub>out</sub>, CON<sub>in</sub> |
| T4 | CON → PC ← R[rb]; | Grb, R<sub>out</sub>, CON → PC<sub>in</sub>, End |

- Condition logic is always connected to CON, so R[rc] only needs to be put on bus in T3
- Only PC<sub>in</sub> is conditional in T4 since gating R[rb] to bus makes no difference if it is not used

#### **Summary of the Design Process**

Informal description ⇒ formal RTN description ⇒ block diagram architecture ⇒ concrete RTN steps ⇒ hardware design of blocks ⇒ control sequences ⇒ control unit and timing

- At each level, more decisions must be made
- These decisions refine the design
- Also place requirements on hardware still to be designed
- The nice one-way process above has circularity
- Decisions at later stages
cause changes in earlier ones
- Happens less in a text than in reality because
  - Can be fixed on re-reading
  - Confusing to first-time student

# Fig 4.10 Clocking the Data Path: Register Transfer Timing

- t<sub>R2valid</sub> is the period from the beginning of the gate signal until inputs to R2 are valid
- t<sub>comb</sub> is delay through combinational logic, such as ALU or cond logic

### Signal Timing on the Data Path

- Several delays occur in getting data from R1 to R2
  - Gate delay through the 3-state bus driver: t<sub>g</sub>
  - Worst case propagation delay on bus: t<sub>bp</sub>
  - Delay through any logic, such as ALU: t<sub>comb</sub>
  - Set up time for data to affect state of R2: t<sub>su</sub>
- Data can be strobed into R2 after this time

$$t_{R2valid} = t_g + t_{bp} + t_{comb} + t_{su}$$

- Diagram shows strobe signal in the form for a latch. It must be high for a minimum time, t<sub>w</sub>
- There is a hold time, t<sub>h</sub>, for data after strobe ends

# Effect of Signal Timing on Minimum Clock Cycle

- A total latch propagation delay is the sum

$$t_l = t_{su} + t_w + t_h$$

- All above times are specified for latch
- t<sub>h</sub> may be very small or zero
- The minimum clock period is determined by finding the longest path from flip-flop output to flip-flop input
  - This is usually a path through the ALU
  - Conditional signals add a little gate delay
- Using this path, the minimum clock period is

$$t_{\min} = t_g + t_{bp} + t_{comb} + t_l$$

# Latches Versus Edge-Triggered or Master-Slave Flip-Flops

- During the high part of a strobe a latch changes its output
- If this output can affect its input, an error can occur
- This can influence even the kind of concrete RTs that can be written for a data path
- If the C register is implemented with latches, then C ← C + MD; is not legal
- If the C register is implemented with master-slave or edge-triggered flip-flops, it is OK

#### **The Control Unit**

- The control unit's job is to generate the control signals in the
proper sequence
- Things the control signals depend on
  - The time step Ti
  - The instruction opcode (for steps other than T0, T1, T2)
  - Some few data path signals like CON, n = 0, etc.
  - Some external signals: reset, interrupt, etc. (to be covered)
- The components of the control unit are: a time state generator, instruction decoder, and combinational logic to generate control signals

# Fig 4.11 Control Unit Detail with Inputs and Outputs

### Synthesizing Control Signal Encoder Logic

| Step | Control Sequence |
|------|------------------|
| T0 | PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub>, Read |
| T1 | C<sub>out</sub>, PC<sub>in</sub>, Wait |
| T2 | MD<sub>out</sub>, IR<sub>in</sub> |

| Step | add | addi | st | shr | • • • |
|------|-----|------|----|-----|-------|
| T3 | Grb, R<sub>out</sub>, A<sub>in</sub> | Grb, R<sub>out</sub>, A<sub>in</sub> | Grb, BA<sub>out</sub>, A<sub>in</sub> | c1<sub>out</sub>, Ld | |
| T4 | Grc, R<sub>out</sub>, ADD, C<sub>in</sub> | c2<sub>out</sub>, ADD, C<sub>in</sub> | c2<sub>out</sub>, ADD, C<sub>in</sub> | n=0 → (Grc, R<sub>out</sub>, Ld) | |
| T5 | C<sub>out</sub>, **Gra**, R<sub>in</sub>, End | C<sub>out</sub>, **Gra**, R<sub>in</sub>, End | C<sub>out</sub>, MA<sub>in</sub> | Grb, R<sub>out</sub>, C=B, C<sub>in</sub> | |
| T6 | | | **Gra**, R<sub>out</sub>, MD<sub>in</sub>, Write | n≠0 → (C<sub>out</sub>, SHR, C<sub>in</sub>, Decr, Goto6) | |
| T7 | | | Wait, End | C<sub>out</sub>, **Gra**, R<sub>in</sub>, End | |

#### **Design process:**

- Comb through the entire set of control sequences.
- Find all occurrences of each control signal.
- Write an equation describing that signal.

Example: $Gra = T5 \cdot (add + addi) + T6 \cdot st + T7 \cdot shr + \dots$

# Use of Data Path Conditions in Control Signal Logic

| Step | Control Sequence |
|------|------------------|
| T0 | PC<sub>out</sub>, MA<sub>in</sub>, INC4, C<sub>in</sub>, Read |
| T1 | C<sub>out</sub>, PC<sub>in</sub>, Wait |
| T2 | MD<sub>out</sub>, IR<sub>in</sub> |

| Step | add | addi | st | shr | • • • |
|------|-----|------|----|-----|-------|
| T3 | Grb, R<sub>out</sub>, A<sub>in</sub> | Grb, R<sub>out</sub>, A<sub>in</sub> | Grb, BA<sub>out</sub>, A<sub>in</sub> | c1<sub>out</sub>, Ld | |
| T4 | **Grc**, R<sub>out</sub>, ADD, C<sub>in</sub> | c2<sub>out</sub>, ADD, C<sub>in</sub> | c2<sub>out</sub>, ADD, C<sub>in</sub> | n=0 → (**Grc**, R<sub>out</sub>, Ld) | |
| T5 | C<sub>out</sub>, Gra, R<sub>in</sub>, End | C<sub>out</sub>, Gra, R<sub>in</sub>, End | C<sub>out</sub>, MA<sub>in</sub> | Grb, R<sub>out</sub>, C=B, C<sub>in</sub> | |
| T6 | | | Gra, R<sub>out</sub>, MD<sub>in</sub>, Write | n≠0 → (C<sub>out</sub>, SHR, C<sub>in</sub>, Decr, Goto6) | |
| T7 | | | Wait, End | C<sub>out</sub>, Gra, R<sub>in</sub>, End | |

Example: $Grc = T4 \cdot add + T4 \cdot (n = 0) \cdot shr + \dots$

# Fig 4.12 Generation of the logic for PC<sub>in</sub> and G<sub>ra</sub>

# Fig 4.13 Branching in the Control Unit

- 3-state gates allow 6 to be applied to counter input
- Reset will synchronously reset counter to step T0

# Fig 4.14 The Clocking Logic: Start, Stop, and Memory Synchronization

- Mck is master clock oscillator

## The Complete 1-Bus Design of SRC

- High-level architecture block diagram
- Concrete RTN steps
- Hardware design of registers and data path logic
- Revision of concrete RTN steps where needed
- Control sequences
- Register clocking decisions
- Logic equations for control signals
- Time step generator design
- Clock run, stop, and synchronization logic

# Other Architectural Designs Will Require a Different RTN

- More data paths allow more things to be done in one step
- Consider a two bus design
- By separating input and output of ALU on different buses, the C register is eliminated
- Steps can be saved by strobing ALU results directly into their destinations

# Fig 4.15 The 2-Bus SRC Microarchitecture

- Bus A carries data going into registers
- Bus B carries data being gated out of registers
- ALU function C = B is used for all simple register transfers

### Tbl 4.13 The 2-Bus add Instruction

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0 | MA ← PC; | PC<sub>out</sub>, C = B, MA<sub>in</sub>, Read |
| T1 | PC ← PC + 4: MD ← M[MA]; | PC<sub>out</sub>, INC4, PC<sub>in</sub>, Wait |
| T2 | IR ← MD; | MD<sub>out</sub>, C = B, IR<sub>in</sub> |
| T3 | A ← R[rb]; | Grb, R<sub>out</sub>, C = B, A<sub>in</sub> |
| T4 | R[ra] ← A + R[rc]; | Grc, R<sub>out</sub>, ADD, Sra, R<sub>in</sub>, End |

- Note the appearance of Grc
to gate the output of the register rc onto the B bus and Sra to select ra to receive data strobed from the A bus
- Two register select decoders will be needed
- Transparent latches will be required at step T2

# **Performance and Design**

$$\%\text{Speedup} = \frac{T_{1\text{-bus}} - T_{2\text{-bus}}}{T_{2\text{-bus}}} \times 100$$

where

$$T = \text{Execution Time} = IC \times CPI \times \tau$$

## **Speedup By Going to 2 Buses**

- Assume for now that IC and $\tau$ don't change in going from 1 bus to 2 buses
- Naively assume that CPI goes from 8 to 7 clocks.

$$\%\text{Speedup} = \frac{T_{1\text{-bus}} - T_{2\text{-bus}}}{T_{2\text{-bus}}} \times 100 = \frac{IC \times 8 \times \tau - IC \times 7 \times \tau}{IC \times 7 \times \tau} \times 100 = \frac{8 - 7}{7} \times 100 \approx 14\%$$

#### **Class Problem:**

How will this speedup change if the clock period of the 2-bus machine is increased by 10%?

### 3-Bus Architecture Shortens Sequences Even More

- A 3-bus architecture allows both operand inputs and the output of the ALU to be connected to buses
- Both the C output register and the A input register are eliminated
- Careful connection of register inputs and outputs can allow multiple RTs in a step

## Fig 4.16 The 3-Bus SRC Design

- A-bus is ALU operand 1, B-bus is ALU operand 2, and C-bus is ALU output
- Note MA input connected to the B-bus

### Tbl 4.15 The 3-Bus add Instruction

| Step | Concrete RTN | Control Sequence |
|------|--------------|------------------|
| T0 | MA ← PC: PC ← PC + 4: MD ← M[MA]; | PC<sub>out</sub>, MA<sub>in</sub>, INC4, PC<sub>in</sub>, Read, Wait |
| T1 | IR ← MD; | MD<sub>out</sub>, C = B, IR<sub>in</sub> |
| T2 | R[ra] ← R[rb] + R[rc]; | GArc, RA<sub>out</sub>, GBrb, RB<sub>out</sub>, ADD, Sra, R<sub>in</sub>, End |

- Note the use of 3 register selection signals in step T2: GArc, GBrb, and Sra
- In step T0, PC moves to MA over bus B and goes through the ALU INC4 operation to reach PC again by way of bus C
  - PC must be edge-triggered or master-slave
  - Once more MA must be a transparent latch

# **Performance and Design**

- How does going to three buses affect performance?
- Assume average CPI goes from 8 to 4, while $\tau$ increases by 10%:

$$\%\text{Speedup} = \frac{IC \times 8 \times \tau - IC \times 4 \times 1.1\tau}{IC \times 4 \times 1.1\tau} \times 100 = \frac{8 - 4.4}{4.4} \times 100 \approx 82\%$$

#### **Processor Reset Function**

- Reset sets program counter to a fixed value
  - May be a hardwired value, or
  - contents of a memory cell whose address is hardwired
- The control step counter is reset
- Pending exceptions are prevented, so initialization code is not interrupted
- It may set condition codes (if any) to known state
- It may clear some processor state registers
- A "soft" reset makes minimal changes: PC, T (trace)
- A "hard" reset initializes more processor state

## **SRC Reset Capability**

- We specify both a hard and soft reset for SRC
- The Strt signal will do a hard reset
  - It is effective only when machine is stopped
  - It resets the PC to zero
  - It resets all 32 general registers to zero
- The Soft Reset signal is effective when the machine is running
  - It sets PC to zero
  - It restarts instruction fetch
  - It clears the Reset signal
- Actions are described in instruction\_interpretation

### **Abstract RTN for SRC Reset and Start**

#### **Processor State**

- Strt: Start signal
- Rst: External reset signal

```
instruction_interpretation := (
    ¬Run∧Strt → (Run ← 1: PC, R[0..31] ← 0);
    Run∧¬Rst → (IR ← M[PC]: PC ← PC + 4; instruction_execution):
    Run∧Rst → (Rst ← 0: PC ← 0); instruction_interpretation):
```

### Resetting in the Middle of Instruction Execution

- The abstract RTN implies that reset takes effect after the current instruction is done
- To describe reset during an instruction, we must go from abstract to concrete RTN
- Questions for discussion:
  - Why might we want to reset in the middle of an instruction?
  - How would we reset in the middle of an instruction?
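The abstract RTN for reset and start can be read as a three-way dispatch on Run, Strt, and Rst. The sketch below is a minimal Python rendering under stated assumptions. The dictionary-based `state`, the `execute` callback standing in for instruction\_execution, and the returned labels are all hypothetical and not part of the original slides. Because Rst is tested only once per call, the sketch reproduces the point made above: at the abstract level, reset takes effect only between instructions.

```python
def interpret_once(state, memory, execute):
    """One pass through instruction_interpretation from the abstract RTN.

    state   -- dict with keys Run, Strt, Rst, PC, IR, R (assumed layout)
    memory  -- mapping from byte address to 32-bit word
    execute -- callback modeling instruction_execution (assumption)
    """
    if not state["Run"]:
        if state["Strt"]:
            # not Run and Strt -> (Run <- 1: PC, R[0..31] <- 0)  hard reset
            state["Run"] = True
            state["PC"] = 0
            state["R"] = [0] * 32
            return "hard reset"
        return "stopped"
    if state["Rst"]:
        # Run and Rst -> (Rst <- 0: PC <- 0)  soft reset, registers kept
        state["Rst"] = False
        state["PC"] = 0
        return "soft reset"
    # Run and not Rst -> (IR <- M[PC]: PC <- PC + 4; instruction_execution)
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF
    execute(state)
    return "executed"


if __name__ == "__main__":
    state = {"Run": False, "Strt": True, "Rst": False,
             "PC": 100, "IR": 0, "R": [7] * 32}
    memory = {0: 0xAAAA, 4: 0xBBBB}
    print(interpret_once(state, memory, lambda s: None))  # hard reset
    print(interpret_once(state, memory, lambda s: None))  # executed
    state["Rst"] = True
    print(interpret_once(state, memory, lambda s: None))  # soft reset
```

Modeling reset in the middle of an instruction would require checking Rst at every time step, which is what the concrete RTN does.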
# Tbl 4.17 The add Instruction with Reset Processing

```
Step    Concrete RTN
T0      ¬Rst → (MA ← PC: C ← PC + 4):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
T1      ¬Rst → (MD ← M[MA]: PC ← C):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
T2      ¬Rst → (IR ← MD):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
T3      ¬Rst → (A ← R[rb]):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
T4      ¬Rst → (C ← A + R[rc]):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
T5      ¬Rst → (R[ra] ← C):
        Rst → (Rst ← 0: PC ← 0: T ← 0):
```

See text for the corresponding control signals

# Control Sequences Including the Reset Function

- ClrPC clears the program counter to all zeros, and ClrR clears the 1-bit Reset flip-flop
- Because the same reset actions are in every step of every instruction, their control signals are independent of time step or opcode

### **General Comments on Exceptions**

- An exception is an event that causes a change in the program-specified flow of control
- Because normal program execution is interrupted, they are often called interrupts
- We will use exception for the general term and use interrupt for an exception caused by an external event, such as an I/O device condition
- The usage is not standard.
Other books use these words with other distinctions, or none

# Combined Hardware/Software Response to an Exception

- The system must control the type of exceptions it will process at any given time
- The state of the running program is saved when an allowed exception occurs
- Control is transferred to the correct software routine, or "handler," for this exception
- This exception, and others of less or equal importance, are disallowed during the handler
- The state of the interrupted program is restored at the end of execution of the handler

## Hardware Required to Support Exceptions

- To determine relative importance, a priority number is associated with every exception
- Hardware must save and change the PC, since without it no program execution is possible
- Hardware must disable the current exception lest it interrupt the handler before it can start
- Address of the handler is called the exception vector and is a hardware function of the exception type
- Exceptions must access a save area for PC and other hardware saved items
- Choices are special registers or a hardware stack

# New Instructions Needed to Support Exceptions

- An instruction executed at the end of the handler must reverse the state changes done by hardware when the exception occurred
- There must be instructions to control what exceptions are allowed
- The simplest of these enable or disable all exceptions
- If processor state is stored in special registers on an exception, instructions are needed to save and restore these registers

## **Kinds of Exceptions**

- System reset
- Exceptions associated with memory access
- Machine check exceptions
- Data access exceptions
- Instruction access exceptions
- Alignment exceptions
- Program exceptions
- Miscellaneous hardware exceptions
- Trace and debugging exceptions
- Nonmaskable exceptions
- External exceptions (interrupts)

## An Interrupt Facility for SRC

- The exception mechanism for SRC handles external interrupts
- There are no priorities, but only a
simple enable and disable mechanism
- The PC and information about the source of the interrupt are stored in special registers
- Any other state saving is done by software
- The interrupt source supplies 8 bits that are used to generate the interrupt vector
- It also supplies a 16-bit code carrying information about the cause of the interrupt

# SRC Processor State Associated with Interrupts

#### **Processor interrupt mechanism**

| Source | Declaration | Meaning |
|--------|-------------|---------|
| From device | ireq: | Interrupt request signal |
| To device | iack: | Interrupt acknowledge signal |
| Internal to CPU | IE: | 1-bit interrupt enable flag |
| Internal to CPU | IPC⟨31..0⟩: | Storage for PC saved upon interrupt |
| Internal to CPU | II⟨31..0⟩: | Information on source of last interrupt |
| From device | Isrc\_info⟨15..0⟩: | Information from interrupt source |
| From device | Isrc\_vect⟨7..0⟩: | Type code from interrupt source |
| Internal | Ivect⟨31..0⟩ := 20@0#Isrc\_vect⟨7..0⟩#4@0: | Interrupt vector |

#### Ivect⟨31..0⟩

| Bits 31..12 | Bits 11..4 | Bits 3..0 |
|-------------|------------|-----------|
| 000 . . . 0 | Isrc\_vect⟨7..0⟩ | 0000 |

# SRC Instruction Interpretation Modified for Interrupts

- If interrupts are enabled, PC and interrupt information are stored in IPC and II, respectively
- With multiple requests, external priority circuit (discussed in later chapter) determines which vector and information are returned
- Interrupts are disabled
- The acknowledge signal is pulsed

### **SRC Instructions to Support Interrupts**

#### Return from interrupt instruction

rfi (:= op = 29) $\rightarrow$ (PC $\leftarrow$ IPC: IE $\leftarrow$ 1):

#### Save and restore interrupt state

```
svi (:= op = 16) → (R[ra]⟨15..0⟩ ← II⟨15..0⟩: R[rb] ← IPC⟨31..0⟩):
rri (:= op = 17) → (II⟨15..0⟩ ← R[ra]⟨15..0⟩: IPC⟨31..0⟩ ← R[rb]):
```

#### Enable and disable interrupt system

een (:= op = 10) $\rightarrow$ (IE $\leftarrow$ 1):

edi (:= op = 11) $\rightarrow$ (IE $\leftarrow$ 0):

The two rfi actions are indivisible; an een followed by a branch cannot be used in their place

# Concrete RTN for SRC Instruction Fetch with Interrupts

- PC could be transferred to IPC over the bus
- II and IPC probably have separate inputs for the externally supplied values
- iack is pulsed, described as ←1; ←0, which is easier as a control signal than in RTN

# **Exceptions During Instruction Execution**

- Some exceptions occur in the middle of instructions
  - Some CISCs have very long instructions, like string move
  - Some exception conditions prevent instruction completion, like uninstalled memory
- To handle this sort of exception, the CPU must make special provision for restarting
  - Partially completed actions must be reversed so the instruction can be re-executed after exception handling
  - Information about the internal CPU state must be saved so that the instruction can resume where it left off
- We will see that this problem is acute with pipeline designs, which are always in the middle of instructions

# Recap of the Design Process: the Main Topic of Chapter 4

- Informal description
- Formal RTN description
- Block diagram architecture
- Concrete RTN steps
- Hardware design of blocks
- Control sequences
- Control unit and timing

For SRC, the first two stages were covered in Chapter 2; the remaining stages are the subject of Chapter 4.

## **Chapter 4 Summary**

- Chapter 4 has done a nonpipelined data path and a hardwired controller design for SRC
- The concepts of data path block diagrams, concrete RTN, control sequences, control logic equations, step counter control, and clocking have been introduced
- The effect of different data path architectures on the concrete RTN was briefly explored
- We have begun to make simple, quantitative estimates of the impact of hardware design on performance
- Hard and soft resets were designed
- A simple exception mechanism was supplied for SRC
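The quantitative estimates mentioned in the summary all follow from T = IC × CPI × τ and the percent speedup formula. As a minimal sketch, the two worked cases from this chapter can be reproduced in Python. The function names `execution_time` and `percent_speedup` are hypothetical, and the values chosen for IC and τ are arbitrary because they cancel in the ratio.

```python
def execution_time(ic, cpi, tau):
    """T = IC x CPI x tau: instruction count, clocks per instruction, clock period."""
    return ic * cpi * tau


def percent_speedup(t_old, t_new):
    """Percent speedup of the new design relative to the old one."""
    return (t_old - t_new) / t_new * 100


if __name__ == "__main__":
    ic, tau = 1_000_000, 1.0  # arbitrary values; they cancel in the ratio
    t_1bus = execution_time(ic, 8, tau)
    t_2bus = execution_time(ic, 7, tau)        # CPI 8 -> 7, same clock
    t_3bus = execution_time(ic, 4, 1.1 * tau)  # CPI 8 -> 4, clock 10% longer
    print(round(percent_speedup(t_1bus, t_2bus)))  # 14
    print(round(percent_speedup(t_1bus, t_3bus)))  # 82
```

Changing the arguments to `execution_time` is enough to explore other what-if cases, such as a different clock period for one of the designs.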