# Introduction

  • Computer System

    • Hardware
    • Software
  • Software

    • Application software (domain knowledge)
    • System software (machine architecture)

  • System software consists of a veriety of programs that support the operation of a computer.

    • Text editor, assembler, macro processor, compiler, loader, linker, debugger, operating system.

# System Software and Machine Architecture

  • System software 與 application software 的決定性差異:
    • Machine-dependent (MD) features.(在不同硬體環境需要改寫 source code)
  • System software 有部分 source code 是 machine-independent (MID),application software 則全部都是 machine-independent。
  • SIC (simplified instructional computer) with two versions:
    • The standard model
    • The XE (extra equipment) version
  • The two versions have designed to be upward compatible (向上兼容).

# SIC Machine Architecture

# Memory

  • Byte: 8-bit
  • Word: consecutive 3 bytes (24 bits).
  • All addresses are byte addresses
  • Words are addressed by the location of their lowest numbered byte.
  • There are a total of 32768 (2152^{15}) bytes in the computer memory.
    • Address 的範圍: 00 ~ 21512^{15} - 1

# Registers

Five registers, all of which have special uses. Each register is 24 bits in length.

  • A: accumulator. 累加器
  • X: index register
  • L: linkage register; storing the return address after a funciton call.
  • PC: program counter. 儲存下一條要執行的指令 (在機器語言中) 的記憶體位址
  • SW: status word; including a Condition Code (CC)

# Data formats

  • Integers are stored as 24-bit binary 2’s complement representation is used for negative values.

    • 二補數相較於一補數沒有 +0 與 -0 的問題,可以多表達一個數。
  • Characters are stored using their 8-bit ASCII codes.

  • There is no floating-point hardware on the standard version of SIC.

# Instruction formats

  • 24-bit format
    • opcode (指令碼): 8bits, 代表要執行的是哪種指令

    • x: 1bit, 用來表示是否為索引定址模式 (indicated indexed-addressing mode)

    • address: 15bits, 因為記憶體大小為2152^{15} bytes, 因此 15 個 bits 就能表示所有記憶體位置,存在此位置的資料就是此指令要使用的資料

# Addressing modes

  • There are two addressing mode avilable.
  • Direct: x = 0; TA (target address) = address
  • Indexed: x = 1; TA = address + (X)
    • X is the index register (X register)
  • Parentheses (小括號) are used to indicate the contents of a register or a memory location.

# Instruction set

  • Load and store registers

    • LDA , LDX , … : Load A, Load X, etc.
    • STA , STX , … : Store A, Store X, etc.
  • Integer arithmetic operations

    • ADD , SUB , MUL , DIV
  • COMP : Compares the value in register A with a word in memory

    • Setting a condition code (CC) to indicate the result (<, =, or >)
    • CC is stored in register SW
  • Conditional jump instructions

    • JLT : Jump if CC is set to < (Less Than)
    • JEQ : Jump if CC is set to = (EQuals to)
    • JGT : Jump if CC is set to > (Greater Than)

    在高階語言中,下列的敘述是: 如果 a > b,則向下執行 X 區段,否則跳到 Y 區段

    if (a > b) {
        /* ===== X 區段 ===== */
            ...
        /* ===== X 區段 ===== */
    } else {
        /* ===== Y 區段 ===== */
            ...
        /* ===== Y 區段 ===== */
    }

    但在低階語言中,JUMP 的指令 ( JLE , JGT , JLT ) 是滿足某些 condition 才會跳到指定區段,和高階語言的邏輯相反

    所以編譯器把高階語言轉換成低階語言時,會把上面那段程式碼中,X 和 Y 區段對調,以便用低階語言實作相同的邏輯

    即,如果 a > b,則跳到 X 區段,否則繼續下向執行 Y 區段

    COMP a, b
    JGT X
    /* ===== Y 區段 ===== */
        ...
    /* ===== Y 區段 ===== */
    
    /* ===== X 區段 ===== */
        ...
    /* ===== X 區段 ===== */
    
  • JSUB : jumping to a subroutine (function call)

  • RSUB : returning from a subroutine to the address contained in register L

# Input & Output

  • 在標準版本的 SIC 中,進行 I/O 時會從 regiter A最右的 8 個 bits 開始 (rightmost 8 bits),一次搬動一個 byte
  • Each device is assigned a unique 8-bit code
    • 10 進位表示: 0 ~ 255
    • 16 進位表示: 00 ~ FF
  • There are three I/O instructions:
    • Test Device: TD
    • Read Data: RD
    • Write Data: WD
  • On executing TD , the condition code is set to indicate the result of test
    • < means device is ready to send or receive
    • = means device is not ready

# SIC/XE Machine Architecture

# Memory 👍

  • 1 megabyte (2202^{20} bytes)
  • Larger memory \Rightarrow a change in instruction formats and addressing (需要更多的 bit 來表示記憶體位置)

# Registers

比標準版本增加以下四個暫存器:

  • B: base register for addressing,基底暫存器,用在定址
  • S: general working register
  • T: general working register
  • F: floating-point accumulator,浮點累加器,用於浮點運算 (48 bits)

General Working/Purpose Register

  • 沒有用在任何特殊目的的暫存器
  • 可以儲存任意資料
  • 用途由程式設計者自行決定

# Data Formats

整數、字元的資料格式和和標準 SIC 相同

In addition, there is a 48-bit floating point data type:

  • fraction (FF)
    • The fraction is interpreted as a value between 0 and 1
    • For normalized floating point number, the high-order bit (最左位) of the fraction must be 1
  • exponent: (ee)
    • the exponent is interpreted as an unsigned binary number between 0 and 2047
  • s: the sign of the value
    • 0: postive
    • 1: negative
  • The value of the floating-point number:

    (1)s×0.F×2(e1024)(-1)^s \times 0.F \times 2^{(e-1024)}

  • A value of 0 is represented by all 0
  • Bit string to value
    | s |   exponent    |                  fraction                    |
    +---+---------------+----------------------------------------------+
    | 0 | 100 0000 0010 | 1010 0000 0000 0000 0000 0000 0000 0000 0000 |
    
    • ss: 00 \Rightarrow postive
    • ee: 10261026
    • FF: (0.10100000...)2=21+23=0.625(0.1010\ 0000...)_2 = 2^{-1} + 2^{-3} = 0.625
    • The value of the floating point number\begin{split} &0.625 \times 2^{(1026-1024)} \\ =&0.625 \times 2^2 \\ =&2.5 \end{split}
  • Value to bit string
    • 7.625 轉浮點數表示
    • 轉二進制

      7.625=22+21+20+21+23=(111.101)27.625 = 2^2 + 2^1 + 2^0 + 2^{-1} + 2^{-3} = (111.101)_2

    • normalize

      (111.101)2=(0.111101)2×23(111.101)_2 = (0.111101)_2 \times 2^3

    • ss: Postive 0\Rightarrow 0
    • ff: 取小數點後的部分 111101000000...\Rightarrow 1111\ 0100\ 0000...
    • ee: 3+1024=1027100000000113 + 1024=1027 \Rightarrow 100\ 0000\ 0011
    | s |   exponent    |                  fraction                    |
    +---+---------------+----------------------------------------------+
    | 0 | 100 0000 0011 | 1111 0100 0000 0000 0000 0000 0000 0000 0000 |
    

# Instruction Formats

  • Format 1 (1 byte)

    • 不用指定運算元的指令,如特定暫存器 + 1
  • Format 2 (2 byte)

    • 指定暫存器 address 進行運算
  • Format 3 (3 byte)

    • 大多數指令為此格式
    • Relative addressing mode 一定是 format 3
    • Immediate/Directed addressing mode 時,如果數值 / 地址不超過 disp 的表示範圍 (12 bits),則使用 format 3

      disp: displacement
  • Format 4 (4 byte)

    • 在 Immediate/Directed addressing mode,如果數值 / 地址無法用 12-bits 表示,則會使用 format 4

Bit e is used to distinguish between Format 3 and 4

  • 0: format 3
  • 1: format 4

  • n: indirect addressing flag
  • i: immediate addressing flag
  • x: indexed addressing flag
  • b: base address relative flag
  • p: program counter relative flag
  • e: format 4 instruction flag

# Addressing Modes

Format 3

  • Two new relative addressing modes

    • Base relative: b=1, p=0, TA=(B)+disp

      • 0disp40950 \le \text{disp} \le 4095
    • Program-Counter relative: b=0, p=1, TA=(PC)+disp

      • 2048disp2047-2048 \le \text{disp} \le 2047 using 2’s complement notation
      • 因為程式執行可能會有像迴圈要往回跳的狀況,所以需要負數
  • Direct addressing: If b=0, p=0, TA=disp

Format 4

  • bits b and p are normally set to 0
    • Direct addressing: TA=address

Any of above addressing modes can also be combined with indexed addressing


Format 3/4 通用

  • If bit i=1, n=0, the target address itself is used as the operand value

    • Immediate addressing
    • 算出來的 TA 本身就是資料,而不是地址
    • 常用在資料是常數的時候
    • No memory reference is performed
  • If bit i=0, n=1, word at the location given by the target address is fetched

    • Indirect addressing
    • The value contained in this word is then taken as the address of the operand value
    • 類似 C 語言裡面的對指標取值
  • If bit i=0, n=0 or i=1, b=1, the target address is taken as the location of the operand

    • simple addressing
    • 最一般的使用方法,存放在記憶體 TA 位置的資料即運算元
    • SIC/XE instructions that specify neither immediate nor indirect addressing are assembled with bits n and i both set to 1. (for upward compatibility)
    • Assemblers for the standard version of SIC will set the bits in both of these positions to 0. This is because the 8-bit binary codes for all the SIC instructions end in 00.

For SIC/XE, if bits n and i are both 0, then bits b, p, and e are considered to be part of the address field of the instruction.

  • This makes Instruction Format 3 identical to the format used on the standard version of SIC, providing the desired compatibility.
  • n = 0; i = 0:
    • SIC/XE 為了向上相容,會改變為 SIC 的形式,此時的 b,p,e 會和 disp 合併為 15 個 bits
    • 因為會轉變為 SIC 的形式,所以 format 4 n, i 不會同時為 0
  • n = 1; i = 1:
    • 仍會去判斷 b,p,e 的樣式
  • Relative/Direct Addressing Mode 定義如何求出 TA
  • Immediate/Indirect/Simple Addressing Mode 定義求出 TA 後,如何得到運算元
  • 上面兩種模式是同時存在的,所以一共有 6 種組合,以下舉例三種
    • 一條 format3 指令是 Direct+Simple
      • TA = disp
      • 代表運算元是存在記憶體位置 disp 上的資料
    • 一條 format3 指令是 Direct+Immediate
      • TA = disp
      • 代表運算元是 disp
    • 一條 format3 指令是 Relative+Immediate
      • TA = (B) + disp
      • 代表運算元是 (B) + disp

# Instruction Set

  • Load/Store: LDB , STB , …
  • Floating-Point Number Arithmetic: ADDF , SUBF , MULF , DIVF
  • Register Move: RMO
  • Register-to-Register Arithmetic: ADDR , SUBR , MULR , DIVR
  • Supervisor call instruction: SVC
    • generating an interrupt that can be used for communication with the OS.

# Input and Output

  • There are I/O channels that can be used to perform input and output while the CPU is executing other instructions.
  • This allow overlap of computing and I/O, resulting in more efficient system operation.
  • Instructions:
    • Start I/O: SIO
    • Test I/O: TIO
    • Halt I/O: HIO

# SIC Programming Examples

# 附錄 A - 組合語言補充

# 範例程式分析

# CISC (複雜指令集架構)

  • CISC: Complex Instruction Set Computers
  • Implementation of such an architecture in hardware tends to be complex.

# VAX (Virtual Address eXtension) Architecture

# Memory

  • 8-bit bytes
  • Byte Addresses
  • Word: 2 bytes
  • Long Word: 4 bytes
  • Quadword: 8 bytes
  • Octaword: 16 bytes

Some operations are more efficient when operands are aligned in a particular way

  • For example, a longword operand that begins at a byte address that is a multiple of 4.

Virtual address space: 2322^{32} bytes

  • One half of the VAX virtual address space is called system space, which contains the operating system, and is shared by all programs.
  • The other half of the address space is called process space, and is defined separately for each program.

# Registers

  • 16 general-purpose registers: R0~R15, 32 bits in length.
  • R15: program counter
  • R14: stack pointer
  • R13: frame pointer
  • R12: argument pointer
  • Processor status longword (PSL), which contains state variables and flags associated with a process.

# Data Formats

  • 2’s complement representation is used for negative values.
  • Characters are stored using their 8-bit ASCII codes.
  • There are four different floating-point data formats, ranging in length from 4 to 16 bytes.
  • Packed decimal format: each byte represents two decimal digits.
  • Numeric format: one digit per byte.
  • Support queues and variable-length bit strings.

# Instruction Formats

  • A variable-length instruction format.
  • Each instruction consists of an operation code (1 or 2 bytes) followed by up to six operand specifiers.

# Addressing Modes

  • Immediate mode
  • Register mode
  • Register deferred mode
  • Autoincrement and autodecrement modes (register content)
  • Several base relative addressing modes
  • May also include an index register

# Instruction Set

  • Instruction mnemonics are formed by combining the following elements:
    1. A prefix that specifies the type of operation
    2. A suffix that specifies the data type of the operands
    3. A modifier (on some instructions) that gives the number of operands involved.
  • Examples:
    • ADDW2 : ADD Word, 2 operands
    • MULL3
    • CVTWL

There are number of operations that are much more complex than the machine instructions found on most computers.

These operations are hardware realizations of frequently occurring sequences of code. They are implemented as single instructions for efficiency and speed.

  • Load and store multiple registers
  • Manipulate queues and variable-length bit fields.
  • Calling and returning from procedures: a single instruction saves a designated set of registers, passes a list of arguments to the procedure, maintains the stack, frame, and argument pointers, and sets a mask to enable error traps for arithmetic operations.

# Input/Output

  • Each I/O device controller has a set of control/status and data registers, which are assigned locations in the physical address space.
  • The portion of the address space into which the device controller registers are mapped is called I/O space.
  • No special instructions are required to access registers in I/O space.
  • An I/O device driver issues commands to the device controller by storing values into the appropriate registers.
  • Software routines may read these registers to obtain status information.

# x86 Architecture

# Memory

  • At the physical level, memory consists of 8-bit bytes.
  • Byte addresses.
  • Word: two consecutive bytes
  • Double-word (dword): four bytes
  • Some operations are more efficient when operands are aligned in a particular way.
  • However, programmers usually view the x86 memory as a collection of segments.
    • From this point of view, an address consists of two parts
      • a segment number
      • an offset that points to a byte within the segment.
    • Segments can be of different sizes, and are often used for different purposes.
      • For example, code segment and data segment.
    • It is not necessary for all of the segments used by a program to be in physical memory.
    • A segment can also be divided into pages.

# Registers

  • Each general-purpose register is 32 bits long.
  • EIP: program counter.
  • FLAGS
  • 16-bit segment registers:
    • CS for code segment
    • SS for stack segment
    • DS, ES, FS, GS for data segments.
  • Floating-point unit (FPU) contains eight 80-bit data registers.
  • There are also a number of registers that are used only by system programs such as the operating system.

# Data Formats

  • Integers, floating-point values, characters, and strings.
  • The least significant part of a numeric value is stored at the lowest-numbered address: little-endian.
  • Unpacked and packed BCD formats.
  • Three different floating-point data formats:
    • single-precision with 32 bits (24+7+1)
    • double-precision with 64 bits (53+10+1)
    • extended-precision with 80 bits (64+15+1)

# Instruction Formats

  • Prefix: containing flags that modify the operation of the instruction.
    • For example, a repetition count of the instruction (e.g. string manipulation), specifying a segment register.
  • opcode: 1 or 2 bytes. Some operations have different opcodes, each specifying a different variant of the operation.
  • A number of bytes that specify the operands and addressing modes to be used.
  • The opcode is the only element that is always present in every instruction.

# Addressing Modes

  • Immediate mode: data in the instruction
  • Register mode: data in register
  • Direct mode: address of data in the instruction
  • Relative mode: displacement in the instruction, relative to EIP
    • TA=(base register) + (index register)*(scale factor) + displacement

# Instruction Set

  • More than 400 machine instructions.
  • An instruction may have 0~3 operands.
  • There are:
    • register-to-register instructions
    • register-to-memory instructions
    • memory-to-memory instructions
  • Some special-purpose instructions to perform operations frequently required in high-level programming languages
    • For example, entering and leaving procedures and checking subscript values against the bounds of an array.

# I/O

  • Repetition prefixes allow I/O instructions to transfer an entire string in a single operation.

# RISC Machines

  • Reduced Instruction Set Computer
  • This simplified design can result in
    • faster and less expensive processor development
    • greater reliability
    • faster instruction fetching, decoding, and execution times.
  • In general, a RISC system is characterized by
    • a standard, fixed instruction length
      • usually equal to one machine word
    • single-cycle execution of most instructions.
  • Memory access is usually done by load and store instructions only.
  • All instructions except for load and store are register-to-register operations.
  • There are typically a relatively large number of general-purpose registers.
  • The number of machine instructions, instruction formats, and addressing modes is relatively small.

# UltraSPARC (Scalable Processor ARChitecture) Architecture

# Memory

  • Halfword: two consecutive bytes
  • Word: four bytes
  • Doubleword: eight bytes
  • Data are stored in memory with alignment
  • Virtual address space of 2642^{64} bytes
  • The address space is divided into pages. Multiple page size are supported.

# Registers

  • A large register file that usually contains more than 100 general-purpose register.
  • However, any procedure can access only 32 registers.
  • The first eight of these registers are global — accessed by all procedures on the system.
  • Register r0 always contains the value zero.
  • The other 24 registers available to a procedure can be visualized as a window through which part of the register file can be seen.
  • These windows overlap, so some registers in the register file are shared between procedures. This facilitates the passing of parameters.
  • If a set of concurrently running procedures needs more windows than are physically available, a “window overflow” interrupt occurs. The operating system must then save the contents of some registes in the file (and restore them later) to provide the additional windows that are needed.

# Data Format

  • Integers, floating-point values, and characters.
  • Support both big-endian and little-endian byte orderings.
  • Three different floating-point data formats.

# Instruction Formats

  • There are three basic instruction formats. All of these formats are 32 bits long. The first 2 bits of the instruction word identify which format is being used.
  • Format 1 is used for the Call instruction.
  • Format 2 is used for branch instruction (and one special instruction that enters a value into a register)
  • Format 3 is used by the remaining instructions, including register loads and stores, and three-operand arithmetic operations.

# Addressing Modes

  • Immediate mode
  • Register direct mode
  • Operands in memory are addressd using one of the following three modes:
    • PC-relative: TA = (PC) + displacement {30 bits, signed}
    • Register indirect with displacement: TA = (register) + displacement {13 bits, signed}
    • Register indirect indexed: TA = (register1) + (register2)
  • PC-relative mode is used only for branch instructions.

# Instruction Set

  • The basic SPARC architechture has fewer than 100 machine instructions.
  • Instruction execution on a SPARC system is pipeline: fetching_and_decoding, execution
  • However, an ordinary branch instruction might cause the process to “stall”. The instruction following the branch would have to be dicarded without being executed.
  • To make the pipeline work more efficiently, SPARC branch instructions (including subroutine calls) are delayed branch. This means that the instruction immediately following the branch instruction is actually executed before the branch is taken.

# PowerPC Architecture

  • IBM first introduced the POWER architecture early in 1990 with the RS/6000.
  • POWER is an acronym for Performance Optimization With Enhanced RISC.
  • It was soon realized that this architecture could form the basis for a new family of powerful and low-cost microprocessors.
  • In October 1991, IBM, Apple, and Motorola formed an alliance to develop and market such microprocessors, which were named PowerPC.

# Memory

  • Two consecutive bytes: halfword
  • Four bytes: word
  • Eight bytes: doubleword
  • Sixteen bytes: quadword
  • Virtual address space: 2642^{64} bytes
  • The address space is divided into fixed-length segments: 256 megabytes long.
  • Each segment is divided into pages: 4096 bytes long.

# Registers

  • 32 general-purpose registers: 64 bits long.
  • 32 floating-point registers for FPU: 64 bits
  • A 32-bit condition register
  • Link Register (LR) and Count Register (CR) used by some branch instructions

# Data formats

  • Integers are stored as 8-, 16-, 32-, or 64-bit binary numbers.
  • Big-endian byte ordering by default.
  • It is possible to select little-endian byte ordering by setting a bit in a control register.
  • There are two different floating-point data formats.
  • Single-precision format is 32 bits long (23+8+1)
  • Double-precision format is 64 bits long (52+11+1)

# Instruction format

  • There are seven basic instruction formats, some of which have subforms.
  • All of these formats are 32 bits long
  • Instructions must be aligned beginning at a word boundary
  • The first 6 bits of the instruction word always specify the opcode; some instruction formats also have an additional “extended code” field.
  • The fixed instruction length in the PowerPC architecture is typical of RISC systems, making instruction decoding faster and simpler than on CISC systems.

# Addressing modes

  • Immediate mode
  • Register direct mode
  • The only instructions that address memory are load and store operations, and branch instructions.
  • Load and store operations use one of the following three addressing modes.
  • Register indirect mode: TA=(register)
  • Register indirect with index: TA=(register1)+(register2)
  • Register indirect with immediate index: TA=(register)+displacement
  • The register numbers and displacement are encoded as part of the instruction.
  • Branch instructions use one of the following three addressing modes.
  • Absolute: TA = actual address
  • Relative: TA = current instruction address + displacement
  • Link Register: TA = (LR)
  • Count Register: TA = (CR)

# Instruction set

  • Approximately 200 machine instructions
  • Floating-point “multiply and add” instructions
  • Using more powerful instructions, so fewer instructions are required to perform a task vs. keeping instructions simple so they can be executed as fast as possible.
  • Instruction execution on a PowerPC system is pipelined. However, the pipelining is more sophisticated than on the original SPARC systems, with branch prediction used to speed execution. As a result, the delayed branch technique we described for SPARC is not used on PowerPC and most other modern architectures.

# Input and Output

  • Segments in the virtual address space are mapped onto an external address space (typically an I/O bus)

# Cray T3E Architecture

  • Supercomputer
  • A massively parallel processing (MPP) system
  • For use on technical applications in scientific computing.
  • Containing a large number of processing elements (PE), arranged in a three-dimensional network as illustrated in Fig. 1.8.
    • The interconnection network is circular in each dimension.
  • Each PE consists of a DEC Alpha EV5 RISC microprocessor, local memory, and performance-accelerating control logic developed by Cray.
  • A T3E system may contain from 16 to 2048 processing elements.

# Memory

  • Each PE in the T3E has its own local memory with a capacity of from 64 megabytes to 2 gigabytes.
  • The local memory within each PE is part of a physically distributed, logically shared memory system.
  • Two consecutive bytes: word
  • Four bytes: long word
  • Eight bytes: quadword
  • 64-bit virtual address

# Registers

  • 32 general-purpose registers: 64 bits long
  • 32 floating-point registers: 64 bits long
  • 64-bit program counter

# Data formats

  • Integers are stored as longwords or quadwords
  • There are two different types of floating-point data formats. One group of three formats is included for compatibility with the VAX architecture. The other group consists of four IEEE standard formats, which are compatible with those used on most modern systems.
  • There are no byte load or store operations. Only longwords and quadwords can be transferred between a register and memory. As a consequence, characters that are to be manipulated separately are usually stored one per longword.

# Instruction formats

  • There are five basic instruction formats, some of which have subforms.
  • All of these formats are 32 bits long.
  • The first 6 bits of the instruction word always specify the opcode; some instruction formats also have an additional “function” field.

# Addressing modes

  • Immediate mode
  • Register direct mode
  • PC-relative: TA = (PC) + displacement (for conditional and unconditional branches)
  • Register indirect with displacement: TA = (register) + displacement (for load and store operations and for subroutine jumps)

# Instruction Set

  • Approximately 130 machine instructions
  • The instruction set is designed so that an implementation of the architecture can be as fast as possible. For example, there are no byte or word load and store instructions. This means that the memory access interface does not need to include shift-and-mask operations.

# Input and Output

  • Performing I/O through multiple ports into one or more I/O channels.
  • These channels are integrated into the network that interconnects the processing nodes.
  • A system may be configured with up to one I/O channel for every eight PEs.
  • All channels are accessible and controllable from all PEs.