- The clock cycle time (CCT) is the time for one clock period (usually of the processor clock, which runs at a constant rate, usually published as part of the documentation for a computer)
- note: Although clock cycle time has traditionally been fixed, to save energy or temporarily boost performance, today’s processors can vary their clock rates, so we would need to use the average clock rate for a program.
- The clock rate (CR) is the inverse of the clock cycle time: $\mathrm{CR}=\dfrac{1}{\mathrm{CCT}}$
- (usually measured in $\mathrm{Hz}$ or its multiples)
Single-Cycle
-
The response time (or execution time) is the total time required for the computer to complete a task (including disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, etc.)
- The latency is the time it takes to complete an individual instruction
- (note: latency can also refer to (i) the number of stages in a pipeline, or (ii) the number of stages between two instructions during execution)
- todo latency = execution time ?
-
The performance is the reciprocal of response time:
- $\text{Performance}=\dfrac{1}{\text{Response time}}$
The CPU (execution) time (of task) is the actual time the CPU spends computing for a specific task (excluding other activities)
-
The throughput (or bandwidth) is the number of tasks (instructions) completed per unit time
- $\text{Throughput}=\dfrac{\text{Number of tasks completed}}{\text{Time}}$
- The instruction count (IC) is the number of instructions executed by the program
- The clock cycles per instruction (CPI) is the average number of clock cycles per instruction for a program or program fragment
- The CPU clock cycles (or total clock cycles) is the total number of clock cycles consumed by the program
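- $\text{CPU clock cycles}=\mathrm{IC}\times\mathrm{CPI}$
The CPU (execution) time (of program) is
- $\displaystyle\text{CPU time}=\text{CPU clock cycles}\times\mathrm{CCT}=\frac{\mathrm{IC}\times\mathrm{CPI}}{\mathrm{CR}}=\frac{\sum_{i}\mathrm{IC}_i\times\mathrm{CPI}_i}{\mathrm{CR}}$
- $\mathrm{IC}_i$ is the number of instructions of type $i$
- $\mathrm{CPI}_i$ is the average number of clock cycles per instruction of type $i$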
The number of instructions in the program (IC) is determined by the efficiency of the algorithm implementation, the compiler, and the processor’s instruction set architecture (ISA). The implementation of the processor determines both the clock cycle time and the CPI.
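The relation between instruction count, CPI, clock rate, and CPU time can be sketched in a few lines of Python. This is only an illustration: the function name `cpu_time`, the instruction classes, and all numbers are hypothetical and not taken from these notes.

```python
def cpu_time(instruction_mix, clock_rate_hz):
    """Return (total_cycles, average_cpi, cpu_time_seconds).

    instruction_mix maps an instruction class name to a pair
    (instruction_count, cycles_per_instruction) for that class.
    """
    # CPU clock cycles = sum over classes of IC_i * CPI_i
    total_cycles = sum(count * cpi for count, cpi in instruction_mix.values())
    # IC = total number of executed instructions
    instruction_count = sum(count for count, _ in instruction_mix.values())
    # Average CPI = CPU clock cycles / IC
    average_cpi = total_cycles / instruction_count
    # CPU time = CPU clock cycles * CCT = CPU clock cycles / CR
    return total_cycles, average_cpi, total_cycles / clock_rate_hz


# Hypothetical instruction mix and clock rate, chosen only for illustration.
mix = {"alu": (500, 1), "load_store": (300, 2), "branch": (200, 3)}
cycles, avg_cpi, seconds = cpu_time(mix, clock_rate_hz=1e9)
print(cycles, avg_cpi, seconds)
```

With this mix, the 1000 instructions consume 1700 cycles, so the average CPI is 1.7 even though no single class has that CPI.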
Instruction Replacement
EXAMPLE
Given
- Some of the instructions in the program are replaced by instructions, after which, the program gets shorter by
- Each two instructions are replaced by five instructions.
Question: How many instructions were replaced?
Answer:
- 2 instructions has
- 5 instructions has
- Therefore, by replacing 2 instructions with 5 instructions, we save , thus, by replacing one instruction, we save .
- The cycles saved =
- The number of instructions replaced =
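- $\displaystyle N=n_A\cdot\frac{\Delta t\cdot\mathrm{CR}}{n_A\cdot\mathrm{CPI}_A-n_B\cdot\mathrm{CPI}_B}$
- Every $n_A$ instructions of type $A$ are replaced by $n_B$ instructions of type $B$
- $\Delta t$ is the time saved (in $\mathrm{s}$)
- $\mathrm{CR}$ is the clock rate of the CPU (in $\mathrm{Hz}$)
- $\mathrm{CPI}_A$ and $\mathrm{CPI}_B$ are the cycles per instruction for $A$ and $B$ (resp.)
- $N$ is the total number of instructions replaced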
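The reasoning of this example (total cycles saved, divided by the cycles saved per replaced group, times the group size) could be sketched as follows. The function name `instructions_replaced`, its parameter names, and every number below are assumptions made for illustration; they are not the values of the original example.

```python
def instructions_replaced(time_saved_s, clock_rate_hz,
                          old_group_size, old_cpi,
                          new_group_size, new_cpi):
    """Number of old instructions replaced, given the run time saved."""
    # Total cycles saved = time saved * clock rate
    cycles_saved_total = time_saved_s * clock_rate_hz
    # Cycles saved each time one group of old instructions is replaced
    cycles_saved_per_group = old_group_size * old_cpi - new_group_size * new_cpi
    if cycles_saved_per_group <= 0:
        raise ValueError("the replacement does not save any cycles")
    groups_replaced = cycles_saved_total / cycles_saved_per_group
    # Each replaced group removes old_group_size old instructions
    return groups_replaced * old_group_size


# Hypothetical values: every 2 instructions with CPI 8 are replaced by
# 5 instructions with CPI 2, saving 3 ms on a 1 GHz clock.
n = instructions_replaced(3e-3, 1e9, 2, 8, 5, 2)
print(round(n))
```

With these assumed numbers each replaced pair saves 16 - 10 = 6 cycles, so about one million instructions were replaced.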
Instruction Count
EXAMPLE
Given
- CPU A with , and
- CPU B with , and
- A program with instructions in running time (after assembly for CPU A)
How many instructions (in running time) would CPU B execute to match the running time of CPU A?
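- $\displaystyle\mathrm{ET}_A=\mathrm{ET}_B\Rightarrow\frac{\mathrm{IC}_A\times\mathrm{CPI}_A}{\mathrm{CR}_A}=\frac{\mathrm{IC}_B\times\mathrm{CPI}_B}{\mathrm{CR}_B}\Rightarrow\mathrm{IC}_B=\mathrm{IC}_A\times\frac{\mathrm{CPI}_A\times\mathrm{CR}_B}{\mathrm{CPI}_B\times\mathrm{CR}_A}$ (ET=Execution Time)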
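Equating the two execution times and solving for the instruction count of CPU B might look like the sketch below. The function name `matching_instruction_count` and all numeric values are hypothetical, since the original figures of this example are not available here.

```python
def matching_instruction_count(ic_a, cpi_a, clock_rate_a_hz,
                               cpi_b, clock_rate_b_hz):
    """Instruction count CPU B can execute in the same time as CPU A."""
    # ET_A = IC_A * CPI_A / CR_A
    execution_time_a = ic_a * cpi_a / clock_rate_a_hz
    # Solve IC_B * CPI_B / CR_B = ET_A for IC_B
    return execution_time_a * clock_rate_b_hz / cpi_b


# Hypothetical CPUs: A has CPI 2.0 at 2 GHz, B has CPI 1.5 at 3 GHz,
# and the program runs 1,000,000 instructions on A.
ic_b = matching_instruction_count(1_000_000, 2.0, 2e9, 1.5, 3e9)
print(round(ic_b))
```

In this assumed setup CPU B has both a lower CPI and a higher clock rate, so it can execute about twice as many instructions in the same time.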
Pipelining
In this section, $\mathrm{CPI}=1$, therefore, $\text{CPU time}=\mathrm{IC}\times\mathrm{CCT}$
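- The pipeline depth is the number of stages ($k$) in the pipeline
- $t_1,\dots,t_k$ are the stage delays in the pipeline
- $\displaystyle\mathrm{Latency_{single}}=\sum_{i=1}^{k}t_i$
- $\displaystyle\mathrm{Latency_{pipelined}}=k\cdot\mathrm{CCT_{pipelined}}=k\cdot\max_{1\le i\le k}t_i$, because every stage takes one clock cycle and the clock cycle is set by the slowest stage. (The ratio $\frac{\text{Execution-time}}{\text{Number-of-instructions}}$ is the average time per instruction, i.e. the reciprocal of the throughput, not the latency.)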
Although the latency is worse in the pipelined processor, the throughput is significantly improved
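- $\displaystyle\mathrm{Throughput_{single}}=\frac{1}{\sum_{i=1}^{k}t_i}$
- $\displaystyle\mathrm{Throughput_{pipelined}}\approx\frac{1}{\max_{i}t_i}$ (for a large number of instructions and no stalls)
- $\displaystyle\mathrm{Speedup}=\frac{T_{\mathrm{single}}}{T_{\mathrm{pipelined}}}$ is the speedup of the pipelined processor over the single-cycle processor, where:
- $T_{\mathrm{single}}$ and $T_{\mathrm{pipelined}}$ are the time taken to execute some given number of instructions ($N$), for the single-cycle and pipelined processors, respectively.
- $\displaystyle T_{\mathrm{single}}=N\cdot\sum_{i=1}^{k}t_i$ and $\displaystyle T_{\mathrm{pipelined}}=(k+N-1)\cdot\max_{i}t_i$, for $k$ stages with delays $t_1,\dots,t_k$ and no stalls
When the stages are perfectly balanced, then:
- $t_1=\dots=t_k=t$, thus, $\displaystyle\mathrm{Speedup}=\frac{N\cdot k\cdot t}{(k+N-1)\cdot t}\approx k$ (Under ideal conditions and with a large number of instructions)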
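How the speedup approaches the pipeline depth as the number of instructions grows can be checked numerically. The sketch below assumes the usual idealized model with no stalls, in which the pipeline needs one cycle per stage to fill and then completes one instruction per cycle. The function names and the 200 ps stage delay are hypothetical.

```python
def single_cycle_time(stage_delays, n_instructions):
    # Each instruction takes one long cycle equal to the sum of all stage delays.
    return n_instructions * sum(stage_delays)


def pipelined_time(stage_delays, n_instructions):
    # The first instruction needs k cycles; each later one finishes one cycle
    # after its predecessor. The cycle time is set by the slowest stage.
    k = len(stage_delays)
    return (k + n_instructions - 1) * max(stage_delays)


def speedup(stage_delays, n_instructions):
    return (single_cycle_time(stage_delays, n_instructions)
            / pipelined_time(stage_delays, n_instructions))


# Hypothetical perfectly balanced 5-stage pipeline, 200 ps per stage.
balanced = [200, 200, 200, 200, 200]
for n in (1, 10, 1000, 1_000_000):
    print(n, speedup(balanced, n))
```

For a single instruction there is no gain at all, and the ratio only gets close to the depth of 5 once the fill time of the pipeline becomes negligible compared to the total run.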
Exercise
- Given the following times for each one of 5 stages of the pipeline: (assume )
- A. What is the clock cycle time (single-cycle / pipelined)?
- B. What is the latency of the `lw` instruction (single-cycle / pipelined)?
- C. For a large number of instructions, what is the speedup of the pipelined processor over the single-cycle processor?
- D. If it is possible to split one stage into two stages, each taking half the time of the original stage:
- What is the best choice of stage to split?
- What would be the new clock cycle time of the pipelined processor?
- What would be the new latency of the `lw` instruction?
- How does the change affect the throughput?
Answer
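- (given that there are no stalls)
- The best choice is to split the longest stage, which is the `MEM` stage, into two stages that each take half of its original time. The new longest stage is then the `ID` stage, so its delay becomes the new clock cycle time. The new latency of `lw` is the new number of stages (6) times the new clock cycle time. The throughput improves because the clock cycle time is reduced.
- For a large number of instructions, the speedup of the pipelined processor (with the split stage) over the single-cycle processor is the ratio of the single-cycle clock cycle time to the new pipelined clock cycle time, and its speedup over the pipelined processor (without the split stage) is the ratio of the old pipelined clock cycle time to the new one.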
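The steps of this exercise could be worked through with a small script like the one below. The stage delays are hypothetical (the original table is not reproduced here); they were only chosen so that `MEM` is the longest stage and `ID` the second longest, as in the answer. The stage names `IF`, `EX`, and `WB` and the helper names are likewise assumptions.

```python
def pipeline_metrics(stage_delays):
    """Clock cycle times, latencies, and large-N speedup, assuming no stalls."""
    delays = list(stage_delays.values())
    cct_single = sum(delays)       # one long cycle covers all stages
    cct_pipelined = max(delays)    # the slowest stage sets the clock
    return {
        "cct_single": cct_single,
        "cct_pipelined": cct_pipelined,
        "latency_single": cct_single,
        "latency_pipelined": len(delays) * cct_pipelined,
        "speedup_large_n": cct_single / cct_pipelined,
    }


def split_longest_stage(stage_delays):
    """Return a new stage table with the longest stage split into two halves."""
    longest = max(stage_delays, key=stage_delays.get)
    result = {}
    for name, delay in stage_delays.items():
        if name == longest:
            result[name + "1"] = delay / 2
            result[name + "2"] = delay / 2
        else:
            result[name] = delay
    return result


# Hypothetical stage delays in picoseconds.
stages = {"IF": 200, "ID": 250, "EX": 150, "MEM": 300, "WB": 100}
before = pipeline_metrics(stages)
after = pipeline_metrics(split_longest_stage(stages))
print(before)
print(after)
print(before["cct_pipelined"] / after["cct_pipelined"])
```

With these assumed delays the pipelined clock cycle drops from 300 ps to 250 ps after the split, while the `lw` latency stays at 1500 ps (5 x 300 ps before, 6 x 250 ps after), so the gain shows up in throughput only.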
Speedup
- Speedup is the improvement in the speed of execution of a task when it is executed on two similar architectures with different resources
- Speedup can be defined for two different types of quantities: latency and throughput
Amdahl’s Law
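- $\displaystyle S(n)=\frac{T_{\mathrm{before}}}{T_{\mathrm{after}}}=\frac{1}{(1-p)+\frac{p}{n}}$
- $n$ is the number of processors
- $p$ is the fraction of the program that can be parallelized
- $1-p$ is the fraction of the program that must be executed sequentially
- $S(n)$ is the speedup of the program when executed on $n$ processors
- $T_{\mathrm{before}}$ and $T_{\mathrm{after}}$ are the execution times of the program before and after the improvement (resp.)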
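A minimal sketch of Amdahl's Law in its standard form, speedup = 1 / (sequential fraction + parallel fraction / processors), shows how the sequential part caps the achievable speedup. The function name `amdahl_speedup` and the 90% parallel fraction are hypothetical choices for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup on `processors` processors for a given parallelizable fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    # The sequential part is unchanged; the parallel part is divided
    # evenly among the processors.
    return 1.0 / (sequential_fraction + parallel_fraction / processors)


# Hypothetical program in which 90% of the work can be parallelized.
for n in (1, 2, 8, 64, 1024):
    print(n, amdahl_speedup(0.9, n))
```

Even with a very large number of processors, the speedup for this assumed program stays below 10, because the 10% sequential part never gets faster.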