
Chapter: Advanced Computer Architecture : Multi-Core Architectures

SMT and CMP Architectures

Chip-level multiprocessing (CMP, or multicore) integrates two or more independent cores (normally CPUs) into a single package, built from either a single integrated circuit (IC) die or several dies packaged together; each core executes threads independently.



Instruction-level parallelism(ILP)


Ø Wide-issue Superscalar processors  (SS)


Ø Four or more instructions issued per cycle

Ø Executing a single program or thread

Ø Attempts to find multiple instructions to issue each cycle.


Ø Out-of-order execution: instructions are sent to execution units as soon as their dependencies are satisfied, rather than in program order
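The dependency-driven issue described above can be sketched with a toy model. This is only an illustration under simplifying assumptions that are not in the notes: every instruction has unit latency, register renaming has removed false dependencies, and the function name `issue_schedule` and the sample `program` are hypothetical.

```python
def issue_schedule(program, width):
    """Greedy issue model: up to `width` instructions per cycle, unit latency.

    `program` is a list of (dest_register, [source_registers]) in program order.
    Only true (read-after-write) dependencies are tracked, i.e. register
    renaming is assumed to have removed false dependencies.
    """
    if width < 1:
        raise ValueError("width must be at least 1")

    # For each instruction, find the earlier instructions that produce its sources.
    last_writer = {}
    producers = []
    for idx, (dest, sources) in enumerate(program):
        producers.append({last_writer[s] for s in sources if s in last_writer})
        last_writer[dest] = idx

    issued_in = {}   # instruction index -> cycle in which it issued
    cycles = []      # cycles[c] = instruction indices issued in cycle c
    while len(issued_in) < len(program):
        group = []
        for idx in range(len(program)):
            if idx in issued_in or len(group) == width:
                continue
            # Ready only if every producer issued in an earlier cycle.
            if all(p in issued_in for p in producers[idx]):
                group.append(idx)
        for idx in group:
            issued_in[idx] = len(cycles)
        cycles.append(group)
    return cycles


# Hypothetical five-instruction program with two independent chains.
program = [
    ("r1", []),            # 0
    ("r2", ["r1"]),        # 1: depends on 0
    ("r3", []),            # 2
    ("r4", ["r3"]),        # 3: depends on 2
    ("r5", ["r2", "r4"]),  # 4: depends on 1 and 3
]
print(issue_schedule(program, width=2))  # [[0, 2], [1, 3], [4]]
```

With two issue slots the program finishes in three cycles instead of five, and instruction 2 issues ahead of instruction 1 even though it comes later in program order.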


Thread-level parallelism(TLP)


Ø Fine-grained multithreaded superscalars(FGMS)


Ø Contain hardware state for several threads

Ø Executing multiple threads

Ø On any given cycle, the processor executes instructions from one of the threads


Ø Multiprocessor(MP)


Ø Performance improved by adding more CPUs


Simultaneous Multithreading


The idea is to issue multiple instructions from multiple threads in each cycle.


The features are:


Ø Fully exploit thread-level parallelism and instruction-level parallelism.


Ø Multiple functional units


Ø Modern processors have more functional units available than a single thread can utilize.


Ø Register renaming and dynamic scheduling


Ø Multiple instructions from independent threads can co-exist and co-execute.


Superscalar processor with no multithreading:


Only one thread is processed in one clock cycle


Ø Use of issue slots is limited by a lack of ILP.


Ø Stalls such as an instruction cache miss leave the entire processor idle.
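The two sources of lost issue slots above can be counted from a per-cycle issue trace. The sketch below is illustrative only: the trace values are made up, the function name `slot_usage` is hypothetical, and the terms "vertical waste" (completely idle cycles) and "horizontal waste" (unused slots in a partly used cycle) come from the SMT literature rather than from these notes.

```python
def slot_usage(issued_per_cycle, width):
    """Summarize issue-slot usage for a `width`-wide superscalar.

    `issued_per_cycle[c]` is the number of instructions issued in cycle c.
    """
    total = width * len(issued_per_cycle)
    used = sum(issued_per_cycle)
    # Cycles in which nothing issues (e.g. during a cache miss) lose every slot.
    vertical = sum(width for n in issued_per_cycle if n == 0)
    # Cycles limited by a lack of ILP lose only the slots left unfilled.
    horizontal = sum(width - n for n in issued_per_cycle if n > 0)
    return {
        "utilization": used / total,
        "vertical_waste": vertical,
        "horizontal_waste": horizontal,
    }


# Made-up trace for a 4-wide processor: two cycles are fully stalled.
trace = [2, 1, 0, 0, 3, 1]
usage = slot_usage(trace, width=4)
print(usage)
```

For this trace, 7 of 24 slots are used, 8 are lost to fully idle cycles, and 9 are lost because the single thread did not have enough independent instructions.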


Fine-grained multithreading:


Switches threads on every clock cycle


Ø Pro: hides the latency of both short and long stalls


Ø Con: slows down execution of individual threads that are ready to go, since only one thread issues instructions in a given clock cycle.
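A minimal sketch of the policy, assuming a single-issue pipeline and round-robin selection that skips stalled threads. The thread encoding (each entry is the stall, in cycles, that follows an instruction) and the function name `fine_grained` are assumptions made for illustration.

```python
def fine_grained(threads):
    """Return, per cycle, the id of the thread that issued (None if idle).

    `threads[t]` is a list of stall lengths: entry i is the number of cycles
    thread t must wait after issuing its i-th instruction.
    """
    n = len(threads)
    pos = [0] * n        # next instruction index per thread
    ready_at = [0] * n   # first cycle in which each thread may issue again
    timeline = []
    cycle = 0
    start = 0            # round-robin pointer
    while any(pos[t] < len(threads[t]) for t in range(n)):
        chosen = None
        for k in range(n):
            t = (start + k) % n
            if pos[t] < len(threads[t]) and ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is not None:
            stall = threads[chosen][pos[chosen]]
            pos[chosen] += 1
            ready_at[chosen] = cycle + 1 + stall
            start = (chosen + 1) % n   # move on to the next thread next cycle
        timeline.append(chosen)
        cycle += 1
    return timeline


alone = fine_grained([[0, 2, 0]])
shared = fine_grained([[0, 2, 0], [0, 0, 0]])
print(alone)   # [0, 0, None, None, 0]
print(shared)  # [0, 1, 0, 1, 1, 0]
```

Running alone, thread 0 leaves two cycles idle during its stall. With a second thread, no cycle is idle, but thread 0 now finishes one cycle later than it did alone, which is the slowdown noted in the "Con" bullet.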


Coarse-grained multithreading:


Switches threads only on costly stalls (e.g., L2 cache misses)


Ø Pros: no switching on every clock cycle and no slowdown for ready-to-go threads. Reduces the number of completely idle clock cycles.


Ø Con: limited ability to hide shorter stalls
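A minimal sketch of switch-on-costly-stall, assuming a single-issue pipeline and no switch penalty. The thread encoding (stall length after each instruction), the `threshold` parameter that separates short from costly stalls, and the function name `coarse_grained` are hypothetical.

```python
def coarse_grained(threads, threshold):
    """Return, per cycle, the id of the thread that issued (None if idle).

    `threads[t][i]` is the stall (in cycles) that follows instruction i of
    thread t. A stall of at least `threshold` cycles counts as costly and
    triggers a switch; shorter stalls simply idle the pipeline.
    """
    n = len(threads)
    pos = [0] * n
    ready_at = [0] * n
    costly = [False] * n   # True while a thread waits on a costly stall
    current = 0
    timeline = []
    cycle = 0
    while any(pos[t] < len(threads[t]) for t in range(n)):
        finished = pos[current] == len(threads[current])
        blocked = costly[current] and ready_at[current] > cycle
        if finished or blocked:
            # Look for another thread that has work and is ready right now.
            for k in range(1, n):
                t = (current + k) % n
                if pos[t] < len(threads[t]) and ready_at[t] <= cycle:
                    current = t
                    break
        if pos[current] < len(threads[current]) and ready_at[current] <= cycle:
            stall = threads[current][pos[current]]
            pos[current] += 1
            ready_at[current] = cycle + 1 + stall
            costly[current] = stall >= threshold
            timeline.append(current)
        else:
            timeline.append(None)
        cycle += 1
    return timeline


timeline = coarse_grained([[0, 1, 4, 0], [0, 0, 0]], threshold=3)
print(timeline)  # [0, 0, None, 0, 1, 1, 1, None, 0]
```

The one-cycle stall in thread 0 is not hidden (cycle 2 is idle), while most of the four-cycle stall is covered by thread 1.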


Simultaneous Multithreading:


Exploits TLP at the same time as it exploits ILP, with multiple threads using the issue slots in a single clock cycle.


Ø Use of issue slots is limited by the following factors:


Ø Imbalances in the resource needs.

Ø Resource availability over multiple threads.

Ø Number of active threads considered.

Ø Finite limitations of buffer.

Ø Ability to fetch enough instructions from multiple threads.


Ø Practical limitations on which instruction combinations can issue from one thread and from multiple threads.
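How several threads share the issue slots of one cycle can be sketched as below. This is a deliberately simple model, not a description of any real issue logic: the per-cycle ready counts are made up, unissued instructions are not carried over to the next cycle, threads are served in fixed order, and the function name `smt_issue` is hypothetical.

```python
def smt_issue(ready_per_thread, width):
    """Fill up to `width` issue slots per cycle from all threads.

    `ready_per_thread[t][c]` is the number of instructions thread t could
    issue in cycle c. Returns, per cycle, how many slots each thread received.
    """
    cycles = len(ready_per_thread[0])
    issued = []
    for c in range(cycles):
        free = width
        row = []
        for ready in ready_per_thread:   # fixed priority: lower ids first
            take = min(ready[c], free)
            row.append(take)
            free -= take
        issued.append(row)
    return issued


ready = [
    [2, 0, 1, 3],   # thread 0: stalled in cycle 1
    [1, 2, 0, 2],   # thread 1: stalled in cycle 2
]
issued = smt_issue(ready, width=4)
print(issued)                        # [[2, 1], [0, 2], [1, 0], [3, 1]]
print(sum(sum(row) for row in issued))  # 10
```

Thread 0 alone would fill 6 of the 16 slots; with both threads, 10 are filled. The last cycle also shows the limit: thread 1 had two ready instructions but only one slot was left.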


Performance Implications of SMT


Ø Single-thread performance is likely to go down, because caches, branch predictors, registers, etc. are shared. This effect can be mitigated by prioritizing one thread

Ø When fetching instructions, thread priority can dramatically influence total throughput. A widely accepted heuristic (ICOUNT) gives fetch priority to the threads with the fewest instructions in the front-end stages (decode, rename, and issue queues), so that each thread gets a roughly equal share of processor resources


Ø With eight threads in a processor with many resources, SMT yields throughput improvements of a factor of roughly 2-4


Ø The Alpha 21464 and the Intel Pentium 4 (with Hyper-Threading) are examples of SMT designs
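The ICOUNT fetch heuristic mentioned above can be sketched as a simple selection: fetch from the threads that currently have the fewest instructions waiting in the front end. The counts, the tie-break by thread id, and the function name `icount_pick` are assumptions for illustration.

```python
def icount_pick(in_flight, n_fetch):
    """Pick the `n_fetch` threads with the fewest in-flight front-end instructions.

    `in_flight` maps thread id -> number of instructions currently in the
    decode, rename, and issue-queue stages. Ties are broken by thread id.
    """
    ranked = sorted(in_flight, key=lambda t: (in_flight[t], t))
    return ranked[:n_fetch]


# Made-up occupancy counts for four hardware threads.
counts = {0: 12, 1: 3, 2: 7, 3: 3}
picked = icount_pick(counts, n_fetch=2)
print(picked)  # [1, 3]
```

Thread 0, which already occupies the most queue entries, is not fetched this cycle, which keeps one slow thread from clogging the shared queues.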


Effectively Using Parallelism on an SMT Processor


[Figure: instruction throughput when executing a parallel workload]



Comparison of SMT vs Superscalar


SMT processors are compared to base superscalar processors in several key measures:

Ø Utilization of functional units.

Ø Utilization of fetch units.

Ø Accuracy of branch predictor.

Ø Hit rates of primary caches.

Ø Hit rates of secondary caches.


Performance improvement:


Ø Issue slots.


Ø Functional units.


Ø Renaming registers.


1. CMP Architecture


Ø Chip-level multiprocessing (CMP, or multicore) integrates two or more independent cores (normally CPUs) into a single package, built from either a single integrated circuit (IC) die or several dies packaged together; each core executes threads independently.


Ø Every functional unit of a processor is duplicated.


Ø Multiple processors, each with a full set of architectural resources, reside on the same die


Ø Processors may share an on-chip cache or each can have its own cache



Ø Examples: HP Mako, IBM Power4

Ø Challenges: Power, Die area (cost)


[Figure: single-core computer]


Chip Multithreading


Chip Multithreading = Chip Multiprocessing + Hardware Multithreading.


Ø Chip multithreading (CMT) is the capability of a processor to process multiple software threads simultaneously on multiple hardware threads of execution.


Ø CMT is achieved by multiple cores on a single chip, multiple threads on a single core, or a combination of both.


Ø CMT processors are especially suited to server workloads, which generally have high levels of thread-level parallelism (TLP).
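The relation "chip multithreading = chip multiprocessing + hardware multithreading" amounts to multiplying cores by hardware threads per core. The configurations below are hypothetical and only illustrate the three cases; `hardware_threads` is an illustrative name.

```python
def hardware_threads(cores, threads_per_core):
    """Number of software threads the chip can run simultaneously."""
    return cores * threads_per_core


cmp_only = hardware_threads(cores=4, threads_per_core=1)   # multicore, no multithreading
smt_only = hardware_threads(cores=1, threads_per_core=2)   # one multithreaded core
combined = hardware_threads(cores=8, threads_per_core=4)   # both techniques together
print(cmp_only, smt_only, combined)  # 4 2 32
```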


CMP Performance


Ø CMPs are now the only way to build high-performance microprocessors, for a variety of reasons:


Ø Large uniprocessors are no longer scaling in performance, because only a limited amount of parallelism can be extracted from a typical instruction stream.


Ø Clock speeds on today's processors cannot simply be ratcheted up, or power dissipation becomes prohibitive.


Ø CMT processors support many hardware strands through efficient sharing of on-chip resources such as pipelines, caches, and predictors.


Ø CMT processors are a good match for server workloads, which have high levels of TLP and relatively low levels of ILP.
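The clock-speed point above can be made concrete with the usual first-order approximation for dynamic power in CMOS, P ≈ C · V² · f. The numbers below are hypothetical, in particular the assumption that doubling the clock needs a 20% higher supply voltage and that a second core simply doubles the switched capacitance.

```python
def dynamic_power(capacitance, voltage, frequency):
    """First-order dynamic power estimate: P = C * V^2 * f (relative units)."""
    return capacitance * voltage ** 2 * frequency


baseline = dynamic_power(1.0, 1.0, 1.0)

# Hypothetical: one core at twice the clock, needing 20% more voltage.
fast_core = dynamic_power(1.0, 1.2, 2.0)

# Hypothetical: two cores at the original clock and voltage.
two_cores = dynamic_power(2.0, 1.0, 1.0)

print(fast_core / baseline)   # about 2.88
print(two_cores / baseline)   # 2.0
```

Under these assumptions both options double peak instruction throughput, but the two-core design does so for less power, provided the workload has enough threads to keep both cores busy.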



Ø The performance race between SMT and CMP is not yet decided.

Ø CMP is easier to implement, but only SMT has the ability to hide latencies.


Ø Functional partitioning is not fully achieved within an SMT processor, because instruction issue is centralized.


Ø A separation of the thread queues is a possible solution, although it does not remove the central instruction issue.

Ø A combination of simultaneous multithreading with the CMP may be superior.


Ø Research: combine an SMT or CMP organization with the ability to create threads out of a single thread, either with compiler support or fully dynamically.

Ø Thread-level speculation

Ø Close to multiscalar
