Glossary
1BP: 1-bit branch predictor
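A minimal sketch of the idea in Python (the table size and PC hashing below are illustrative assumptions, not any particular hardware design):

```python
# Minimal 1-bit branch predictor sketch (table size is arbitrary).
# Each entry remembers the last outcome of the branch that hashed to
# it and predicts that same outcome next time.
TABLE_SIZE = 1024
table = [False] * TABLE_SIZE  # False = predict not-taken

def predict(pc: int) -> bool:
    return table[pc % TABLE_SIZE]

def update(pc: int, taken: bool) -> None:
    table[pc % TABLE_SIZE] = taken

# A loop branch taken 3 times then not taken mispredicts twice per
# loop visit: on the first taken outcome and on the final not-taken.
for outcome in [True, True, True, False]:
    pc = 0x400100
    print(predict(pc) == outcome)
    update(pc, outcome)
```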
4 C's - compulsory misses: occur the first time a block is accessed by the cache.
4 C's - capacity misses: blocks must be evicted due to the limited size of the cache.
4 C's - coherence misses: processors are accessing the same block. Processor A writes to the block; even though Processor B has the block in its cache, B's next access is a miss, because its copy is no longer up-to-date.
4 C's - conflict misses: associated with set-associative and direct-mapped caches - another data address maps to the same cache block and must replace the data currently in the cache.
ALAT: advanced load address table - stores information about advanced (speculative) load operations.
aliasing: in the BTB, when two addresses map to the same BTB entry. Aliasing should be kept below 1%.
ALU: arithmetic logic unit
AMAT: average memory access time = hit time + miss rate * miss penalty
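A quick worked example of the formula in Python, with made-up latencies:

```python
# AMAT = hit time + miss rate * miss penalty (all numbers hypothetical)
hit_time = 1.0        # cycles to hit in L1
miss_rate = 0.05      # 5% of accesses miss
miss_penalty = 100.0  # cycles to fetch from memory

amat = hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles on average per access
```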
Amdahl's Law: an equation for the overall improvement of a system when only a portion of the system is improved: speedup = 1 / ((1 - f) + f / s), where f is the fraction of execution time affected and s is the speedup of that portion.
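A small Python sketch of the formula (the fractions and speedups are hypothetical):

```python
# Amdahl's Law: overall speedup when a fraction f of execution time
# is sped up by a factor s.
def amdahl(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

print(amdahl(0.9, 10))   # ~5.26x: 90% of the work sped up 10x
print(amdahl(0.5, 1e9))  # ~2x: the untouched half caps the speedup
```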
architectural registers: registers (floating point and general purpose) that are visible to the programmer.
ARF: architectural register file or retirement register file
Asynchronous Message Passing: a processor requests data, then continues processing instructions while the message is retrieved.
BHT: branch history table - records whether each branch was taken or not taken.
blocking cache: the cache services only one request at a time, blocking all other requests.
BTB: branch target buffer - keeps track of the target address taken the last time the processor encountered this branch instruction.
cache coherence definition #1: a read R from address X on processor P1 returns the value written by the most recent write W to X on P1, if no other processor has written to X between W and R.
cache coherence definition #2: if P1 writes to X and P2 reads X after a sufficient time, and there are no other writes to X in between, P2's read returns the value written by P1's write.
cache coherence definition #3: writes to the same location are serialized: two writes to location X are seen in the same order by all processors.
cache hit: the desired data is in the cache and is up-to-date.
cache miss: the desired data is not in the cache, or the cached copy is out-of-date.
cache thrashing: when two or more addresses compete for the same cache block and the processor is requesting both, so each access evicts the previous one.
CDB: common data bus
checkpointing: store the state of the CPU before a branch is taken; if the branch is mispredicted, restore the CPU to the correct state. Don't store to memory until the branch is known to be correct.
CISC Processor: complex instruction set computer
CMP: chip multiprocessor
coarse multi-threading: the thread being processed changes every few clock cycles.
consistency: the order of accesses to different addresses.
control hazard: branches and jumps cannot be executed until the destination address is known.
CPI: cycles per instruction
CPU: central processing unit
Dark Silicon: the gap between how many transistors are on a chip and how many can be used simultaneously; simultaneous usage is limited by the chip's power consumption.
data hazard: the order of the program is changed, which results in data operations executing out of order; if the instructions are dependent, there is a data hazard.
DDR SDRAM: double data rate synchronous dynamic RAM
dependency chain: a long series of dependent instructions in code
directory protocols: information about the state of each cached block is stored in a common directory.
DRAM: dynamic random access memory
DSM: distributed shared memory - all processors can access all memory locations.
Enterprise class: used for large-scale systems that service enterprises.
error: the manifestation of a fault in the system's state; an error may lead to a failure.
error forecasting: estimate the presence, creation, and consequences of errors.
error removal: removing latent errors by verification.
exclusion property: each cache level will not contain any data held by a lower cache level.
explicit ILP: the compiler decides which instructions to execute in parallel.
failure: the actual behavior deviates from the specified behavior; a failure is caused by an error.
fault avoidance: prevent the occurrence of faults by construction.
fault tolerance: prevent faults from becoming failures through redundancy.
fault: a defect in the system that may produce an error.
FIFO: first in, first out
fine multi-threading: the thread being processed changes every cycle.
FLOPS: floating point operations per second
Flynn's Taxonomy: classification of parallel computer architectures: SISD, SIMD, MISD, MIMD.
FPR: floating point register
FSB: front side bus
Geometric Mean: the nth root of the product of n numbers (see the sketch below).
global miss rate: (# of L2 misses) / (# of all memory accesses).
GPR: general purpose register
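A short Python sketch of the geometric mean, here applied to hypothetical benchmark speedup ratios:

```python
import math

# Geometric mean of n numbers = nth root of their product; commonly
# used to average benchmark speedup ratios.
def geomean(xs):
    return math.prod(xs) ** (1.0 / len(xs))

print(geomean([2.0, 8.0]))       # 4.0, vs. an arithmetic mean of 5.0
print(geomean([1.2, 0.9, 1.5]))  # ~1.17
```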
hit latency: the time it takes to get data from the cache, including the time to find the address in the cache and load the data onto the data lines.
ILP: instruction-level parallelism
inclusion property: each cache level includes all data held by the lower cache levels.
IPC: instructions per cycle
Iron Law: execution time = number of executed instructions (N) * CPI * clock cycle time. For example, a single-cycle machine with CPI = 1 and a 2 ns clock executes N instructions in N * 2 ns.
Iron Law: instructions per program depend on the source code, compiler technology, and ISA; CPI depends on the ISA and the microarchitecture; time per cycle depends on the microarchitecture and the underlying technology.
iron law of computer performance: relates cycles per instruction, frequency, and number of instructions to computer performance.
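A small Python sketch of the Iron Law, reusing the single-cycle numbers above plus a hypothetical pipelined machine:

```python
# Iron Law: ExecTime = instructions * CPI * cycle time
def exec_time_ns(instructions: float, cpi: float, cycle_ns: float) -> float:
    return instructions * cpi * cycle_ns

n = 1_000_000
print(exec_time_ns(n, 1.0, 2.0))  # single-cycle: N * 2 ns = 2,000,000 ns
print(exec_time_ns(n, 1.3, 0.5))  # hypothetical pipelined machine: 650,000 ns
```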
ISA: instruction set architecture
Itanium architecture: an explicit ILP architecture; up to six instructions can be executed per clock cycle.
Itanium Processor: Intel family of 64-bit processors that uses the Itanium architecture.
LFU: least frequently used
ll and sc: load-linked and store-conditional, a pair of instructions used together to ensure synchronization.
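A toy Python simulation of the ll/sc idea (a version counter stands in for the hardware link register; the single shared location and all names are assumptions for illustration):

```python
# Toy simulation of load-linked / store-conditional on one location.
# sc succeeds only if no store happened since the matching ll.
value, version = 0, 0
links = {}  # processor id -> version observed at ll time

def ll(cpu: int) -> int:
    links[cpu] = version
    return value

def sc(cpu: int, new: int) -> bool:
    global value, version
    if links.get(cpu) != version:  # someone stored in between: fail
        return False
    value, version = new, version + 1
    return True

# CPU 1 increments first, so CPU 0's sc fails and it must retry
# its whole read-modify-write sequence.
v0 = ll(0)
v1 = ll(1)
print(sc(1, v1 + 1))  # True
print(sc(0, v0 + 1))  # False - link broken by CPU 1's store
```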
local miss rate: (# of L2 misses) / (# of L1 misses), i.e., the miss rate seen by the L2 cache alone.
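A worked example in Python contrasting the local and global miss rates (all counts hypothetical):

```python
# Local vs. global miss rate for an L2 cache.
accesses = 100_000   # memory accesses issued by the CPU
l1_misses = 4_000    # these become L2 accesses
l2_misses = 1_000

local_l2 = l2_misses / l1_misses   # 0.25: 25% of L2 accesses miss
global_l2 = l2_misses / accesses   # 0.01: 1% of all accesses reach memory
print(local_l2, global_l2)
```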
locality principle: things that will happen soon are likely to be similar to things that just happened.
loop interchange: used for nested loops - interchange the order of the loop iterations so that the index accesses match the actual layout of the data in memory.
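A sketch of the transformation in Python; the loop structure is what matters here - in a language with true row-major arrays, such as C, the locality effect is pronounced:

```python
# Loop interchange sketch: traversing a row-major 2-D array
# column-first touches memory with a large stride; swapping the
# loops makes accesses sequential. Sizes are arbitrary.
N = 512
a = [[0] * N for _ in range(N)]

# Before: column-first traversal, stride-N accesses (poor spatial locality)
for j in range(N):
    for i in range(N):
        a[i][j] += 1

# After interchange: row-first traversal, unit-stride accesses
for i in range(N):
    for j in range(N):
        a[i][j] += 1
```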
LRU: least recently used (see the sketch below)
LSQ: load store queue
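A minimal LRU cache sketch in Python using OrderedDict (the capacity and interface are illustrative, not a hardware design):

```python
from collections import OrderedDict

# Minimal LRU cache: the front of the OrderedDict is the least
# recently used entry and is evicted first.
class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def access(self, key, value=None):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        if value is not None:
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict least recently used
            self.data[key] = value
        return value

c = LRUCache(2)
c.access("a", 1); c.access("b", 2)
c.access("a")        # touch "a", so "b" is now least recently used
c.access("c", 3)     # evicts "b"
print(list(c.data))  # ['a', 'c']
```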
MCB: memory conflict buffer - see "Dynamic Memory Disambiguation Using the Memory Conflict Buffer" and "Memory Disambiguation".
MOESI Protocol: modified-owned-exclusive-shared-invalid protocol; the possible states of any cached block.
MESI Protocol: modified-exclusive-shared-invalid protocol; the possible states of any cached block.
Message Passing: a processor can only access its local memory; to access other memory locations it must send request messages and receive replies carrying the data.
meta-predictor: a predictor that chooses the best branch predictor for each branch.
MIMD: multiple instruction streams, multiple data streams
MISD: multiple instruction streams, single data stream
miss latency: the time it takes to get data from main memory, including the time to check that the data is not in the cache, determine who owns the data, and send it to the CPU.
mobo: motherboard
Moore's Law: Gordon E. Moore observed that the number of transistors on an integrated circuit doubles every two years.
MP: multiprocessing
MPKI: misses per kilo-instruction
MSI Protocol: modified-shared-invalid protocol; the possible states of any cached block.
MTPI: message transfer part interface
MTTF: mean time to failure
MTTR: mean time to repair
multi-level caches: caches with two or more levels, each level larger and slower than the previous one.
mutex variable: mutually exclusive (mutex) variable, a low-level synchronization mechanism. A thread acquires the mutex, then releases it upon completing its task; during this period no other thread can acquire the mutex.
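A standard usage sketch with Python's threading.Lock (the counter workload is hypothetical):

```python
import threading

# Only one thread at a time may hold the lock while updating the
# shared counter, so the increments never race.
lock = threading.Lock()
counter = 0

def work():
    global counter
    for _ in range(100_000):
        with lock:  # acquire; released automatically on exit
            counter += 1

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # always 400000; without the lock this can be lower
```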
NMRU: not most recently used
non-blocking caches: if there is a miss, the cache services the next request while waiting for memory.
NUMA: non-uniform memory access; also called distributed shared memory.
OOO: out of order
OS: operating system
PAPT: physically addressed, physically tagged cache - the cache stores data based on its physical address.
PC: program counter
PCI: peripheral component interconnect
Pentium Processor: x86 superscalar processor from Intel
physical registers: registers (FP and GP) that are not visible to the programmer.
pipelined burst cache: uses 3 clock cycles to transfer the first data set from a cache block, then 1 clock cycle for each of the rest - the 'pipeline' and the 'burst' (3-1-1-1).
PIPT: physically indexed, physically tagged cache.
Power: dynamic power = 1/2 * alpha * C * V^2 * f, where alpha is the activity factor, C the switched capacitance, V the supply voltage, and f the clock frequency.
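A quick Python illustration of why voltage scaling matters - V enters the equation squared (all component values below are made up):

```python
# Dynamic power = 1/2 * alpha * C * V^2 * f. Lowering voltage along
# with frequency saves power superlinearly.
def dynamic_power(alpha, c_farads, v_volts, f_hz):
    return 0.5 * alpha * c_farads * v_volts**2 * f_hz

print(dynamic_power(0.2, 1e-9, 1.2, 3e9))  # ~0.432 W
print(dynamic_power(0.2, 1e-9, 0.9, 2e9))  # ~0.162 W after scaling down
```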
Power Architecture: performance optimization with enhanced RISC
Power vs Performance Equation:
pre-fetch buffer: when getting data from memory, fetch all the data in the row and store it in a buffer.
pre-fetching cache: instructions are fetched from memory before they are needed by the CPU.
Prescott Processor: based on the NetBurst architecture, with a 31-stage pipeline in the core. The high penalty paid for mispredictions is supposedly offset by a Rapid Execution Engine. It also has a trace execution cache, which stores decoded instructions and reuses them instead of fetching and decoding again.
PRF: physical register file
pseudo-associative cache: an address is first searched in one half of the cache; if it is not there, the other half of the cache is searched.
RAID: redundant array of independent disks
RAID 0: stripes of data are stored on disks, alternating between disks. Each disk supplies a portion of the data, which usually improves performance.
RAID 1: the data is replicated on another disk; each disk contains the full data. Whichever disk is free responds to a read request; a write is written to one disk and then mirrored to the other disk(s).
RAID 2 and RAID 3: data is striped across disks, and Hamming codes or parity bits are used for error detection. RAID 2 and RAID 3 are not used in any current application.
RAID 4: data is striped in large blocks onto disks with a dedicated parity disk; used by NetApp.
RAID 5: data is striped in large blocks onto disks, but there is no dedicated parity disk; the parity for each stripe is stored on one of the disks, rotating among them.
RAR: read after read
RAS: return address stack
RAT: register alias table
RAT: register allocation table (a second meaning, used in multiprocessing)
RAW: read after write
RDRAM: Rambus dynamic random access memory
relaxed consistency: some instructions can be performed out of order and still maintain consistency.
reliability: measure of continuous service accomplishment
reservation stations: functional unit buffers
RETI: return from interrupt
RF: register file
RISC Processor: reduced instruction set computer - simple instructions of the same size; instructions are executed in one clock cycle.
ROB: re-order buffer
RS: reservation station
RWX: read-write-execute permissions on files
SHARC processor: floating point processors designed for DSP applications.
SIMD: single instruction stream, multiple data streams
simultaneous multi-threading: instructions from different threads are processed, even in the same cycle.
SISD: single instruction stream, single data stream
SMP: symmetric multiprocessing
SMT: simultaneous multi-threading
snooping protocols: on a broadcast network, the cache of each processor watches the bus for addresses it holds.
SPARC processor: Scalable Processor Architecture - a RISC instruction set processor.
spatial locality: if we access a memory location, nearby memory locations tend to be accessed soon.
Speedup: how much faster a modified system is compared to the unmodified system.
SPR: special purpose registers - such as the program counter or status register
SRAM: static random access memory
structural hazard: the pipeline contains two instructions attempting to access the same resource.
superscalar architecture: the processor manages instruction dependencies at run time and executes more than one instruction per clock cycle using pipelines.
synchronization: "a
system is sequentially consistent if the result of any execution is the same
as if the operations of all the processors were executed in some sequential
order, and the operations for each individual processor appear in the order
specified by the program." Quote by Leslie Lamport
Synchronous Message Passing: a processor requests data, then waits until the data is received before continuing.
tag: the part of the data address that is used to find the data in the cache; this portion of the address is unique so that it can be distinguished from other lines in the cache.
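A Python sketch of splitting an address into tag, index, and offset for a direct-mapped cache (the geometry - 64-byte blocks, 256 sets - is a made-up example):

```python
# 64-byte blocks -> 6 offset bits; 256 sets -> 8 index bits;
# the remaining high bits form the tag.
OFFSET_BITS, INDEX_BITS = 6, 8

def split(addr: int):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split(0x12345678))  # (0x48d1, 0x59, 0x38)
```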
temporal locality: if a program accesses a memory location, it tends to access the same location again very soon.
TLB: translation lookaside buffer - a cache of virtual-to-physical address translations; TLB misses are very time consuming.
Tomasulo's Algorithm: achieves high performance without special compilers by using dynamic scheduling.
tournament predictor: a meta-predictor
trace caches: sets of instructions that have been decoded and executed are stored in a separate cache. If there is a branch in the set, only the instructions on the taken path are kept; a misprediction ends the trace.
trace scheduling: rearranging instructions for faster execution; the common cases are scheduled.
tree, tournament, dissemination barriers: types of structures for implementing barriers.
UMA: uniform memory access - all memory locations have similar latencies.