1. What do you think are the reasons for the increasing importance of
multi-processors?
2. Define
process and thread in the context of multi-processors.
With an
MIMD, each processor is executing its own instruction stream. In many cases,
each processor executes a different process. A process is a segment of code
that may be run independently; the state of the process contains all the
information necessary to execute that program on a processor. In a
multiprogrammed environment, where the processors may be running independent
tasks, each process is typically independent of other processes. It is also
useful to be able to have multiple processors executing a single program and
sharing the code and most of their address space. When multiple processes share
code and data in this way, they are often called threads. Today, the term
thread is often used in a casual way to refer to multiple loci of execution
that may run on different processors, even when they do not share an address
space. For example, a multithreaded architecture actually allows the simultaneous
execution of multiple processes, with potentially separate address spaces, as
well as multiple threads that share the same address space.
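As a minimal sketch (assuming a POSIX system; the code and names are illustrative, not from the text above), the distinction is visible in C: threads created with pthread_create share the process's address space, while a process created with fork gets its own copy of it.

    /* Minimal sketch, POSIX C: threads share one address space, processes do not. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int shared = 0;                /* visible to every thread in this process */

    void *worker(void *arg) {      /* a thread: same code, same address space */
        shared = 42;
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        printf("after thread: %d\n", shared);  /* 42: the write is shared */

        if (fork() == 0) {         /* a child process: separate copy of memory */
            shared = 7;            /* modifies only the child's copy           */
            _exit(0);
        }
        wait(NULL);
        printf("after process: %d\n", shared); /* still 42 in the parent */
        return 0;
    }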
3. What do you understand by grain size? What is its impact on parallelism?
Although
the amount of computation assigned to a thread, called the grain size, is
important in considering how to exploit thread-level parallelism efficiently,
the important qualitative distinction from instruction-level parallelism is
that thread-level parallelism is identified at a high level by the software
system and that the threads consist of hundreds to millions of instructions
that may be executed in parallel.
Threads
can also be used to exploit data-level parallelism, although the overhead is
likely to be higher than would be seen in an SIMD computer. This overhead means
that grain size must be sufficiently large to exploit the parallelism
efficiently. For example, although a vector processor (see Appendix F) may be
able to efficiently parallelize operations on short vectors, the resulting
grain size when the parallelism is split among many threads may be so small
that the overhead makes the exploitation of the parallelism prohibitively
expensive.
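A hypothetical sketch of the grain-size trade-off (C with POSIX threads; names and sizes are illustrative): the dot product below is split among THREADS threads, so each thread's grain is N/THREADS multiply-add operations. If the grain shrinks toward the cost of creating, scheduling, and joining a thread, the overhead swamps the useful work, which is exactly the effect described above for short vectors.

    /* Illustrative: grain size = work per thread = N / THREADS elements. */
    #include <pthread.h>
    #include <stdio.h>

    #define N       1000000
    #define THREADS 4

    static double a[N], b[N], sum[THREADS];

    void *partial_dot(void *arg) {
        long id = (long)arg;
        long grain = N / THREADS;           /* this thread's share of the work */
        for (long i = id * grain; i < (id + 1) * grain; i++)
            sum[id] += a[i] * b[i];
        return NULL;
    }

    int main(void) {
        pthread_t t[THREADS];
        for (long id = 0; id < THREADS; id++)
            pthread_create(&t[id], NULL, partial_dot, (void *)id);
        double total = 0;
        for (long id = 0; id < THREADS; id++) {
            pthread_join(t[id], NULL);
            total += sum[id];
        }
        printf("dot product = %f\n", total);
        return 0;
    }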
4. Why is the symmetric shared-memory architecture called UMA?
Because
there is a single main memory that has a symmetric relationship to all
processors and a uniform access time from any processor, these multiprocessors
are most often called symmetric (shared-memory) multiprocessors (SMPs), and
this style of architecture is sometimes called uniform memory access (UMA),
arising from the fact that all processors have a uniform latency from memory,
even if the memory is organized into multiple banks. This type of symmetric
shared-memory architecture is currently by far the most popular organization.
5. List the two models available
for communication in a multiprocessing environment.
Shared-memory and message-passing multiprocessors.
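A minimal sketch of the two models (POSIX C; illustrative, not from the text): in the shared-memory model one thread stores to a location and another simply loads it; in the message-passing model the data moves through an explicit send and receive, shown here with a pipe.

    #include <stdio.h>
    #include <unistd.h>
    #include <pthread.h>

    int mailbox = 0;                        /* shared memory: one thread writes... */
    void *producer(void *arg) { mailbox = 99; return NULL; }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(t, NULL);
        printf("shared memory: read %d\n", mailbox);   /* ...another reads */

        int fd[2], value = 99, received;
        pipe(fd);
        write(fd[1], &value, sizeof value);            /* explicit send    */
        read(fd[0], &received, sizeof received);       /* explicit receive */
        printf("message passing: received %d\n", received);
        return 0;
    }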
6. What are the challenges in parallel processing?
The first
hurdle has to do with the limited parallelism available in programs, and the
second arises from the relatively high cost of communications. Limitations in
available parallelism make it difficult to achieve good speedups in any
parallel processor.
The
second major challenge in parallel processing involves the large latency of
remote access in a parallel processor. In existing shared-memory
multiprocessors, communication of data between processors may cost anywhere
from 50 clock cycles (for multicores) to over 1000 clock cycles (for
large-scale multiprocessors), depending on the communication mechanism, the
type of interconnection network, and the scale of the multiprocessor. The
effect of long communication delays is clearly substantial.
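As a worked illustration (the numbers are assumed for the example, not taken from the text): suppose a processor has a base CPI of 0.5, 0.2% of its instructions make a remote access, and each remote access costs 400 clock cycles. Then

    Effective CPI = Base CPI + Remote access rate x Remote access cost
                  = 0.5 + 0.002 x 400 = 1.3

so the machine runs 1.3 / 0.5 = 2.6 times slower than it would if every reference were local, even though only one instruction in 500 communicates.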
7. What do you understand by the cache coherence problem? Give an example.
Unfortunately,
caching shared data introduces a new problem because the view of memory held by
two different processors is through their individual caches, which, without any
additional precautions, could end up seeing two different values. Figure 4.3
illustrates the problem and shows how two different processors can have two
different values for the same location. This difficulty is generally referred
to as the cache coherence problem.
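An illustrative sequence in the spirit of that figure, assuming write-through caches (the values are sample data):

    Time  Event                   Cache of CPU A  Cache of CPU B  Memory at X
    0                                                             1
    1     CPU A reads X           1                               1
    2     CPU B reads X           1               1               1
    3     CPU A stores 0 into X   0               1               0

After step 3, CPU B still reads the stale value 1 from its own cache while CPU A and memory hold 0, so the two processors see different values for the same location.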
8. When can we say that the memory is coherent in a
multi-processor system?
A memory
system is coherent if
1. A read by
a processor P to a location X that follows a write by P to X, with no writes of
X by another processor occurring between the write and the read by P, always
returns the value written by P.
2. A read by
a processor to location X that follows a write by another processor to X
returns the written value if the read and write are sufficiently separated in
time and no other writes to X occur between the two accesses.
3. Writes to
the same location are serialized; that is, two writes to the same location by
any two processors are seen in the same order by all processors. For example,
if the values 1 and then 2 are written to a location, processors can never read
the value of the location as 2 and then later read it as 1.
The first
property simply preserves program order—we expect this property to be true even
in uniprocessors. The second property defines the notion of what it means to
have a coherent view of memory: If a processor could continuously read an old
data value, we would clearly say that memory was incoherent.
9. Why is serialization of reads and writes important in a multiprocessor environment?
The need for write serialization is more subtle, but important in a multiprocessor environment. Suppose we did not serialize writes, and processor P1 writes location X followed by P2 writing location X. Serializing the writes ensures that every processor will see the write done by P2 at some point. If we did not serialize the writes, it might be the case that some processor could see the write of P2 first and then see the write of P1, maintaining the value written by P1 indefinitely. The simplest way to avoid such difficulties is to ensure that all writes to the same location are seen in the same order; this property is called write serialization.
10. Define memory coherence and consistency
properties. Why are they important?
Informally, we could say that a memory system is coherent if any read of a data item returns the most recently written value of that data item. This definition, although intuitively appealing, is
vague and simplistic; the reality is much more complex. This simple definition
contains two different aspects of memory system behavior, both of which are
critical to writing correct shared-memory programs. The first aspect, called
coherence, defines what values can be returned by a read. The second aspect,
called consistency, determines when a written value will be returned by a read.
Coherence
and consistency are complementary: Coherence defines the behavior of reads and
writes to the same memory location, while consistency defines the behavior of
reads and writes with respect to accesses to other memory locations. For now,
make the following two assumptions. First, a write does not complete (and allow
the next write to occur) until all processors have seen the effect of that
write. Second, the processor does not change the order of any write with
respect to any other memory access. These two conditions mean that if a
processor writes location A followed by location B, any processor that sees the
new value of B must also see the new value of A. These restrictions allow the
processor to reorder reads, but forces the processor to finish a write in
program order.
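A minimal sketch of that guarantee (C11 atomics with their default sequentially consistent ordering; the names are illustrative): the writer updates A and then B in program order, so a reader that has observed the new B must also observe the new A.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    atomic_int A = 0;   /* data */
    atomic_int B = 0;   /* flag */

    void *writer(void *arg) {
        atomic_store(&A, 1);          /* write A first...                 */
        atomic_store(&B, 1);          /* ...then B, kept in program order */
        return NULL;
    }

    void *reader(void *arg) {
        while (atomic_load(&B) == 0)  /* spin until the new B is visible  */
            ;
        printf("A = %d\n", atomic_load(&A));  /* must print 1 */
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&r, NULL, reader, NULL);
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }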
11. List the two protocols used to track the status
of the shared data block. How is the status maintained in each scheme?
The
protocols to maintain coherence for multiple processors are called cache
coherence protocols. Key to implementing a cache coherence protocol is tracking
the state of any sharing of a data block. There are two classes of protocols in use, which track the sharing status with different techniques:
a. Directory based—The sharing status of a block of physical memory is kept in just one location, called the directory. Directory-based coherence has slightly higher implementation overhead than snooping, but it can scale to larger processor counts. The Sun T1 design uses directories, albeit with a central physical memory.
b. Snooping—Every
cache that has a copy of the data from a block of physical memory also has a
copy of the sharing status of the block, but no centralized state is kept. The
caches are all accessible via some broadcast medium (a bus or switch), and all
cache controllers monitor or snoop on the medium to determine whether or not
they have a copy of a block that is requested on a bus or switch access.
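A hypothetical sketch of the state each class keeps (the types and fields below are assumptions for illustration, not a real protocol implementation):

    #include <stdint.h>

    /* Directory based: one entry per block of physical memory, kept in a
       single known location (the directory). */
    struct directory_entry {
        enum { UNCACHED, SHARED, MODIFIED } state;
        uint64_t sharers;     /* one bit per processor holding a copy */
    };

    /* Snooping: each cache records the status of its own blocks only; there
       is no central copy, so every controller watches the broadcast medium. */
    struct cache_line {
        enum { LINE_INVALID, LINE_SHARED, LINE_MODIFIED } status;
        uint64_t tag;
        /* ... cached data ... */
    };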
12. What do you understand by write update
protocol?
The
alternative to an invalidate protocol is to update all the cached copies of a
data item when that item is written. This type of protocol is called a write
update or write broadcast protocol. Because a write update protocol must
broadcast all writes to shared cache lines, it consumes considerably more
bandwidth. For this reason, all recent multiprocessors have opted to implement
a write invalidate protocol.
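A simple illustration of the bandwidth difference: if one processor writes the same shared word N times before any other processor reads it, a write update protocol broadcasts all N writes, whereas a write invalidate protocol broadcasts a single invalidation on the first write and the remaining N - 1 writes hit locally in the cache.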
13. Which protocol is more suited for a distributed shared-memory architecture with a large number of processors? Why?
Directory-based protocols are better suited to a distributed shared-memory architecture with a large number of processors. Snooping relies on every cache controller observing a broadcast medium (a bus or switch) and broadcasting requests to all caches, which does not scale. A directory keeps the sharing status of each block of physical memory in just one location, so coherence requests can be sent only to the processors that need them; the directory approach has slightly higher implementation overhead than snooping, but it scales to larger processor counts.
14. What is multithreading?
Multithreading allows multiple threads to share the functional units of a single processor in an overlapping fashion.
15. What is fine grained multithreading?
It
switches between threads on each instruction, causing the execution of multiple
threads to be interleaved.
16. What is coarse grained multithreading?
It
switches threads only on costly stalls. Thus it is much less likely to slow
down the execution of an individual thread.
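A hypothetical sketch of the two switching policies from questions 15 and 16 (illustrative, not a model of a real pipeline):

    enum { NTHREADS = 4 };

    /* Fine grained: pick a different ready thread every cycle (round robin). */
    int fine_grained_next(int current) {
        return (current + 1) % NTHREADS;
    }

    /* Coarse grained: keep the current thread unless it hit a costly stall,
       such as a last-level cache miss, and only then switch. */
    int coarse_grained_next(int current, int costly_stall) {
        return costly_stall ? (current + 1) % NTHREADS : current;
    }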