
Computer Software and Information Technology Engineering (CSE / IT)

Multi-Core Architectures and Programming - CS6801


An Introduction to Parallel Programming by Peter S. Pacheco

Chapter 1 Why Parallel Computing?


-:- Why Parallel Computing?
-:- Why We Need Ever-Increasing Performance
-:- Why We’re Building Parallel Systems
-:- Why We Need to Write Parallel Programs
-:- How Do We Write Parallel Programs?
-:- Concurrent, Parallel, Distributed

Chapter 2 Parallel Hardware and Parallel Software


-:- Parallel Hardware and Parallel Software
-:- Some Background: The von Neumann Architecture; Processes, Multitasking, and Threads
-:- Modifications to the von Neumann Model
-:- Parallel Hardware
-:- Parallel Software
-:- Input and Output
-:- Performance of Parallel Programming
-:- Parallel Program Design with example
-:- Writing and Running Parallel Programs
-:- Assumptions - Parallel Programming

Chapter 3 Distributed Memory Programming with MPI


-:- Distributed-Memory Programming with MPI
-:- The Trapezoidal Rule in MPI
-:- Dealing with I/O
-:- Collective Communication
-:- MPI Derived Datatypes
-:- Performance Evaluation of MPI Programs
-:- A Parallel Sorting Algorithm

Chapter 4 Shared Memory Programming with Pthreads


-:- Shared-Memory Programming with Pthreads
-:- Processes, Threads, and Pthreads
-:- Pthreads - Hello, World Program
-:- Matrix-Vector Multiplication
-:- Critical Sections
-:- Busy-Waiting
-:- Mutexes
-:- Producer-Consumer Synchronization and Semaphores
-:- Barriers and Condition Variables
-:- Read-Write Locks
-:- Caches, Cache Coherence, and False Sharing
-:- Thread-Safety
-:- Shared-Memory Programming with OpenMP
-:- The Trapezoidal Rule
-:- Scope of Variables
-:- The Reduction Clause
-:- The parallel for Directive
-:- More About Loops in OpenMP: Sorting
-:- Scheduling Loops
-:- Producers and Consumers
-:- Caches, Cache Coherence, and False Sharing
-:- Thread-Safety
-:- Parallel Program Development
-:- Two n-Body Solvers
-:- Parallelizing the basic solver using OpenMP
-:- Parallelizing the reduced solver using OpenMP
-:- Evaluating the OpenMP codes
-:- Parallelizing the solvers using pthreads
-:- Parallelizing the basic solver using MPI
-:- Parallelizing the reduced solver using MPI
-:- Performance of the MPI solvers
-:- Tree Search
-:- Recursive depth-first search
-:- Nonrecursive depth-first search
-:- Data structures for the serial implementations
-:- Performance of the serial implementations
-:- Parallelizing tree search
-:- A static parallelization of tree search using pthreads
-:- A dynamic parallelization of tree search using pthreads
-:- Evaluating the Pthreads tree-search programs
-:- Parallelizing the tree-search programs using OpenMP
-:- Performance of the OpenMP implementations
-:- Implementation of tree search using MPI and static partitioning
-:- Implementation of tree search using MPI and dynamic partitioning
-:- Which API?

Multicore Application Programming: For Windows, Linux, and Oracle Solaris by Darryl Gove

Chapter 1 Hardware, Processes, and Threads


-:- Hardware, Processes, and Threads
-:- Examining the Insides of a Computer
-:- The Motivation for Multicore Processors
-:- Supporting Multiple Threads on a Single Chip
-:- Increasing Instruction Issue Rate with Pipelined Processor Cores
-:- Using Caches to Hold Recently Used Data
-:- Using Virtual Memory to Store Data
-:- Translating from Virtual Addresses to Physical Addresses
-:- The Characteristics of Multiprocessor Systems
-:- How Latency and Bandwidth Impact Performance
-:- The Translation of Source Code to Assembly Language
-:- The Performance of 32-Bit versus 64-Bit Code
-:- Ensuring the Correct Order of Memory Operations
-:- The Differences Between Processes and Threads

Chapter 2 Coding for Performance


-:- Coding for Performance
-:- Defining Performance
-:- Understanding Algorithmic Complexity
-:- Why Algorithmic Complexity Is Important
-:- Using Algorithmic Complexity with Care
-:- How Structure Impacts Performance
-:- Performance and Convenience Trade-Offs in Source Code and Build Structures
-:- Using Libraries to Structure Applications
-:- The Impact of Data Structures on Performance
-:- The Role of the Compiler
-:- The Two Types of Compiler Optimization
-:- Selecting Appropriate Compiler Options
-:- How Cross-File Optimization Can Be Used to Improve Performance
-:- Using Profile Feedback
-:- How Potential Pointer Aliasing Can Inhibit Compiler Optimizations
-:- Identifying Where Time Is Spent Using Profiling
-:- Commonly Available Profiling Tools
-:- How Not to Optimize
-:- Performance by Design

Chapter 3 Identifying Opportunities for Parallelism


-:- Identifying Opportunities for Parallelism
-:- Using Multiple Processes to Improve System Productivity
-:- Multiple Users Utilizing a Single System
-:- Improving Machine Efficiency Through Consolidation
-:- Using Containers to Isolate Applications Sharing a Single System
-:- Hosting Multiple Operating Systems Using Hypervisors
-:- Using Parallelism to Improve the Performance of a Single Task
-:- One Approach to Visualizing Parallel Applications
-:- How Parallelism Can Change the Choice of Algorithms
-:- Amdahl’s Law
-:- Determining the Maximum Practical Threads
-:- How Synchronization Costs Reduce Scaling
-:- Parallelization Patterns
-:- Data Parallelism Using SIMD Instructions
-:- Parallelization Using Processes or Threads
-:- Multiple Independent Tasks
-:- Multiple Loosely Coupled Tasks
-:- Multiple Copies of the Same Task
-:- Single Task Split Over Multiple Threads
-:- Using a Pipeline of Tasks to Work on a Single Item
-:- Division of Work into a Client and a Server
-:- Splitting Responsibility into a Producer and a Consumer
-:- Combining Parallelization Strategies
-:- How Dependencies Influence the Ability to Run Code in Parallel
-:- Antidependencies and Output Dependencies
-:- Using Speculation to Break Dependencies
-:- Critical Paths
-:- Identifying Parallelization Opportunities

Chapter 4 Synchronization and Data Sharing


-:- Synchronization and Data Sharing
-:- Data Races
-:- Using Tools to Detect Data Races
-:- Avoiding Data Races
-:- Synchronization Primitives
-:- Mutexes and Critical Regions
-:- Spin Locks
-:- Semaphores
-:- Readers-Writer Locks
-:- Barriers
-:- Atomic Operations and Lock-Free Code
-:- Deadlocks and Livelocks
-:- Communication Between Threads and Processes
-:- Storing Thread-Private Data

Chapter 5 Using POSIX Threads


-:- Using POSIX Threads
-:- Creating Threads
-:- Compiling Multithreaded Code
-:- Process Termination
-:- Sharing Data Between Threads
-:- Variables and Memory
-:- Multiprocess Programming
-:- Sockets
-:- Reentrant Code and Compiler Flags

Chapter 6 Windows Threading


-:- Windows Threading
-:- Creating Native Windows Threads
-:- Terminating Threads
-:- Creating and Resuming Suspended Threads
-:- Using Handles to Kernel Resources
-:- Methods of Synchronization and Resource Sharing
-:- An Example of Requiring Synchronization Between Threads
-:- Protecting Access to Code with Critical Sections
-:- Protecting Regions of Code with Mutexes
-:- Slim Reader/Writer Locks
-:- Signaling Event Completion to Other Threads or Processes
-:- Wide String Handling in Windows
-:- Creating Processes
-:- Sharing Memory Between Processes
-:- Inheriting Handles in Child Processes
-:- Naming Mutexes and Sharing Them Between Processes
-:- Communicating with Pipes
-:- Communicating Using Sockets
-:- Atomic Updates of Variables
-:- Allocating Thread-Local Storage
-:- Setting Thread Priority

Chapter 7 Using Automatic Parallelization and OpenMP


-:- Using Automatic Parallelization and OpenMP
-:- Using Automatic Parallelization to Produce a Parallel Application
-:- Identifying and Parallelizing Reductions
-:- Automatic Parallelization of Codes Containing Calls
-:- Assisting the Compiler in Automatically Parallelizing Code
-:- Using OpenMP to Produce a Parallel Application
-:- Using OpenMP to Parallelize Loops
-:- Runtime Behavior of an OpenMP Application
-:- Variable Scoping Inside OpenMP Parallel Regions
-:- Parallelizing Reductions Using OpenMP
-:- Accessing Private Data Outside the Parallel Region
-:- Improving Work Distribution Using Scheduling
-:- Using Parallel Sections to Perform Independent Work
-:- Nested Parallelism
-:- Using OpenMP for Dynamically Defined Parallel Tasks
-:- Keeping Data Private to Threads
-:- Controlling the OpenMP Runtime Environment
-:- Waiting for Work to Complete
-:- Restricting the Threads That Execute a Region of Code
-:- Ensuring That Code in a Parallel Region Is Executed in Order
-:- Collapsing Loops to Improve Workload Balance
-:- Enforcing Memory Consistency
-:- An Example of Parallelization

Chapter 8 Hand-Coded Synchronization and Sharing


-:- Hand-Coded Synchronization and Sharing
-:- Atomic Operations
-:- Using Compare and Swap Instructions to Form More Complex Atomic Operations
-:- Enforcing Memory Ordering to Ensure Correct Operation
-:- Compiler Support of Memory-Ordering Directives
-:- Reordering of Operations by the Compiler
-:- Volatile Variables
-:- Operating System–Provided Atomics
-:- Lockless Algorithms
-:- Dekker’s Algorithm
-:- Producer-Consumer with a Circular Buffer
-:- Scaling to Multiple Consumers or Producers
-:- Scaling the Producer-Consumer to Multiple Threads
-:- Modifying the Producer-Consumer Code to Use Atomics
-:- The ABA Problem

Chapter 9 Scaling with Multicore Processors


-:- Scaling with Multicore Processors
-:- Constraints to Application Scaling
-:- Hardware Constraints to Scaling
-:- Bandwidth Sharing Between Cores
-:- False Sharing
-:- Cache Conflict and Capacity
-:- Pipeline Resource Starvation
-:- Operating System Constraints to Scaling
-:- Multicore Processors and Scaling

Chapter 10 Other Parallelization Technologies


-:- Other Parallelization Technologies
-:- GPU-Based Computing
-:- Language Extensions
-:- Alternative Languages
-:- Clustering Technologies
-:- Transactional Memory
-:- Vectorization