Multicore Application Programming For Windows, Linux, and Oracle Solaris

Chapter 1 Hardware, Processes, and Threads

=> Hardware, Processes, and Threads
=> Examining the Insides of a Computer
=> The Motivation for Multicore Processors
=> Supporting Multiple Threads on a Single Chip
=> Increasing Instruction Issue Rate with Pipelined Processor Cores
=> Using Caches to Hold Recently Used Data
=> Using Virtual Memory to Store Data
=> Translating from Virtual Addresses to Physical Addresses
=> The Characteristics of Multiprocessor Systems
=> How Latency and Bandwidth Impact Performance
=> The Translation of Source Code to Assembly Language
=> The Performance of 32-Bit versus 64-Bit Code
=> Ensuring the Correct Order of Memory Operations
=> The Differences Between Processes and Threads

Chapter 2 Coding for Performance

=> Coding for Performance
=> Defining Performance
=> Understanding Algorithmic Complexity
=> Why Algorithmic Complexity Is Important
=> Using Algorithmic Complexity with Care
=> How Structure Impacts Performance
=> Performance and Convenience Trade-Offs in Source Code and Build Structures
=> Using Libraries to Structure Applications
=> The Impact of Data Structures on Performance
=> The Role of the Compiler
=> The Two Types of Compiler Optimization
=> Selecting Appropriate Compiler Options
=> How Cross-File Optimization Can Be Used to Improve Performance
=> Using Profile Feedback
=> How Potential Pointer Aliasing Can Inhibit Compiler Optimizations
=> Identifying Where Time Is Spent Using Profiling
=> Commonly Available Profiling Tools
=> How Not to Optimize
=> Performance by Design

Chapter 3 Identifying Opportunities for Parallelism

=> Identifying Opportunities for Parallelism
=> Using Multiple Processes to Improve System Productivity
=> Multiple Users Utilizing a Single System
=> Improving Machine Efficiency Through Consolidation
=> Using Containers to Isolate Applications Sharing a Single System
=> Hosting Multiple Operating Systems Using Hypervisors
=> Using Parallelism to Improve the Performance of a Single Task
=> One Approach to Visualizing Parallel Applications
=> How Parallelism Can Change the Choice of Algorithms
=> Amdahl’s Law
=> Determining the Maximum Practical Threads
=> How Synchronization Costs Reduce Scaling
=> Parallelization Patterns
=> Data Parallelism Using SIMD Instructions
=> Parallelization Using Processes or Threads
=> Multiple Independent Tasks
=> Multiple Loosely Coupled Tasks
=> Multiple Copies of the Same Task
=> Single Task Split Over Multiple Threads
=> Using a Pipeline of Tasks to Work on a Single Item
=> Division of Work into a Client and a Server
=> Splitting Responsibility into a Producer and a Consumer
=> Combining Parallelization Strategies
=> How Dependencies Influence the Ability to Run Code in Parallel
=> Antidependencies and Output Dependencies
=> Using Speculation to Break Dependencies
=> Critical Paths
=> Identifying Parallelization Opportunities

Chapter 4 Synchronization and Data Sharing

=> Synchronization and Data Sharing
=> Data Races
=> Using Tools to Detect Data Races
=> Avoiding Data Races
=> Synchronization Primitives
=> Mutexes and Critical Regions
=> Spin Locks
=> Semaphores
=> Readers-Writer Locks
=> Barriers
=> Atomic Operations and Lock-Free Code
=> Deadlocks and Livelocks
=> Communication Between Threads and Processes
=> Storing Thread-Private Data

Chapter 5 Using POSIX Threads

=> Using POSIX Threads
=> Creating Threads
=> Compiling Multithreaded Code
=> Process Termination
=> Sharing Data Between Threads
=> Variables and Memory
=> Multiprocess Programming
=> Sockets
=> Reentrant Code and Compiler Flags

Chapter 6 Windows Threading

=> Windows Threading
=> Creating Native Windows Threads
=> Terminating Threads
=> Creating and Resuming Suspended Threads
=> Using Handles to Kernel Resources
=> Methods of Synchronization and Resource Sharing
=> An Example of Requiring Synchronization Between Threads
=> Protecting Access to Code with Critical Sections
=> Protecting Regions of Code with Mutexes
=> Slim Reader/Writer Locks
=> Signaling Event Completion to Other Threads or Processes
=> Wide String Handling in Windows
=> Creating Processes
=> Sharing Memory Between Processes
=> Inheriting Handles in Child Processes
=> Naming Mutexes and Sharing Them Between Processes
=> Communicating with Pipes
=> Communicating Using Sockets
=> Atomic Updates of Variables
=> Allocating Thread-Local Storage
=> Setting Thread Priority

Chapter 7 Using Automatic Parallelization and OpenMP

=> Using Automatic Parallelization and OpenMP
=> Using Automatic Parallelization to Produce a Parallel Application
=> Identifying and Parallelizing Reductions
=> Automatic Parallelization of Codes Containing Calls
=> Assisting Compiler in Automatically Parallelizing Code
=> Using OpenMP to Produce a Parallel Application
=> Using OpenMP to Parallelize Loops
=> Runtime Behavior of an OpenMP Application
=> Variable Scoping Inside OpenMP Parallel Regions
=> Parallelizing Reductions Using OpenMP
=> Accessing Private Data Outside the Parallel Region
=> Improving Work Distribution Using Scheduling
=> Using Parallel Sections to Perform Independent Work
=> Nested Parallelism
=> Using OpenMP for Dynamically Defined Parallel Tasks
=> Keeping Data Private to Threads
=> Controlling the OpenMP Runtime Environment
=> Waiting for Work to Complete
=> Restricting the Threads That Execute a Region of Code
=> Ensuring That Code in a Parallel Region Is Executed in Order
=> Collapsing Loops to Improve Workload Balance
=> Enforcing Memory Consistency
=> An Example of Parallelization

Chapter 8 Hand-Coded Synchronization and Sharing

=> Hand-Coded Synchronization and Sharing
=> Atomic Operations
=> Using Compare and Swap Instructions to Form More Complex Atomic Operations
=> Enforcing Memory Ordering to Ensure Correct Operation
=> Compiler Support of Memory-Ordering Directives
=> Reordering of Operations by the Compiler
=> Volatile Variables
=> Operating System–Provided Atomics
=> Lockless Algorithms
=> Dekker’s Algorithm
=> Producer-Consumer with a Circular Buffer
=> Scaling to Multiple Consumers or Producers
=> Scaling the Producer-Consumer to Multiple Threads
=> Modifying the Producer-Consumer Code to Use Atomics
=> The ABA Problem

Chapter 9 Scaling with Multicore Processors

=> Scaling with Multicore Processors
=> Constraints to Application Scaling
=> Hardware Constraints to Scaling
=> Bandwidth Sharing Between Cores
=> False Sharing
=> Cache Conflict and Capacity
=> Pipeline Resource Starvation
=> Operating System Constraints to Scaling
=> Multicore Processors and Scaling

Chapter 10 Other Parallelization Technologies

=> Other Parallelization Technologies
=> GPU-Based Computing
=> Language Extensions
=> Alternative Languages
=> Clustering Technologies
=> Transactional Memory
=> Vectorization
