How can we decide which API (MPI, Pthreads, or OpenMP) is best for our application? In general, there are many factors to consider, and the answer may not be at all clear-cut. However, here are a few points to consider.
As a first step, decide whether to use distributed memory or shared memory. To do this, first consider the amount of memory the application will need. In general, distributed-memory systems can provide considerably more main memory than shared-memory systems, so if the memory requirements are very large, you may need to write the application using MPI.
If the problem will fit into the main memory of your shared-memory system, you may still want to consider using MPI. Since the total available cache on a distributed-memory system will probably be much greater than that available on a shared-memory system, it’s conceivable that a problem that requires many main memory accesses on a shared-memory system will mostly access cache on a distributed-memory system, since each process works on a smaller portion of the data, and, consequently, will have much better overall performance.
However, even if you’ll get a big performance improvement from the large aggregate cache on a distributed-memory system, if you already have a large and complex serial program, it often makes sense to write a shared-memory program. It’s often possible to reuse considerably more serial code in a shared-memory program than in a distributed-memory program, and it’s more likely that the serial data structures can be easily adapted to a shared-memory system. If this is the case, the development effort for the shared-memory program will probably be much less. This is especially true for OpenMP programs, since some serial programs can be parallelized simply by inserting a few OpenMP directives.
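To make "simply inserting some OpenMP directives" concrete, here is a minimal sketch. The function names `dot_serial` and `dot_omp` are hypothetical, chosen for illustration; the point is that the loop body is identical in both versions and the only change is one `parallel for` directive with a `reduction` clause. It assumes a compiler with OpenMP support (for example, GCC or Clang with `-fopenmp`); without that flag the pragma is ignored and the loop simply runs serially.

```c
#include <stddef.h>

/* Serial version: a hypothetical kernel taken from an existing program. */
double dot_serial(const double x[], const double y[], size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* OpenMP version: the loop body is unchanged; only the directive is new.
   reduction(+: sum) gives each thread a private copy of sum and adds the
   private copies together when the loop finishes. */
double dot_omp(const double x[], const double y[], size_t n) {
    double sum = 0.0;
#   pragma omp parallel for reduction(+: sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Because the reduction reorders the floating-point additions, `dot_omp` may differ from `dot_serial` in the last few bits for general inputs; for small integer-valued data the two agree exactly.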
Another consideration is the communication requirements of the parallel algorithm. If the processes/threads do little communication, an MPI program should be fairly easy to develop and very scalable. At the other extreme, if the processes/threads need to be very closely coordinated, a distributed-memory program will probably have problems scaling to large numbers of processes, and the performance of a shared-memory program should be better.
If you decide that shared memory is preferable, you will need to think about the details of parallelizing the program. As we noted earlier, if you already have a large, complex serial program, you should see if it lends itself to OpenMP. For example, if large parts of the program can be parallelized with parallel for directives, OpenMP will be much easier to use than Pthreads. On the other hand, if the program involves complex synchronization among the threads (for example, read-write locks or threads waiting on signals from other threads), then Pthreads will be much easier to use.
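As an illustration of "threads waiting on signals from other threads", the following is a minimal sketch of a one-slot mailbox shared by a producer thread and a consumer thread, built on a Pthreads mutex and condition variable. OpenMP has no condition-variable construct, so this kind of hand-off is where Pthreads tends to be the more natural fit. The names `mailbox_t`, `producer`, `consumer`, and `run_mailbox` are hypothetical, and error checking of the Pthreads calls is omitted for brevity. Compile with `-pthread`.

```c
#include <pthread.h>

/* Hypothetical one-slot mailbox shared by one producer and one consumer. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;    /* signaled whenever the slot changes state */
    int  full;               /* 1 if value holds an item not yet consumed */
    long value;
    long n_items;            /* number of items the producer sends */
    long total;              /* consumer's running sum of received items */
} mailbox_t;

static void *producer(void *p) {
    mailbox_t *m = p;
    for (long i = 1; i <= m->n_items; i++) {
        pthread_mutex_lock(&m->mutex);
        while (m->full)                  /* wait until the consumer empties the slot */
            pthread_cond_wait(&m->cond, &m->mutex);
        m->value = i;
        m->full = 1;
        pthread_cond_signal(&m->cond);   /* wake the consumer if it is waiting */
        pthread_mutex_unlock(&m->mutex);
    }
    return NULL;
}

static void *consumer(void *p) {
    mailbox_t *m = p;
    for (long i = 0; i < m->n_items; i++) {
        pthread_mutex_lock(&m->mutex);
        while (!m->full)                 /* loop also guards against spurious wakeups */
            pthread_cond_wait(&m->cond, &m->mutex);
        m->total += m->value;
        m->full = 0;
        pthread_cond_signal(&m->cond);   /* wake the producer if it is waiting */
        pthread_mutex_unlock(&m->mutex);
    }
    return NULL;
}

/* Sends the values 1..n_items through the mailbox and returns their sum. */
long run_mailbox(long n_items) {
    mailbox_t m;
    pthread_t prod, cons;

    pthread_mutex_init(&m.mutex, NULL);
    pthread_cond_init(&m.cond, NULL);
    m.full = 0;
    m.value = 0;
    m.n_items = n_items;
    m.total = 0;

    pthread_create(&prod, NULL, producer, &m);
    pthread_create(&cons, NULL, consumer, &m);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);

    pthread_cond_destroy(&m.cond);
    pthread_mutex_destroy(&m.mutex);
    return m.total;
}
```

A single condition variable suffices here only because there are exactly two threads and they never wait at the same time: the producer waits only while the slot is full, and the consumer only while it is empty. With more producers or consumers, separate "not full" and "not empty" condition variables would be the safer design.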