Home | | **An Introduction to Parallel Programming** | | **Multi - Core Architectures and Programming** | Matrix-Vector Multiplication

Let’s take a look at writing a Pthreads matrix-vector multiplication program. Recall that if A = (aij) is an mx n matrix and x = (x0, x1, ...., xn-1)T is an n-dimensional column vector, then the matrix-vector product Ax = y is an m-dimensional column vector, y = (y0, y1, …., ym-1)T in which the ith component yi is obtained by finding

**MATRIX-VECTOR MULTIPLICATION**

Let’s take a look at writing a Pthreads
matrix-vector multiplication program. Recall that if *A* ** =** (

We want to parallelize this by dividing the
work among the threads. One pos-sibility is to divide the iterations of the
outer loop among the threads. If we do this, each thread will compute some of
the components of y. For example, suppose that *m* *=**n* ** =** 6 and the number of threads, thread
count or

To compute y[0], thread 0 will need to execute the code

y[0] = 0.0;

**for **(j = 0; j < n; j++)** **

y[0] += A[0][j] *x[j];

Thread 0 will therefore need to access every
element of row 0 of A and every element of x. More generally, the thread that has been assigned y[i] will need to execute the code

y[i] = 0.0;

**for **(j = 0; j < n; j++)** **y[i] += A[i][j]* x[j];

Thus, this thread will need to access every
element of row i of A and every element of x. We see that each thread needs to access every component of x, while each thread only needs to access its assigned rows of A and assigned components of y. This suggests that, at a minimum, x should be shared. Let’s also make A and y shared. This might seem to violate our
principle that we should only make variables global that need to be global.
However, in the exercises, we’ll take a closer look at some of the issues
involved in making the A and y variables local to the thread function, and
we’ll see that making them global can make good sense. At this point, we’ll
just observe that if they are global, the main thread can easily initialize all
of A by just reading its entries from stdin, and the product vector y can be easily printed by the main thread.

Having made these decisions, we only need to
write the code that each thread will use for deciding which components of y it will compute. In order to simplify the code, let’s assume that
both *m* and *n* are evenly divisible by *t*. Our example with *m = *6 and

With these formulas, we can write the thread
function that carries out matrix-vector multiplication. See Program 4.2. Note
that in this code, we’re assuming that A, x, y, m, and n are all global and shared.

**Program 4.2: **Pthreads matrix-vector multiplication

void Pth_mat_vect(void* rank) { long my_rank =
(long) rank; int i, j;

int local_m = m/thread_count;

int my_first_row = my_rank *local_m;

int my_last_row = (my_rank+1)*local_m-1;

for (i = my_first_row; i <= my last_row;
i++) {

y[i] = 0.0;

for (j = 0; j < n; j++)

y[i] += A[i][j]*x[j];

}

return NULL;

} /* Pth_mat_vect */

If you have already read the MPI chapter, you
may recall that it took more work to write a matrix-vector multiplication
program using MPI. This was because of the fact that the data structures were
necessarily distributed, that is, each MPI process only has direct access to
its own local memory. Thus, for the MPI code, we need to explicitly *gather* all of x into each process’ memory. We see from this
example that there are instances in which writing shared-memory programs is
easier than writing distributed-memory programs. However, we’ll shortly see
that there are situations in which shared-memory programs can be more complex.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

**Related Topics **

Copyright © 2018-2020 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.