The theoretical speedup for the BBN as a function of N and W can be computed using the calculated values for the serial fraction, f, given in table 5.2. When this is done, the results compare quite poorly with the observed speedups tabulated in table 5.1. This indicates that the parallelization overhead cannot be ignored: one needs to include a non-zero overhead term, t_o, in the equation for speedup, i.e., equation (5.5). Let us start with equation (5.5) and assume that t_o can be modeled in terms of a single overhead parameter, σ, as follows:
    t_o(N,W) = σ T(1,W)                                        (5.15)
    S(N,W) = N / (1 + (N-1) f + N σ)                           (5.16)
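The speedup expression in equation (5.16) is straightforward to evaluate; the following sketch uses illustrative values of f and σ, not the fitted BBN values from tables 5.1 and 5.2:

```python
def speedup(n, f, sigma):
    # Predicted speedup, equation (5.16): S = N / (1 + (N-1) f + N sigma).
    # f is the serial fraction, sigma the per-processor overhead parameter.
    return n / (1.0 + (n - 1) * f + n * sigma)

# Illustrative parameters only (not the measured BBN values):
for n in (1, 29, 58, 116):
    print(n, round(speedup(n, f=0.001, sigma=0.0006), 1))
```

Note that with f = 0 and σ = 0 the expression reduces to the ideal speedup S = N, as it should.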
This expression for speedup is used to find the values of σ that would force the predicted speedups to match the observed speedups given in table 5.1. These ``required'' values of σ show a trend that is somewhat independent of W but nearly linear in N, the number of processors. Therefore, σ was modeled as follows:
    σ = 0.0006 N / 116                                         (5.17)
This expression is linear in N, and the constant 0.0006 reflects the fact that the required σ was approximately 0.0006 for N=116, somewhat independent of W. Of course, this implies that the overhead term in the denominator of the speedup expression, N σ, is not linear in N but quadratic:
    N σ = (0.0006 / 116) N²                                    (5.18)
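The ``required'' σ described above can be obtained by inverting equation (5.16) at each observed data point; a sketch of the procedure, using hypothetical observed speedups rather than the actual measurements of table 5.1:

```python
def required_sigma(n, f, s_obs):
    # Invert equation (5.16) for sigma:
    # S = N / (1 + (N-1) f + N sigma)  =>  sigma = (N/S - 1 - (N-1) f) / N
    return (n / s_obs - 1.0 - (n - 1) * f) / n

# Hypothetical observed speedups (NOT the table 5.1 data), with f = 0.001:
for n, s_obs in [(29, 27.5), (58, 52.0), (116, 92.0)]:
    print(n, required_sigma(n, 0.001, s_obs))
```

A nearly linear trend of these required σ values with N is what motivates the model of equation (5.17).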
This O(N²) dependence may be due to memory contention resulting from the BBN implementation of the message-passing system, LMPS, which relies on the shared-memory features of the BBN.
This expression for σ, equation (5.17), was then used in equation (5.16) to predict the speedups for the five different workloads, with the number of processors varying from 1 to 116.
The results are shown in figure 5.3. The plots include the observed speedups from table 5.1 for comparison. It is clear that this simple model, which uses two empirical parameters (f and σ), predicts the behavior of the BBN very well over a wide range of processor counts and workloads.
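The curves of figure 5.3 follow from combining equations (5.16) and (5.17); a minimal sketch, with made-up serial fractions standing in for the per-workload values of table 5.2:

```python
def predicted_speedup(n, f):
    # Combine equations (5.16) and (5.17):
    # sigma = 0.0006 * N / 116, then S = N / (1 + (N-1) f + N sigma).
    sigma = 0.0006 * n / 116.0
    return n / (1.0 + (n - 1) * f + n * sigma)

# Illustrative serial fractions for five workloads (not the table 5.2 values):
workloads = {"W1": 0.004, "W2": 0.002, "W3": 0.001, "W4": 0.0005, "W5": 0.00025}
for name, f in workloads.items():
    print(name, [round(predicted_speedup(n, f), 1) for n in (1, 29, 58, 116)])
```

Because the overhead term grows as N², each curve eventually flattens as N increases, more quickly for the workloads with larger serial fractions.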