Efficiency of Parallel Algorithm

Next: Parallelization on the CRAY Up: Parallel Photon Transport Algorithm Previous: Parallel Photon Transport Algorithm

Efficiency of Parallel Algorithm

The principal measure of parallelization efficiency is the speedup, S_N, defined to be the ratio of the time to execute the computational workload W on a single processor to the time on N processors,

$\begin{displaymath} {S_N} = {{\tau_1}\over{\tau_N}}\;\;\;,\end{displaymath}$ (104)

where $\tau_1$ is the time to execute the workload on a single processor and $\tau_N$ is the time to execute the workload on N processors.

The time $\tau_N$ is the sum of three terms,

$\begin{displaymath} {\tau_N} = {\tau_s} + {\tau_p} + {\Delta\tau}\;\;\;:\end{displaymath}$ (105)

(1) the time $\tau_s$ to execute the serial portion of the workload, (2) the time $\tau_p$ to execute the parallel portion, and (3) an additional time $\Delta\tau$ due to the parallelization overhead which is quite general and accounts for any overhead due to implementing the algorithm on the parallel processor, either due to the hardware, the network, the operating system, or the algorithm. We would expect it to be a function of the number of processors, N, as well as the workload, W,

$\begin{displaymath} {\Delta\tau} = {\Delta\tau}(N,W)\;\;\;.\end{displaymath}$ (106)

The theoretical speedup, S_N^th, is the speedup assuming zero parallelization overhead,

$\begin{displaymath} {S_N^{th}} = {{\tau_1}\over{{\tau_p}+{\tau_s}}}\;\;\;,\end{displaymath}$ (107)

which can be compared to the true speedup:

$\begin{displaymath} {S_N} = {{\tau_1}\over{{\tau_p}+{\tau_s}+{\Delta\tau}}}\;\;\;.\end{displaymath}$ (108)

Note that the theoretical speedup is the best that can be achieved. The ratio of the true speedup to the theoretical speedup is the parallelization efficiency,

$\begin{displaymath} {{\epsilon}_N} = {{S_N}\over{S_N^{th}}}\;\;\;,\end{displaymath}$ (109)

which is a measure of the efficiency of the parallel processor to execute a given parallel algorithm. Any degradation in performance due to parallelization overhead will result in $\epsilon_N$ being less than one.

Now we introduce the serial fraction f,

$\begin{displaymath} f = {{W_S}\over{W}} = {{W_S}\over{{W_S}+{W_P}}}\;\;\;,\end{displaymath}$ (110)

where W is the total computational workload, consisting of the serial workload W_S, and the parallel workload W_p, which can by definition be performed on N processors. We define W in arbitrary units of workload and assume the processor speed is $\nu$ (workload unit per second), which leads to the following expression for the theoretical speedup,

$\begin{displaymath} S_N^{th} = {{\tau_1}\over{\tau_p + \tau_s}} = {{W/{\nu}}\over{{{(1-f)W}\over{N\nu}}+{{fW}\over{\nu}}}}\;\;\;,\end{displaymath}$ (111)

or

$\begin{displaymath} S_n^{th} = {{1}\over{{(1-f)}\over{N}}+f}\;\;\;.\end{displaymath}$ (112)

For Monte Carlo [Mar93], the total workload for the single processor cases (N=1) can be approximated very well (as will be seen by our timing results) by a simple linear function of the number of histories, N_h, to be simulated:

$\begin{displaymath} W = a' + b'N_h\;\;\;.\end{displaymath}$ (113)

Now we divide equation (5.10) by $\nu$ , which is a constant for a specific processor and define new constants a and b to obtain the single processor execution time:

$\begin{displaymath} \tau_1 = a+bN_h\;\;\;.\end{displaymath}$ (114)

Since all histories are independent, they can be tracked in parallel. Further, it is reasonable to associate the constant a with the serial time, $\tau_s$ . The serial fraction is given by,

$\begin{displaymath} f = {{\tau_s}\over{\tau_1}} = {{a}\over{\tau_1}}\;\;\;.\end{displaymath}$ (115)

Therefore, one can find the constants a and b by observing at least two different workloads (i.e. different N_h) for the N=1 (single processor) case.

Next: Parallelization on the CRAY Up: Parallel Photon Transport Algorithm Previous: Parallel Photon Transport Algorithm

Amitava Majumdar
9/20/1999