The message-passing version of TPHOT was implemented on Lawrence Livermore National Laboratory's 128-processor BBN Butterfly TC2000 using the Livermore Message-Passing System (LMPS), a library of message-passing routines. Each processor of the BBN has 16 MBytes of memory that can be "shared" by all nodes via a "butterfly switch". Under LMPS, however, each node's memory is private to that node from the perspective of the application program. The code yielded identical results for the test problem run with 8 tasks on both the BBN and the Cray. Many runs were made on the BBN, varying the number of processors from 1 to 116 and the number of particles (i.e., the workload W) from 2400 to 24,000,000.
Table 5.1 gives the simulation times for the Butterfly as a function of the number of processors N and the workload W. We have arbitrarily assigned W=1.0 to the case with approximately 240,000 particles. Blank entries (shown as dashes) appear in the table for two reasons: (1) large workloads are prohibitively expensive on few processors, and (2) small workloads on a large number of processors yield chaotic timings.

| Number of processors | W = 0.01 time (sec) | W = 0.1 time (sec) | W = 1.0 time (sec) | W = 10.0 time (sec) | W = 100.0 time (sec) |
|---|---|---|---|---|---|
| 1 | 17 | 144 | 1407 | - | - |
| 4 | 6 | 38 | 357 | - | - |
| 8 | 5 | 22 | 181 | 1769 | - |
| 9 | 5 | 20 | 161 | 1595 | - |
| 10 | 5 | 18 | 145 | 1416 | - |
| 16 | 7 | 13 | 94 | 888 | - |
| 32 | 15 | 15 | 54 | 450 | - |
| 64 | - | 31 | 53 | 251 | 2364 |
| 80 | - | - | - | 223 | 1813 |
| 100 | - | - | - | 215 | 1493 |
| 116 | - | - | - | 224 | 1366 |
The speedups for each case in Table 5.1 are computed using equation (5.1), taking the N=1 case for each workload as the reference serial case (for W ≤ 1.0). This is not quite correct, because the N=1 message-passing run is not the optimal serial code. The effect is probably small, but it tends to make the speedups appear better than they should be.
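
As an illustration of this calculation, the following minimal Python sketch computes speedups from the Table 5.1 timings. It assumes that equation (5.1) is the usual ratio of the single-processor time to the N-processor time at the same workload; the equation itself is not reproduced in this section, so that form is an assumption. The names `TIMES`, `WORKLOADS`, and `speedups` are hypothetical and chosen only for this example.

```python
# Wall-clock times (sec) transcribed from Table 5.1.
# None marks a case that was not run (a dash in the table).
WORKLOADS = [0.01, 0.1, 1.0, 10.0, 100.0]
TIMES = {
    1:   [17, 144, 1407, None, None],
    4:   [6, 38, 357, None, None],
    8:   [5, 22, 181, 1769, None],
    9:   [5, 20, 161, 1595, None],
    10:  [5, 18, 145, 1416, None],
    16:  [7, 13, 94, 888, None],
    32:  [15, 15, 54, 450, None],
    64:  [None, 31, 53, 251, 2364],
    80:  [None, None, None, 223, 1813],
    100: [None, None, None, 215, 1493],
    116: [None, None, None, 224, 1366],
}


def speedups(times, workloads):
    """Return {W: {N: T(1, W) / T(N, W)}} for workloads that have an N=1 run.

    Assumption: speedup is the plain ratio of the measured N=1 time to the
    measured N-processor time at the same workload.
    """
    result = {}
    for col, w in enumerate(workloads):
        t1 = times[1][col]
        if t1 is None:
            continue  # no measured serial reference for this workload
        result[w] = {
            n: t1 / row[col]
            for n, row in sorted(times.items())
            if row[col] is not None
        }
    return result


if __name__ == "__main__":
    for w, by_n in speedups(TIMES, WORKLOADS).items():
        for n, s in by_n.items():
            print(f"W={w:<5} N={n:<3} speedup={s:6.2f}")
```

Only W = 0.01, 0.1, and 1.0 have a measured N=1 time, so the sketch skips the two largest workloads; they would need a reference time from some other source.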
| Workload (W) | # of histories (Nh) | Model single-processor execution time (sec) | Observed single-processor execution time (sec) | Serial fraction (f) |
|---|---|---|---|---|
| 0.01 | 2347 | 17.2 | 17 | 0.19 |
| 0.10 | 23843 | 143.8 | 144 | 0.023 |
| 1.00 | 238232 | 1407 | 1407 | 0.0024 |
| 10.0 | 2382320 | 14070 | - | 0.00024 |
| 100.0 | 23823200 | 140700 | - | 0.000024 |
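
To show how a serial fraction constrains the attainable speedup, here is a small Python sketch that applies Amdahl's law to the tabulated values of f. This section does not say how f enters the authors' model, so the use of Amdahl's law is an assumption made for illustration, not the paper's stated method. The names `SERIAL_FRACTION` and `amdahl_speedup` are hypothetical.

```python
# Serial fractions f transcribed from the table above, keyed by workload W.
SERIAL_FRACTION = {
    0.01: 0.19,
    0.1: 0.023,
    1.0: 0.0024,
    10.0: 0.00024,
    100.0: 0.000024,
}


def amdahl_speedup(f, n):
    """Amdahl's-law speedup on n processors for serial fraction f.

    Assumption: a fraction f of the single-processor run time is strictly
    serial and the remaining (1 - f) divides evenly over n processors.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("serial fraction must lie in [0, 1]")
    if n < 1:
        raise ValueError("number of processors must be at least 1")
    return 1.0 / (f + (1.0 - f) / n)


if __name__ == "__main__":
    for w, f in SERIAL_FRACTION.items():
        # 1/f is the limiting speedup as n grows without bound.
        print(f"W={w:<6} f={f:<9} S(116)={amdahl_speedup(f, 116):7.2f} "
              f"limit={1.0 / f:10.1f}")
```

Under this assumption, W = 0.01 on 8 processors gives a speedup of about 3.4, which is close to the ratio 17/5 of the corresponding entries in Table 5.1. The bound ignores communication cost, so it cannot reproduce the slowdown seen for small workloads on many processors.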