Authors
Dr G.F. Carey: project director, professor in Aerospace Engineering at
the University of Texas at Austin, director of the CFDLab.
Dr R. Mclay: Research Associate at the University of
Texas at Austin.
Dr. C. Harlé:
Post Doctoral associate, CFDLab.
Dr. B. Davis: Post Doctoral associate, CFDLab.
Dr. H Swinney: Professor at the University of Texas at Austin,
director of the Center for Nonlinear Dynamics.
Dr. S. Van Hook:
Post Doctoral associate, Center for Nonlinear Dynamics.
MGFLO is designed for parallel distributed computation of 3D
coupled viscous flow and heat transfer, including thermocapillary
surface tension effects. Of specific interest are microgravity fluid
problems of interest to NASA for life support fluid processing and
storage, and manufacturing applications in the Aerospace Shuttle and
Space Station Missions. The project involves development of software
for flow simulations on the Cray T3D and Cray T3E distributed MPP
systems, as well as fluid flow experiments on fluid films where the
Marangoni effect is dominant. The current (first generation) MGFLO
code achieved a sustained 16.5 Gflops/s performance level on the 512
node T3D system at NASA Goddard in March 1997, for a representative 3D
coupled Rayleigh- Benard- Marangoni benchmark calculation. A brief
description of the problem class, methodology, performance measurement
criteria, and software package is provided in this document.
1. Rayleigh- Benard- Marangoni Flows
1.1 Microgravity Flow Problems
Benard in 1900 observed that cellular flow structures formed under certain conditions where a fluid layer was heated from below [1]. He attributed the behavior to the effect of thermal expansion and associated buoyancy forces in the fluid, and assumed that there were no significant effects associated with the free surface. Subsequently, Rayleigh in 1916, developed a mathematical model for this problem under the same assumptions and carried out a stability analysis. This class of problems is commonly referred to as the Rayleigh- Benard Problem.
Neither Rayleigh nor Benard realized that, in fact, thermocapillary surface tension at the free surface was the dominant mechanism driving the flow for this class of terrestrial experiments involving thin layers. This problem is termed the Rayleigh- Benard- Marangoni problem [2]. In fact, it is the dominance of surface tension over gravity that makes this problem so important to understanding the behavior of fluid in a microgravity environment [3]. Clearly, in NASA shuttle and space station orbit, gravity is negligible, and surface tension effects are paramount. However, it is generally not possible to carry out terrestrial experiments to test ideas before deployment in space (the one exception being the thin layer type of experiments conducted as part of the present work). Hence a simulation capability for Rayleigh- Benard- Marangoni problems is essential.
Since these problems are nonlinear 3D, coupled and require fine grid resolution for stable, reliable simulations, the problem is both computationally and memory intensive. Table 1 gives an estimate of solution times and memory requirements for a typical Rayleigh- Benard- Marangoni calculation using MGFLO on different Cray vector systems, and on the NASA Goddard MPP systems.
Computer |
J90 1 proc |
C90 1 proc |
T90 1 proc |
T3D 512 proc |
T3E 512 proc |
|---|---|---|---|---|---|
| MGFLO Flops/s | 73 M | 290 M | 600 M | 16.5 G | 53.0 G |
| Time: 9000 elements (Memory: 128 MW) |
9 days |
2.3 days |
1.1 day |
1 h |
20 mn |
| Time: 110582 elements (Memory: 1540 MW) |
Out of Memory |
Out of Memory |
Out of Memory |
In Memory |
In Memory |
MGFLO time and memory comparison
1.2 Mathematical Model and Finite Element Method
The model consists of the Navier-Stokes equations for flow in 3D, coupled with the energy transport equation for temperature. Under the Boussinesq approximation, buoyancy enters as a body force in the momentum equations, and a thermocapillary surface tension effect enters through a shear stress condition at the free surface [see 12(postscript hyperlinked), 13].
A weak variational saddle point formulation of the viscous flow and heat transfer problem is constructed, and a Galerkin finite element approximation statement follows with the free surface condition incorporated as a natural boundary condition [4]. The discretization is taken as tensor-product hexahedral elements with triquadratic basis for each velocity component, trilinear basis for the pressure, and triquadratic basis for temperature (see online papers linked to references [5,6]).
The resulting semidiscrete system is integrated implicitly using
a Crank-Nicholson (midstep) rule, and the resulting nonlinear
algebraic system for each timestep is solved using block iteration
with biconjugate gradient (BCG) iterative
solution of each resulting linear algebraic system. Convergence of the
inner linear BCG solve is based of the usual residual-type criterion
[7,8].
The problem is solved in parallel over a partition of subdomains using an
element-by-element strategy
for the BCG solver within each subdomain
[9]. That is,
the mesh of hexahedral elements is partitioned to subdomains containing
contiguous blocks of elements. Each such subdomain is assigned to a processor so
this defines the processor partitioning. The subdomain partition is
non-overlapping insofar as elements are concerned, but nodes, edges and faces on
the subdomain interfaces are shared by adjacent processors. Communication
between adjacent processors will be required for calculations involving those
elements with common nodes on an interface. By further grouping the processor
elements, it is possible to overlap computation and communication. For further
details, see the cited references [5
(hyper-linked)],
[10,11].
The computationally intensive parts are the element calculations, the matrix-vector products, and the dot products. The element calculations can be made in parallel and run on the T3D at 24 Mflops/s per processor. The EBE scheme circumvents global assembly and recasts the global matrix-vector products at the element level. This implies that we make local "almost dense" element matrix-vector products, for the element's interior to any given processor subdomain. Load balance is achieved if the number of elements per processor subdomain is approximately constant.
Clearly, the BLAS2 matrix-vector routine is heavily used (requiring more than 80% of the CPU time for a typical calculation) [11]. In our optimized code, this runs at 34 Mflops/s for a single processor on the T3D. Near linear speedup and scalability on 512 nodes of the T3D at 96.7% were achieved in this manner for the test problem described later.
2. Evaluation Criteria and Performance
The norm for evaluating the speed of the code was specified under the NASA contract in terms of Gigaflops/s with a minimum sustained speed of 10 Gigaflops/s to be attained on the representative 3D coupled problem as the first major performance milestone. This goal was met in March 1997 and verified independently at NASA Goddard, where the performance was assessed to exceed 16 Gigaflops/s on 512 nodes of the NASA Goddard Cray T3D.
3. MGFLO Software Package
3.1 Code Modules
The Code is written in C and Fortran. Communications between
processors is achieved using MPI and the Cray shared memory library
(shmem).
The software consists of the following modules.
(1) A simple 3D mesh generator (IJK type) for hexahedral elements
and a similar processor partitioning strategy. That is, the
connectivity table is also provided by the module.
(2) A solution manager module which handles the options specified
in the input file needed to compute the element matrices and vectors
for the discretized problem. At each timestep, or nonlinear iteration
cycle, the solution manager calls the routine to form the element
matrices and vectors and applies the essential boundary
conditions. The solver module is then called.
(3) The solver module consists of element-by-element matrix vector
products with overlapped communications and associated dot products,
also recast at the element level. A special masked componentwise
vector product is introduced in this latter calculation.
The interface between the mesh generator and the other two
modules is a "struct" in C which specifies the nodes coordinates,
boundary conditions and connectivity tables.
The structure of the main program then is:
Standard Headers
Read Title Routine
Read in materials
Flop Rate Routine
main(int argc, char **argv)
Main Definitions
Initializations
Parse Command Line Options
Find Base Name of Input File
Create Output File and Open It
Read Input File into Memory
done = FALSE;
while (!done)
Get Next Parsed Line from Input File
Material Properties
Mesh
Solve
Flop Rate
Write Output Files
Stop
done = TRUE;
Finishing Program
Each line of this structure is a module. We will focus on the "Mesh"
and "Solve" modules. The structure of the "Mesh" module is the
following:
Form elements routine Set Matrix Type MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop) Initializations Parse Input Data for Nodes and Elements Check for Valid Input Compute Spacing, Array Sizes and Allocate Partition Processors Local Position on this Processor Compute Nodes Coordinates Form Elements Store Boundary Conditions Store in [[MESH]] ContextThe [[MESH]] context contains the information which is passed to the solution manager module named "Solve". In the current version of the code, the [[MESH]] struct contains the following information (in the file "mesh.c")
NODES* nodes; /* A 1-D list of nodes on this processor */ VectorObj* frame_elements; /* The elements on the processor boundary */ VectorObj* pict_elements; /* the interior elements */ VectorObj* all_elements; /* all elements */ VectorObj* group_info; /* BC group information */ VectorObj* sndLists; /* The send lists to all neighbor procs */ int nodes_max; /* The number of nodes on this processor */ int numvar; /* number of variables */ int nStfSize; /* size of the matrix */ REAL* gaussp3d; /* 3D Gauss points coordinates xi,eta,zeta*/ REAL* wght3d; /* 3D weights of the Gauss points */ REAL* gaussp2d; /* 2D Gauss pnts coord */ REAL* wght2d; /* 2D Gauss pnts weights */ REAL* phi; /* Shape functions tri quads at nodes */ REAL* psi; /* Shape func 3 linear at corner nodes*/ REAL* dphi; /* Deriv xi,eta,zeta of shape func phi */ REAL* dpsi; /* Deriv of shape func psi */The module "Solve" refers to all the members of the [[MESH]] structure. It is therefore important that the correct interface is built if a new mesh generation routine is used. The structure of the solution manager "Solve" module is as follow:
void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
*input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep)
local variables
Initialize
Parse command line
Decide How Many Unknowns
Initialize solution, Gauss, Shape, Deriv
Setup solver routines
for (;;)
Get Next Parsed Line
Reset all state variables to initial state
Parse User Input Line
done = FALSE;
while (!done)
Increment solution counters
Handle decoupled solution
Save old soln
Predict solution (or copy [[unold]] to [[unp]])
Form element stiffness matrix, right hand side and apply
boundary conditions's
Modify solver routines
A single solve
Update the solution with new solution if necessary
Check solution against exact solution
Iteration control
REAL eps; /* tolerance level for the residual norm */ Vector* soln; /* pointer to the solution */ int exact; /* flag if the exact solution is known */ int soln_init; /* initial solution if known */ int itermax; /* maximum number of iteration in the solver */ int iprint; /* iprint > 1 prints iterations */ PCG* pcg; /* pointer to the solver function: BCG */ PCG* pcg_t; /* pointer to thermal solver if decoupling*/ PCG* pcg_f; /* pointer to the Navier Stokes solver if needed*/
We list the main routines, by files.
The file "mgflo.c" contains the main routine.
In this file, the following routines are also included:
void rdtitle(DATAITEM **line, int num_fields): this routine reads the
title line of the input file.
void debug_requests(DATAITEM **line, int num_fields): this routine flags the
debugging if requested.
void flopRate(DATAITEM **line, int num_fields, PROBLEM* problem,
SOLVE* solve, REAL64 diffTime):
this routine prints out the number of flops per main routine, and overall
Gigaflops/s in the output file.
The file "mesh.c" contains the following routines:
void form_elements(VectorObj* elmv,int xelm, int yelm, int zelm,
int i0, int j0, int k0, int nnx, int nny, PROP*
prop, int visualizer):
provides the node coordinates of the elements for
the IJK grid, and connectivity
table on one
processor. The partitioning was previously done in
the main routine in the file "mgflo.c".
MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields,
PROP *prop): creates the send lists which are the picture and "frame" of
the grid on one processor. (The elements at the surface of the grid of
one processor, which share nodes with other processors or which have
nodes on the boundary of the physical domain are in the "frame". The
interior nodes are in the "picture".)
The file "solnMgr.c" contains the routines relevant for the solution manager:
int matvec(MVTYPE type, void* amat, void* p, void* w, REAL64* cnt):
sets the matrix vector BLAS2 routines: either normal or transpose.
int average(void* amat, void* p, void* maskv): performs some averaging
of the solution.
void zeroPresMask(MESH* mesh, int numvar, Vector* mask): sets the
pressure mask value to zero.
int sumRHS(MESH* mesh, Vector* f, REAL64* cnt): sums the right hand
side globally over all processors.
int formDiagInv(MESH* mesh, Vector* diagInv, REAL64* cnt): forms the
Jacobi preconditioning matrix.
void errorNorm(DATAITEM** user, int numvar, int nodesMax, NODES* nodes,
REAL* u, REAL* mask, REAL** unormp, REAL** diffsln):
computes the error norm when the exact solution is
known.
int stop(void* uservp, int iter, REAL rnorm, REAL fnorm, REAL zeta,
int exact, int iprint, void *soln, void *mask): sets the
stopping criteria.
int messageSetup(VectorObj* sndLists, int numvar): sets the messages.
Vector* formMask(VectorObj* sndLists, int nodesMax, int numvar, PCG*
pcg): forms the masking vector for the masked dot products.
void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
*input_ctx, DATAITEM **line, int num_fields, SOLVE**
solvep): is the general routine of the solution manager module.
The file "setbc.c" contains the routine which computes the boundary conditions:
void setbc(PROBLEM* problem, ELEMENTS* elm, NODES* nodes, VectorObj* group_info,
int nStfSize, int* gp2d,int* gp3d, REAL* wght2d, REAL* gaussp2d, REAL* phi,
PROBtype probtype, REAL* xyz, REAL* T_lag, REAL64* rcnt)
The file "pcg.c" contains the routines which set the parallel iterative solver.
int createPCG(PCG **pcg, void* matrix, PCGSolverType solverType):
creates the parallel solver pointer "pcg".
int setMatvecPCG(PCG *pcg, int (*matvec)(MVTYPE, void* ,void* ,void*
,REAL64 *)): sets the matrix vector function pointer which is
used by solver.
int setAveragePCG(PCG *pcg, int (*average)(void* ,void* ,void* )):
sets the average function pointer specified in the solution manager.
int setParmPCG(PCG* pcg, REAL eps, int iterMax, int iprint, int exact,
int solnInit): sets the parameters for the solver.
int setMaskPCG(PCG* pcg, void *mask): sets the proper masking
parameters for the solver.
int clearMaskPCG(PCG* pcg): clears the masking vector.
int setPrecondPCG(PCG *pcg, int (*pcroutine)(), void* precond): sets
the proper preconditioning routine for the solver chosen.
int setStoppingPCG(PCG* pcg,
int (*stop)(void*, int, REAL, REAL, REAL, int, int,
void*, void*),
void* user): sets the stopping criterion for the
solver.
int vecGetWork(PCG* pcg, int n): prepares the working vectors for the
solver.
int solvePCG(PCG *pcg, void* soln, void* f, int* its, REAL64* cnt):
specifies the type of solver (CG or BCG).
The routines for the conjugate gradient solver and biconjugate gradient solver are provided in the file "iter.c".
int cgSetup(PCG *pcg): sets the CG solver.
int cg(PCG* pcg, int* its, REAL64* cnt): CG solver routine.
int bcgSetup(PCG *pcg): sets the BCG solver.
int bcg(PCG* pcg, int* its, REAL64* cnt): BCG solver routine.
The file "serv.c" contains the server routines: message, error message, and warning message.
The file "vec.c" contains the routines which handle vector operations, and blas routines.
The file "user.c" contains routines which are specified by the user for the variable boundary conditions, the initial boundary conditions, and the exact solution. These routines are discussed in section 3.3 below.
The file "formkf.f" contains the following routines written in Fortran. These routines are:
subroutine surften(c,T_lag,iface,gp2d,gp3d, phi,wght2d,gaussp2d,
xyz,rhs): computes the boundary integral due to the
surface tension boundary condition.
subroutine fluxbc(c,iface,gp2d,gp3d,phi,wght2d,gaussp2d,xyz,rhs):
computes the thermal heat flux boundary integral when a heat
flux boundary condition is specified.
subroutine shapeq2q1(ngpts,gaussp3d, phi, psi,dphi,dpsi):
computes the Q2 (triquadratic) and Q1 (trilinear) shape functions
and their derivatives.
subroutine gauss(npoints,gaussp2d, wght2d,gaussp3d,wght3d):
computes the coordinates of the Gauss points in the master
element coordinate frame.
subroutine formkf_ns(discrtype,ngpts,gaussp3d,wght3d,phi,psi,
dphi,dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount):
computes the element stiffness matrix and its corresponding right hand side
vector from the finite element discretized formulation of the
Navier-Stokes system of equations.
subroutine formkf_t(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi,
dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount):
computes the element stiffness matrix, and its corresponding
right hand side vector from the finite element discretized
formulation of the thermal equation.
The software requests an input file which uses a set of chosen
keywords. The software copies the input file
to all processors, which then parse the lines of the input file as
the computation progresses.
A sample input file is provided below
for the current version of the software and this is followed by a
detailed explanation of this input data.
title, " Test case for mgflo" material,rho=1.0,k=1.0,cp=1.,nu=1.,beta=0.,iprint=0 mesh,elem=48,48,48,min=0.0,0.0,0.0, max=1.0,1.0,1.0, partition=8,8,8 group=6,T=variable,iset=2 group=0,ux=0.0 group=0,uy=0.0 group=0,uz=0.0 group=1,ux=variable, iset=1 group=1,uy=0.0 group=1,uz=0.0 group=2,ux=variable, iset=1 group=2,uy=0.0 group=2,uz=0.0 face=0,group=6 face=0,group=0 face=1,group=6 face=1,group=1 face=2,group=6 face=2,group=2 face=3,group=6 face=3,group=1 face=4,group=6 face=4,group=2 face=5,group=6 face=5,group=0 end,mesh solve,decouple,iprint=1,user=dparabolic steady,nsteps=1,eps=1.0E-6,itermax=3000 end,solve flop,iprint=3 stop
The above input file is composed of
three main sections.
(1) The first section, line 3, gives the "material" data, values of the
properties of the fluid which are needed to perform the computation (in
cgs units here)
(density "rho", thermal conductivity "k",
constant pressure specific heat coefficient "cp", kinematic viscosity
"nu", thermal expansion coefficient "beta").
If no values is given the default value of 0 is assigned.
(2) The second section, (from line 4 to "end mesh")
provides information on the mesh, and the
partitioning among processors. The keyword "elem" indicates the number
of elements which are set in the "x", "y", and "z" direction of the
structured hexahedral grid; the keyword "min" gives the coordinates of the bottom
left front corner of the domain, and the keyword "max" gives the
coordinates of the upper back right corner of the domain;
the keyword "partition" provides the number of processors
assigned in each of the "x", "y", "z" directions.
The flow domain for this example has six faces, which are assigned
different "group" boundary conditions. A "face" may be assigned
several "groups" since several boundary conditions may apply to it.
(3) The third section of the input file provides data used for the solver. It
indicates which type of solver will be applied: "decouple" if the
Navier Stokes and the thermal equations are iteratively decoupled; "flow" if only
the Navier Stokes equations are solved, "thermal" if only the thermal
equation is solved. We also allow the user to provide a special set
of boundary conditions and initial conditions, when the keyword
"user" is set to a string. This string refers to sections of the file
"user.c", which are discussed below.
The keyword "steady" indicates a stationary flow calculation,
"transient" indicates that the time
dependent terms of the equations are included, with a
timestep specified by the keyword "delt".
The number of cycles or timesteps is set by "nsteps". The
precision of the biconjugate iterative solver is given by "eps", and
the maximum number of iterations in this iterative solver is provided
by "itermax".
We note that on several lines, the keyword "iprint" is used: this provides
various output information.
The order of the lines should be respected, yet on a given line, the
order of the keywords is not important.
For the example above, we have therefore the following
information:
the material density, thermal conductivity, viscosity, and heat
coefficient
are set to 1. The mesh is a uniform 48 by 48 by 48 structured grid,
for the unit cube. The number of processors requested is 512
in the form of an 8 by 8 by 8 processor partition of subdomains.
The boundary conditions on the unit cube indicate that the bottom
faces have a variable temperature (group 6), specified in the routine "user.c" as
the set of conditions "iset" = 2, of the section "dparabolic".
The bottom and top surfaces (face=0,5) satisfy the noslip
condition (provided by the group 0).
The lateral faces (face=1,2,3,4) have a variable "x" component of the
velocity, specified in the routine user.c, by the set "iset=1" in the
section "dparabolic", and have "y" and "z" velocity components set to zero.
Based on this data input, the program solves
the Navier Stokes and thermal equations in a
decoupled manner, with the specific boundary conditions provided by
the section "dparabolic". For the sample data set here,
no time dependency is implemented, and only one nonlinear
cycle is computed. The precision of the bi-conjugate gradient solver
is set to 1.e-06, and a maximum of 3000 iterations is requested.
Finally, the output file will print the overall Mflop rate for the code.
We remark that upon
completion of a flow simulation, the software provides
an output file, and, if chosen so,
data files to allow visualization of the mesh and the
solution.
The output file repeats a copy of the input file, and provides information
corresponding to the integer value assigned to the "iprint" keyword in
the input files.
If the statement "iprint=0" is given, or if the keyword "iprint" is
omitted, then no information is printed.
On the "material" input line, "iprint=1" reprints the material properties.
On the "mesh" input line, "iprint=1" prints the mesh information
(coordinates, connectivity table) in the output file.
On the "solve" input line, "iprint=1" gives the number of iterations
used in the iterative solver to converge to the tolerance "eps";
"iprint=2" prints the residual at each solver iteration.
On the "flop" input line, "iprint=1" prints the overall Gigaflops
rate, and wallclock time spent in the execution of the program;
"iprint=2" gives also the Gigaflops rates and wallclock times
of the solver, the formation of the elements stiffness matrix,
and boundary conditions routine; finally, setting "iprint=3"
shows the Gigaflops rate of the thermal equation solve, and of the
Navier Stokes equations solve. An example of the output data
for the input instruction
flop, iprint=3 is given in section 3.6.
3.5 User specified conditions.
The file "user.c" allows the user to specify a special set of initial
conditions, variable boundary conditions, or an exact solution, in
various routines. The conditions are called by a string, specified in
the input file. The following shows how the boundary conditions with a
variable x component of the velocity, corresponding to the string
"dparabolic" are coded in the routine "varibc" of the file "user.c":
void varibc(DATAITEM **user, REAL curtime, int iset, REAL x, REAL y, REAL z,
REAL *value)
...
else if (STRCASECMP(user[0]->cdata, ==, "dparabolic"))
{
*value = z*(1.-z);
}
...
Steps for compiling MGFLO:
The tar file mgflo.tar contains all necessary files to compile and run
the program. Once the tar file has been expanded, the steps to
compile it are:
a) cd Csupes
b) /bin/make
c) cd ..
d) /bin/make
The warning messages can safely be ignored.
Steps To Run MGFLO:
The tar file includes an input file `ns.dat". The qsub script file
mgflo.qsub can be submitted after changing line 3 to give the
appropriate directory that the tar file is in. The output will be
written to the file `ns.out".
The last 30 lines of the output file report the performance of the
code. Below is a copy of the end of the output file.
_____________________________________________________________________________ Time in solve = 651.354 sec Overall GigaFlop Rate: 16.543, GFLOP Cnt: 1.078e+04, Time: 652 1 Proc Solve MFLOPS/s: 32.385, solveCnt: 2.103e+10, solveTime: 649 1 Proc Fkf MFLOPS/s: 19.670, fkfCnt: 3.268e+07, fkfTime: 1.66 FLOW: 1 Proc MatVec MFLOPS/s: 33.293, mvCnt: 2.033e+10, mvTime: 611 1 Proc Precon MFLOPS/s: 13.195, pcCnt: 5.107e+07, pcTime: 3.87 1 Proc Dot MFLOPS/s: 11.598, dotCnt: 2.298e+08, dotTime: 19.8 1 Proc AXPY MFLOPS/s: 31.556, axpyCnt: 2.297e+08, axpyTime: 7.28 1 Proc AXPBY MFLOPS/s: 40.080, axpbyCnt: 7.653e+07, axpbyTime: 1.91 THERMAL: 1 Proc MatVec MFLOPS/s: 19.756, mvCnt: 1.015e+08, mvTime: 5.14 1 Proc Precon MFLOPS/s: 12.165, pcCnt: 6.613e+05, pcTime: 0.0544 1 Proc Dot MFLOPS/s: 9.847, dotCnt: 2.979e+06, dotTime: 0.303 1 Proc AXPY MFLOPS/s: 29.274, axpyCnt: 2.955e+06, axpyTime: 0.101 1 Proc AXPBY MFLOPS/s: 36.211, axpbyCnt: 9.821e+05, axpbyTime: 0.0271 Time in flop = 0.00450909 sec Time in stop = 1.99974E-05 sec Time in PROGRAM: 651.802173 _____________________________________________________________________________
4. Test Problems and Performance Results
Two test examples are provided below to
demonstrate the
performance results of the code on the NASA Goddard T3D, and T3E systems.
The first test is stationary solution of the Navier
Stokes equations and decoupled solution of the thermal equation.
The flow solution corresponds to a simple
Poiseuille flow calculation, and the temperature solution is simple
linear heat conduction. The exact solutions lie in the approximation
subspace and are obtained by MGFLO.
The input file is provided above in section 3.3. Essentially linear
speedup is obtained for configurations up to 512 nodes tested on the
T3D, with 96.7% scaling (see Figure 1).
The second test case is a Rayleigh- Benard- Marangoni type of
flow including the surface tension and buoyancy contribution.
The bottom
temperature is set to 62.64 C, the reference temperature is 60C. The
free surface is flat.
The thickness of the layer is 0.45 mm, the surface section of the fluid is 1
cm by 1 cm. The velocity at the walls is set to 0.
The side walls are insulated. The temperature field is specified on
the free surface as shown on Figure 2.
The code solves the Navier Stokes equations and the heat transfer equation.
The velocity vectors at the free flat surface are shown
in Figure 3.
The surface velocity vectors indicate that vortices are established
due to the variations of the surface temperature fields, and surface tension.
5. Experimental Studies
Laboratory experiments are now in progress to investigate several aspects
of surface-tension-driven convection. We have identified the primary
instability in high Prandtl number liquid bridge convection and have
used advanced, nonlinear control algorithms to prevent onset of instability.
Liquid bridge convection is being studied now in low Prandtl
number liquids, where the industrial interest is greatest,
and onset to instability has been observed.
Pattern formation within the ``dry spots'' of
long-wavelength Benard convection is being examined for the appearance
of hexagonal symmetries.
Finally, we are beginning liquid-liquid convection experiments, where
a rich variety of instabilities--long- and short-wavelength, stationary
and oscillatory--are predicted
for both heating from below and heating from above.
6. Conclusions and Recommendations
This research project was initiated in Summer 1996 under the NASA
ESS CAN program and software development began in Fall 1996. The
prototype parallel code MGFLO is now available
for applications and testing on microgravity flow problems. The
modular structure of the code permits expansion of its capabilities
and several enhancements are planned as the research program
continues. For example, we will continue to focus on code optimization
and performance milestones in the next year. At the same time, we will
develop a fully-coupled formulation and investigate techniques for
treating more general boundary conditions and geometry. Now that the
prototype code is complete, we plan to work more closely with the flow
experiment component of the program and focus on free surface
physics. The basic approach of parallelization via domain
decomposition, overlap of communications and computations, use of high
performance BLAS1 and BLAS2
kernels for the EBE computations will be carried
forward as the code capabilities are enhanced. Although the benchmark tests
described to date have focussed on a rectangular solid geometry, and
3D cartesian grid, the code is not limited in this manner. It is
designed to permit easy extension to more general geometries and
meshes, and incorporation of the more general strategies such as those
obtained using METIS and similar programs.
Acknowledgments This project was funded by NASA ESS CAN NCCS5 154.
thermocapillary: surface tension
effects associated with temperature variations on the fluid free
surface.
R-B-M: Rayleigh- Benard- Marangoni; thermally
driven flows due to both buoyancy and thermocapillary effects.
finite elements: approximation
technique involving weighted integral formulation of equations on
approximation subspaces.
BCG: Biconjugate gradient method, one of a class
of generalised gradient iterative solvers suited for nonsymmetric
algebraic systems.
MGFLO: Microgravity Flow Software Package.
EBE: Element-by-element formulation that circumvent system assembly.
BLAS: Basic Linear Algebra Subroutine, assembly coded to optimized performance.
References
[1] E.L. Koschmieder: Benard Cells and Taylor Vortices. Cambridge (1993).
[2] C. Rerez-Garcia et al. Linear stability analysis of Benard-Marangoni convection in fluids with a deformable free surface. Phys. Fluids A 3, pp292-298 (1991).
[3] S. J. Van Hook, M. F. Schatz, J. B. Swift, W. D. McCormick, H. L. Swinney. "Long-wavelength surface-tension-driven convection: Experiment and theory" to appear in Journal of Fluid Mechanics.
[4] E. B. Becker, G.F. Carey, J. T. Oden: Finite Elements Volumes 1,2,3 and 6, Prentice Hall International London, 1981.
[5] G. F. Carey, C. Harle, R. Mclay, S.Swift: "MPP Solution of Rayleigh - Benard - Marangoni Flows." To be presented at the Super Computing 97 conference (November 1997). postscript version
[6] G.F. Carey, A. Bose, B. Davis, C. Harle, R. McLay: "Some Aspects of Parallel Viscous Flow Computations." Presented at the Atlanta HPC conference (April 1997). postscript version
[7] B. Baret et al. Templates for the Solutions of linear Systems Building Blocks for Iterative Methods. SIAM, 1994, Philadelphia, Pa. [8] Fletcher, R. ``Conjugate Gradient Methods for Indefinite Systems,'' Lecture Notes in Mathematics, 506, Springer-Verlag, Berlin and New-York, 1976. [9] Barragy, E. and G. F. Carey. 1992. ``Parallel-Vector Computation with High-p Element- by-Element Methods.'' International J. of Computer Mathematics, 44, 329-339.
[10] Carey, G. F. (Ed). 1989. Parallel Supercomputing: Methods, Algorithms and Applications, John Wiley \& Sons, U.K.
[11] McLay, R., S. Swift, and G. F. Carey. 1996. ``Maximizing Sparse Matrix-Vector Product Performance on RISC based MIMD Computers.'' J. of Parallel and Distributed Computing, 37, 146-158.
[12] Carey G. F., `Rayleigh- Benard- Marangoni flows: theoretical formulation and solution algorithm.' postscript version
[13] McLay, R., G.F. Carey: "Coupled heat transfer and viscous flow and magnetic effects in weld pool analysis." IJNMF vol 9, pp713-730 (1989).