Authors
Dr G.F. Carey: project director, professor in Aerospace Engineering at
the University of Texas at Austin, director of the CFDLab.
Dr R. McLay: Research Associate at the University of
Texas at Austin.
Dr. Gurcan Bicken: Post Doctoral associate, CFDLab.
Dr. H Swinney: Professor at the University of Texas at Austin,
director of the Center for Nonlinear Dynamics.
Dr. S. Van Hook:
Post Doctoral associate, Center for Nonlinear Dynamics.
MGFLO is designed for parallel distributed computation of 3D
coupled viscous flow and heat transfer, including thermocapillary
surface tension effects. Of specific interest are microgravity fluid
problems of interest to NASA for life support fluid processing and
storage, and manufacturing applications in the Aerospace Shuttle and
Space Station Missions. The project involves development of software
for flow simulations on the Cray T3D and Cray T3E distributed MPP
systems, as well as fluid flow experiments on fluid films where the
Marangoni effect is dominant. The first generation MGFLO
code achieved a sustained 16.5 Gflops/s performance level on the 512
node T3D system at NASA Goddard in March 1997, for a representative 3D
coupled Rayleigh- Benard- Marangoni benchmark calculation.
An improved version of the code achieved 51.66 Gflops/s on the 512
T3E-600 system at NASA Goddard in July 1997;
the first generation code achieved 115 Gflops/s on a
1280 T3E-900 system at Cray Research Inc in July 1997.
A brief
description of the problem class, methodology, performance measurement
criteria, and software package is provided in this
document. References and publications from our team relevant to the research done
with this software are linked to this web page.
1. Rayleigh- Benard- Marangoni Flows
1.1 Microgravity Flow Problems
Benard in 1900 observed that cellular flow structures formed under certain conditions where a fluid layer was heated from below [1]. He attributed the behavior to the effect of thermal expansion and associated buoyancy forces in the fluid, and assumed that there were no significant effects associated with the free surface. Subsequently, Rayleigh in 1916, developed a mathematical model for this problem under the same assumptions and carried out a stability analysis. This class of problems is commonly referred to as the Rayleigh- Benard Problem.
Neither Rayleigh nor Benard realized that, in fact, thermocapillary surface tension at the free surface was the dominant mechanism driving the flow for this class of terrestrial experiments involving thin layers. This problem is termed the Rayleigh- Benard- Marangoni problem [2]. In fact, it is the dominance of surface tension over gravity that makes this problem so important to understanding the behavior of fluid in a microgravity environment [3]. Clearly, in NASA shuttle and space station orbit, gravity is negligible, and surface tension effects are paramount. However, it is generally not possible to carry out terrestrial experiments to test ideas before deployment in space (the one exception being the thin layer type of experiments conducted as part of the present work). Hence a simulation capability for Rayleigh- Benard- Marangoni problems is essential.
Since these problems are nonlinear 3D, coupled and require fine grid resolution for stable, reliable simulations, the problem is both computationally and memory intensive. Table 1 gives an estimate of solution times and memory requirements for a typical Rayleigh- Benard- Marangoni calculation using MGFLO on different Cray vector systems, and on the NASA Goddard MPP systems.
Computer |
J90 1 proc |
C90 1 proc |
T90 1 proc |
T3D 512 proc |
T3E 512 proc |
|---|---|---|---|---|---|
| MGFLO Flops/s | 73 M | 290 M | 600 M | 16.5 G | 51.6 G |
| Time: 9000 elements (Memory: 128 MW) |
9 days |
2.3 days |
1.1 day |
1 h |
20 mn |
| Size: 110582 elements (Memory: 1540 MW) |
Out of Memory |
Out of Memory |
Out of Memory |
In Memory |
In Memory |
Table: MGFLO time and memory comparison and estimates.
1.2 Mathematical Model and Finite Element Method
The model consists of the Navier-Stokes equations for flow in 3D, coupled with the energy transport equation for temperature. Under the Boussinesq approximation, buoyancy enters as a body force in the momentum equations, and a thermocapillary surface tension effect enters through a shear stress condition at the free surface [see 12(postscript hyperlinked), 13].
A weak variational saddle point formulation of the viscous flow and heat transfer problem is constructed, and a Galerkin finite element approximation statement follows with the free surface condition incorporated as a natural boundary condition [4]. The discretization is taken as tensor-product hexahedral elements with triquadratic basis for each velocity component, trilinear basis for the pressure, and triquadratic basis for temperature (see online papers linked to references [5,6]).
Two algorithms have been implemented: 1) the first approach consisted of
an iteratively decoupled treatment of the respective fluid flow and heat
transfer equations and successive approximation. At any given iterate the
computed velocity field from the previous iteration is used in the
convective heat transfer contributions and the subsequent temperature
solution is used to compute the buoyancy and free surface contribution to
the discretized momentum equations. Biconjugate gradient iteration is used
to solve each of the respective linear subsystems. The respective
element matrices are of size 27 27 for the temperature equation and 89 89
for the momentum equations. 2) The second approach corresponds to a
fully-coupled formulation for velocity components, pressure and
temperature. The nonlinear system at each step is solved using Newton
iteration with biconjugate gradient solution of the associated jacobian
systems. In this coupled formulation the resulting element matrix
contributions to the jacobian matrices are of size 116 116. This leads to
a slight improvement in matrix-vector product efficiency. Further details
are provided in
[7,8].
The problem is solved in parallel over a partition of subdomains using an
element-by-element strategy
for the BCG solver within each subdomain
[9]. That is,
the mesh of hexahedral elements is partitioned to subdomains containing
contiguous blocks of elements. Each such subdomain is assigned to a processor so
this defines the processor partitioning. The subdomain partition is
non-overlapping insofar as elements are concerned, but nodes, edges and faces on
the subdomain interfaces are shared by adjacent processors. Communication
between adjacent processors will be required for calculations involving those
elements with common nodes on an interface. By further grouping the processor
elements, it is possible to overlap computation and communication. For further
details, see the cited references [5
(hyper-linked)],
[10,11].
The computationally intensive parts are the element calculations, the matrix-vector products, and the dot products. The element calculations can be made in parallel and run on the T3E at 108 Mflops/s per processor. The EBE scheme circumvents global assembly and recasts the global matrix-vector products at the element level. This implies that we make local "almost dense" element matrix-vector products, for the elements interior to any given processor subdomain. Load balance is achieved if the number of elements per processor subdomain is approximately constant.
Clearly, the BLAS2 matrix-vector routine is heavily used (requiring more than 80% of the CPU time for a typical calculation) [11]. In our optimized code, this runs at 108 Mflops/s for a single processor on the T3E. Near linear speedup and scalability on 512 nodes of the T3E at 93.4% were achieved in this manner for the test problem described later.
2. Evaluation Criteria and Performance
The norm for evaluating the speed of the code was specified under the NASA contract in terms of Gigaflops/s with a minimum sustained speed of 50 Gigaflops/s to be attained on the representative 3D coupled problem as the performance milestone for February 1998. This goal was met in July 1997 where the performance was assessed to exceed 51.6 Gigaflops/s on 512 nodes of the NASA Goddard Cray T3E, and in excess of 115 Gflops/s on a 1280 nodes T3E at Cray Research Inc in July of 1997.
3. MGFLO Software Package
3.1 Code Modules
The Code is written in C and Fortran. Communications between
processors is achieved using MPI and the Cray shared memory library
(shmem).
(Shmem is used in situations where communication speed critically
impacts performance).
The software consists of the following modules:
(1) A simple 3D mesh generator (IJK type) for hexahedral elements
and a similar processor partitioning strategy. That is, the
connectivity table is also provided by the module. (Note that the
implementation is not limited to cartesian meshes and is designed to
handle more general configurations.
(2) A solution manager module which handles the options specified
in the input file needed to compute the element matrices and vectors
for the discretized problem. At each timestep, or nonlinear iteration
cycle, the solution manager calls the routine to form the element
matrices and vectors and applies the essential boundary
conditions. The solver module is then called.
(3) The solver module consists of element-by-element matrix vector
products with overlapped communications and associated dot products,
also recast at the element level. A special masked componentwise
vector product is introduced in this latter calculation.
The interface between the mesh generator and the other two
modules is a "struct" in C which specifies the nodes coordinates,
boundary conditions and connectivity tables.
The structure of the main program then is:
Standard Headers
Read Title Routine
Read in materials
Flop Rate Routine
main(int argc, char **argv)
Main Definitions
Initializations
Parse Command Line Options
Find Base Name of Input File
Create Output File and Open It
Read Input File into Memory
done = FALSE;
while (!done)
Get Next Parsed Line from Input File
Material Properties
Mesh
Solve
Flop Rate
Write Output Files
Stop
done = TRUE;
Finishing Program
Each line of this structure is a module. We will focus on the "Mesh"
and "Solve" modules. The structure of the "Mesh" module is the
following:
Form elements routine Set Matrix Type MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop) Initializations Parse Input Data for Nodes and Elements Check for Valid Input Compute Spacing, Array Sizes and Allocate Partition Processors Local Position on this Processor Compute Nodes Coordinates Form Elements Store Boundary Conditions Store in [[MESH]] ContextThe [[MESH]] context contains the information which is passed to the solution manager module named "Solve". In the current version of the code, the [[MESH]] struct contains the following information (in the file "mesh.c"):
NODES* nodes; /* A 1-D list of nodes on this processor */
VectorObj* frame_elements; /* The elements on the processor boundary */
VectorObj* pict_elements; /* the interior elements */
VectorObj* all_elements; /* all elements */
VectorObj* group_info; /* BC group information */
VectorObj* sndLists; /* The send lists to all neighbor procs */
int elm_max; /* The number of elements on this processor */
int elm_max_g; /* The maximum global number of elements */
int maxElmPP; /* The maximum nr. of elements per processor */
int constNPP; /* if true then the number nodes per
processor are the same. */
int constEPP;
int* elmSize;
int nodes_max; /* The number of nodes on this processor */
int nodes_max_g; /* The maximum global number of nodes */
int maxNodesPP; /* The maximum nodes per processor */
int* nodesSize; /* Array of nodes on each processor */
int baseNum; /* base number associating the local
proc nodes numbering to
global node numbering */
int numvar; /* number of variables */
int nStfSize; /* size of the matrix */
int gp1d;
int gp2d;
int gp3d;
REAL* gauss3d; /* 3D Gauss points coordinates xi,eta,zeta*/
REAL* wght3d; /* 3D weights of the Gauss points */
REAL* gauss2d; /* 2D Gauss pnts coord */
REAL* wght2d; /* 2D Gauss pnts weights */
REAL* phi; /* Shape functions tri quads at nodes */
REAL* psi; /* Shape func 3 linear at corner nodes*/
REAL* dphi; /* Deriv xi,eta,zeta of shape func phi */
REAL* dpsi; /* Deriv of shape func psi */
REAL* en; /* 2D shape function */
REAL* dn; /* 2D Shape function deriv */
The module "Solve" refers to all the members of the [[MESH]]
structure. It is therefore important that the correct interface is
built if a new mesh generation routine is used.
The structure of the solution manager "Solve" module is as follows:
void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
*input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep)
Local Variables
Initialize
Parse command line
Decide How Many Unknowns
Initialize solution, Gauss, Shape, Deriv
Setup solver routines
for (;;) /* Loop over User Requests (Until "END" is found)*/
{
Get Next Parsed Line, [[break]] if [["END"]] is found
Reset all state variables to initial state
Parse User Input Line
/* Loop over number of solution request from current user line.*/
done = FALSE;
while (!done)
{
Increment Solution counters
Save old soln
Predict solution (or copy old to predicted)
/* Normally one-pass loop except for decoupled case */
while (!whole)
{
Handle decoupled solution
Form element [[stf]], [[rhs]] and apply BCs
Modify solver routines
A Single Solve
Update [[U]] with new solution if necessary
Toggle between [[FLOW]] and [[THERMAL]] when necessary
}
Check Solution against Exact Solution
Acceleration, Iteration and Time Step Control
}
}
Deallocations
REAL time; /* time of the solution for transient cases */ Vector* soln; /* pointer containing the solution */ int kstep; /* the solution count */ int numvar; /* number of unknowns */ char* varPool; /* pointer containing names of variables */ int sizePool; /* size of the varPool pointer */ PCG* pcg; /* pointer to the solver function: BCG */ PCG* pcg_t; /* pointer to thermal solver if decoupling*/ PCG* pcg_f; /* pointer to the Navier Stokes solver if needed*/
We list the main routines, by files.
The file "mgflo.c" contains the main routine.
In this file, the following routines are also included:
void rdtitle(DATAITEM **line, int num_fields): this routine reads the
title line of the input file.
void debug_requests(DATAITEM **line, int num_fields): this routine flags the
debugging if requested.
void flopRate(DATAITEM **line, int num_fields, PROBLEM* problem,
SOLVE* solve, REAL64 diffTime):
this routine prints out the number of flops per main routine, and overall
Gigaflops/s in the output file.
The file "mesh.c" contains the following routines:
void form_elements(VectorObj* elmv,int xelm, int yelm, int zelm,
int i0, int j0, int k0, int nnx, int nny, PROP*
prop, int visualizer):
provides the node coordinates of the elements for
the IJK grid, and connectivity
table on one
processor. The partitioning was previously done in
the main routine in the file "mgflo.c".
MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields,
PROP *prop): creates the send lists which are the picture and "frame" of
the grid on one processor. (The elements at the surface of the grid of
one processor, which share nodes with other processors or which have
nodes on the boundary of the physical domain are in the "frame". The
interior nodes are in the "picture".)
The file "solnMgr.c" contains the routines relevant for the solution manager:
int matvec(MVTYPE type, void* amat, void* p, void* w, REAL64* cnt):
sets the matrix vector BLAS2 routines: either normal or transpose.
void zeroPresMask(MESH* mesh, int numvar, Vector* mask): sets the
pressure mask value to zero.
int sumRHS(MESH* mesh, Vector* f, REAL64* cnt): sums the right hand
side globally over all processors.
int formDiagInv(MESH* mesh, Vector* diagInv, REAL64* cnt): forms the
Jacobi preconditioning matrix.
void errorNorm(DATAITEM** user, int numvar, int nodesMax, NODES* nodes,
REAL* u, REAL* mask, REAL** unormp, REAL** diffsln):
computes the error norm when the exact solution is
known.
int stop(void* uservp, int iter, REAL rnorm, REAL fnorm, REAL zeta,
int exact, int iprint, void *soln, void *mask): sets the
stopping criteria.
int messageSetup(VectorObj* sndLists, int numvar): sets the messages.
Vector* formMask(VectorObj* sndLists, int nodesMax, int numvar, PCG*
pcg): forms the masking vector for the masked dot products.
void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
*input_ctx, DATAITEM **line, int num_fields, SOLVE**
solvep): is the general routine of the solution manager module.
The file "setbc.c" contains the routine which computes the boundary conditions:
void setbc(PROBLEM* problem, ELEMENTS* elm, NODES* nodes, VectorObj* group_info,
int nStfSize, int* gp2d,int* gp3d, REAL* wght2d, REAL* gaussp2d, REAL* phi,
PROBtype probtype, REAL* xyz, REAL* T_lag, REAL64* rcnt)
The file "pcg.c" contains the routines which set the parallel iterative solver.
int createPCG(PCG **pcg, void* matrix, PCGSolverType solverType):
creates the parallel solver pointer "pcg".
int setMatvecPCG(PCG *pcg, int (*matvec)(MVTYPE, void* ,void* ,void*
,REAL64 *)): sets the matrix vector function pointer which is
used by solver.
int setAveragePCG(PCG *pcg, int (*average)(void* ,void* ,void* )):
sets the average function pointer specified in the solution manager.
int setParmPCG(PCG* pcg, REAL eps, int iterMax, int iprint, int exact,
int solnInit): sets the parameters for the solver.
int setMaskPCG(PCG* pcg, void *mask): sets the proper masking
parameters for the solver.
int clearMaskPCG(PCG* pcg): clears the masking vector.
int setPrecondPCG(PCG *pcg, int (*pcroutine)(), void* precond): sets
the proper preconditioning routine for the solver chosen.
int setStoppingPCG(PCG* pcg,
int (*stop)(void*, int, REAL, REAL, REAL, int, int,
void*, void*),
void* user): sets the stopping criterion for the
solver.
int vecGetWork(PCG* pcg, int n): prepares the working vectors for the
solver.
int solvePCG(PCG *pcg, void* soln, void* f, int* its, REAL64* cnt):
specifies the type of solver (CG or BCG).
The routines for the conjugate gradient solver and biconjugate gradient solver are provided in the file "iter.c".
int cgSetup(PCG *pcg): sets the CG solver.
int cg(PCG* pcg, int* its, REAL64* cnt): CG solver routine.
int bcgSetup(PCG *pcg): sets the BCG solver.
int bcg(PCG* pcg, int* its, REAL64* cnt): BCG solver routine.
The file "serv.c" contains the server routines: message, error message, and warning message.
The file "vec.c" contains the routines which handle vector operations, and blas routines.
The file "user.c" contains routines which are specified by the user for the variable boundary conditions, the initial boundary conditions, and the exact solution. These routines are discussed in section 3.3 below.
The file "formkf.f" contains the following routines written in Fortran. These routines are:
subroutine surften(c,T_lag,iface,gp2d,gp3d, phi,wght2d,gaussp2d,
xyz,rhs): computes the boundary integral due to the
surface tension boundary condition.
subroutine fluxbc(c,iface,gp2d,gp3d,phi,wght2d,gaussp2d,xyz,rhs):
computes the thermal heat flux boundary integral when a heat
flux boundary condition is specified.
subroutine shapeq2q1(ngpts,gaussp3d, phi, psi,dphi,dpsi):
computes the Q2 (triquadratic) and Q1 (trilinear) shape functions
and their derivatives.
subroutine gauss(npoints,gaussp2d, wght2d,gaussp3d,wght3d):
computes the coordinates of the Gauss points in the master
element coordinate frame.
subroutine formkf_ns(discrtype,ngpts,gaussp3d,wght3d,phi,psi,
dphi,dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount):
computes the element stiffness matrix and its corresponding right hand side
vector from the finite element discretized formulation of the
Navier-Stokes system of equations.
subroutine formkf_t(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi,
dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount):
computes the element stiffness matrix, and its corresponding
right hand side vector from the finite element discretized
formulation of the thermal equation.
The software requests an input file which uses a set of chosen
keywords. The software copies the input file
to all processors, which then parse the lines of the input file as
the computation progresses.
A sample input file is provided below
for the current version of the software and this is followed by a
detailed explanation of this input data.
! This is how a comment is written title, "Test Case for Mgflo" material,rho=1.0,k=1.0,mu=1.0,cp=0.0,iprint=0 mesh,elem=64,64,64,min=0.0,0.0,0.0, max=1.0,1.0,1.0, partition=8,8,8,iprint=0 group=6,T=variable,iset=2 group=0,ux=0.0 group=0,uy=0.0 group=0,uz=0.0 group=1,ux=variable, iset=1 group=1,uy=0.0 group=1,uz=0.0 group=2,ux=variable, iset=1 group=2,uy=0.0 group=2,uz=0.0 group=7,p=0.0 node=0,0,group=7 face=0,group=6 face=0,group=0 face=1,group=6 face=1,group=1 face=2,group=6 face=2,group=2 face=3,group=6 face=3,group=1 face=4,group=6 face=4,group=2 face=5,group=6 face=5,group=0 end,mesh solve,coupled, iprint=0,user=dparabolic steady,nsteps=1,eps=1.E-10,itermax=3000 end,solve flop,iprint=3 stop
The above input file is composed of
three main sections.
(1) The first section, line 3, gives the "material" data, values of the
properties of the fluid which are needed to perform the computation (in
cgs units here)
(density "rho", thermal conductivity "k",
constant pressure specific heat coefficient "cp", dynamic viscosity
"mu", thermal expansion coefficient "beta").
If no values is given the default value of 0 is assigned.
(2) The second section, (from line 4 to "end mesh")
provides information on the mesh, and the
partitioning among processors. The keyword "elem" indicates the number
of elements which are set in the "x", "y", and "z" direction of the
structured hexahedral grid; the keyword "min" gives the coordinates of the bottom
left front corner of the domain, and the keyword "max" gives the
coordinates of the upper back right corner of the domain;
the keyword "partition" provides the number of processors
assigned in each of the "x", "y", "z" directions.
The flow domain for this example has six faces, which are assigned
different "group" boundary conditions. A "face" may be assigned
several "groups" since several boundary conditions may apply to it.
Surface tension is given as surface boundary condition.
(3) The third section of the input file provides data used for the solver. It
indicates which type of solver will be applied: "decouple" if the
Navier Stokes and the thermal equations are iteratively decoupled;
"coupled" if the Navier Stokes and thermal equations are solved in a
coupled manner;
"flow" if only
the Navier Stokes equations are solved, "thermal" if only the thermal
equation is solved. We also allow the user to provide a special set
of boundary conditions and initial conditions, when the keyword
"user" is set to a string. This string refers to sections of the file
"user.c", which are discussed below.
The keyword "steady" indicates a stationary flow calculation,
"transient" indicates that the time
dependent terms of the equations are included, with a
timestep specified by the keyword "delt".
The number of cycles or timesteps is set by "nstep". The
precision of the biconjugate iterative solver is given by "eps", and
the maximum number of iterations in this iterative solver is provided
by "itermax".
We note that on several lines, the keyword "iprint" is used: this provides
various output information.
In general, the order of the lines should be respected, yet on a given line, the
order of the keywords is not important after the first word. The only
exception to this rule is the keyword "user" on the solve line, it
followed by an arbitrary number of parameters must be the last one on
the solve line.
For the example above, we have therefore the following
information:
the material density, thermal conductivity, viscosity, and heat
coefficient
are set to 1. The mesh is a uniform 64 by 64 by 64 structured grid,
for the unit cube. The number of processors requested is 512
in the form of an 8 by 8 by 8 processor partition of subdomains.
The boundary conditions on the unit cube indicate that the bottom
faces have a variable temperature (group 6), specified in the routine "user.c" as
the set of conditions "iset" = 2, of the section "dparabolic".
The bottom and top surfaces (face=0,5) satisfy the noslip
condition (provided by the group 0).
The lateral faces (face=1,2,3,4) have a variable "x" component of the
velocity, specified in the routine user.c, by the set "iset=1" in the
section "dparabolic", and have "y" and "z" velocity components set to zero.
Based on this data input, the program solves
the Navier Stokes and thermal equations in a
coupled manner, with the specific boundary conditions provided by
the section "dparabolic". For the sample data set here,
no time dependency is implemented, and only one nonlinear
cycle is computed. The precision of the bi-conjugate gradient solver
is set to 1.e-10, and a maximum of 3000 iterations is specified.
Finally, the output file will print the overall Mflop rate for the code.
We remark that upon
completion of a flow simulation, the software provides
an output file, and, if chosen so,
data files to allow visualization of the mesh and the
solution.
The output file repeats a copy of the input file, and provides information
corresponding to the integer value assigned to the "iprint" keyword in
the input files.
If the statement "iprint=0" is given, or if the keyword "iprint" is
omitted, then no information is printed.
On the "material" input line, "iprint=1" reprints the material properties.
On the "mesh" input line, "iprint=1" prints the mesh information
(coordinates, connectivity table) in the output file.
On the "solve" input line, "iprint=1" gives the number of iterations
used in the iterative solver to converge to the tolerance "eps";
"iprint=2" prints the residual at each solver iteration.
On the "flop" input line, "iprint=1" prints the overall Gigaflop
rate, and wallclock time spent in the execution of the program;
"iprint=2" gives also the Gigaflop rates and wallclock times
of the solver, the formation of the element stiffness matrices,
and boundary conditions routine; finally, setting "iprint=3"
shows the Gigaflop rate of the thermal equation solve, and of the
Navier Stokes equations solve. An example of the output data
for the input instruction
flop, iprint=3 is given in section 3.6.
3.5 User specified conditions.
The file "user.c" allows the user to specify a special set of initial
conditions, variable boundary conditions, or an exact solution, in
various routines. The conditions are called by a string, specified in
the input file. The following shows how the boundary conditions with a
variable x component of the velocity, corresponding to the string
"dparabolic" are coded in the routine "varibc" of the file "user.c":
void varibc(DATAITEM **user, REAL curtime, int iset, REAL x, REAL y, REAL z,
REAL *value)
...
else if (STRCASECMP(user[0]->cdata, ==, "dparabolic"))
{
*value = z*(1.-z);
}
...
Steps for compiling MGFLO:
The tar file mgflo.tar contains all necessary files to compile and run
the program. Once the tar file has been expanded, the steps to
compile it are:
a) cd Csupes
b) /bin/make
c) cd ..
d) /bin/make
The warning messages can safely be ignored.
Steps To Run MGFLO:
The tar file includes an input file "ns.dat". The qsub script file
mgflo.qsub can be submitted after changing line 3 to give the
appropriate directory that the tar file is in. The output will be
written to the file "ns.out".
The last 15 lines of the output file report the performance of the
code. Below is a copy of the end of the output file.
Sample of output for a run on the 512 nodes T3E. _____________________________________________________________________________ Time in solve = 858.947 sec Overall GigaFlop Rate: 51.279, GFLOP Cnt: 4.406e+04, Time: 859 1 Proc Solve MFLOPS/s: 103.688, solveCnt: 8.579e+10, solveTime: 827 1 Proc Fkf MFLOPS/s: 81.646, fkfCnt: 2.612e+08, fkfTime: 3.2 1 Proc MatVec MFLOPS/s: 108.020, mvCnt: 8.410e+10, mvTime: 779 1 Proc Precon MFLOPS/s: 18.095, pcCnt: 1.474e+08, pcTime: 8.15 1 Proc Dot MFLOPS/s: 41.345, dotCnt: 6.635e+08, dotTime: 16 1 Proc AXPY MFLOPS/s: 34.848, axpyCnt: 6.633e+08, axpyTime: 19 1 Proc AXPBY MFLOPS/s: 43.245, axpbyCnt: 2.211e+08, axpbyTime: 5.11 Time in flop = 0.000596881 sec Time in stop = 6.64592E-06 sec Time in PROGRAM: 859.237233 _____________________________________________________________________________
The first test is stationary solution of the Navier Stokes equations and coupled solution of the thermal equation. The flow solution corresponds to a simple Poiseuille flow calculation, and the temperature solution is simple linear heat conduction. The exact solutions lie in the approximation subspace and are obtained by MGFLO using the coupled algorthm. The input file is provided above in section 3.3. Essentially linear speedup is obtained for configurations up to 512 nodes tested on the T3E, with 96.7% scaling (see Figure 1).

The second test case is a stationary Rayleigh-Benard-Marangoni type of flow including the surface tension and buoyancy contribution. The bottom temperature is set to 62.64 C, the reference temperature is 60C. The free surface is flat. The thickness of the layer is 0.45 mm, the surface section of the fluid is 1 cm by 1 cm. The velocity at the walls is set to 0. The side walls are insulated. The temperature field is specified on the free surface as shown on Figure 2. The code solves the Navier Stokes equations and the heat transfer equation using the coupled algorithm. The velocity vectors at the free flat surface are shown in Figure 3. The surface velocity vectors indicate that vortices are established due to the variations of the surface temperature fields, and surface tension.


The third test problem is transient natural convection (NC) in a cube with heated side walls Two cases are considered: a) free surface with no thermocapillary surface tension, and b) with thermocapillary surface tension. The sidewall temperatures are normalized to 1.0 in front and zero on the back surface . The effect of suface tension is to augment the convective cell formation. The nature of the flow and thermal fields is indicated in Figure 4.

Figure 4: NC with surface tension.
5. Experimental Studies
Laboratory experiments are being conducted to investigate several aspects
of surface-tension-driven convection. We have identified the primary instability
in high Prandtl number liquid bridge convection and have used control techniques
to prevent onset of instability. Liquid bridge convection is being studied
now in low Prandtl number liquids, where the industrial interest is greatest,
and onset to instability has been observed. Pattern formation within the
``dry spots'' of long-wavelength Benard convection is being examined for
the appearance of hexagonal symmetries. We are also investigating liquid-liquid
convection experiments, where a rich variety of instabilities--long- and
short-wavelength, stationary and oscillatory--are predicted for both heating
from below and heating from above.
6. Conclusions and Recommendations
This research project was initiated in Summer 1996 under the NASA ESS
CAN program and software development began in Fall 1996. The initial prototype
parallel code was based on a decoupled formulation and completed in July
1997. A new version that includes both decoupled and coupled algorithms
is now available for applications and testing on microgravity flow problems.
The modular structure of the code permits expansion of its capabilities
and several enhancements are planned as the research program continues.
At this stage, we have focused on code optimization and performance milestones.
The thermocapillary effect has been inluded in both formulations.
Now that the prototype code is complete, we plan to work more closely with
the flow experiment component of the program and focus on free surface
physics. The basic approach of parallelization via domain decomposition,
overlap of communications and computations, use of high performance BLAS1
and BLAS2 kernels for the EBE computations
will be carried forward as the code capabilities are enhanced. Although
the benchmark tests described to date have focussed on a rectangular solid
geometry, and 3D cartesian grid, the code is not limited in this manner.
It is designed to permit easy extension to more general geometries and
meshes, and incorporation of more general partitioning strategies such
as those obtained using METIS and similar programs.
Acknowledgments This project was funded by NASA ESS CAN NCCS5 154.
Glossary
BCG: Biconjugate gradient method, one of a class
of generalised gradient iterative solvers suited for nonsymmetric algebraic
systems.
BLAS: Basic Linear Algebra Subroutine, assembly coded to optimized performance.
EBE: Element-by-element formulation that circumvent system assembly.
finite element method: approximation technique
involving weighted integral formulation of equations on approximation subspaces
defined using discretization of domain to "elements" such as hexahedra
or tetrahedra.
MGFLO: Microgravity Flow Software Package.
R-B-M: Rayleigh- Benard- Marangoni; thermally driven flows due to both buoyancy and thermocapillary effects.
thermocapillary: surface tension effects
associated with temperature variations on the fluid free surface.
References
[2] C. Rerez-Garcia, et al, "Linear stability analysis of Benard-Marangoni convection in fluids with a deformable free surface." Phys. Fluids A 3, 1991, 292-298
[3] S. J. Van Hook, M. F. Schatz, J. B. Swift, W. D. McCormick, H. L. Swinney, "Long-wavelength surface-tension-driven convection: Experiment and theory." To appear in Journal of Fluid Mechanics.
[4] E. B. Becker, G.F. Carey, J. T. Oden, Finite Elements. 6 vols. London: Prentice Hall International, 1981. Vols. 1, 2, 3, and 6
[5] G. F. Carey, C. Harle, R. McLay, S.Swift, "MPP Solution of Rayleigh - Benard - Marangoni Flows." To be presented at the Super Computing '97 conference (November 1997). postscript version
[6] G.F. Carey, A. Bose, B. Davis, C. Harle, R. McLay, "Some Aspects of Parallel Viscous Flow Computations." Presented at the Atlanta HPC conference (April 1997). postscript version
[7] B. Baret, et al., "Templates for the Solutions of linear Systems Building Blocks for Iterative Methods." SIAM, 1994, Philadelphia, Pa.
[8] Fletcher, R., "Conjugate Gradient Methods for Indefinite Systems." Lecture Notes in Mathematics, Berlin and New York: Springer-Verlag, 1976, 506
[9] E. Barragy and G. F. Carey, "Parallel-Vector Computation with High-p Element- by-Element Methods.'' International Journal of Computer Mathematics 44, 1992, 329-339
[10] G.F.Carey, ed., Parallel Supercomputing: Methods, Algorithms and Applications. U.K.: John Wiley & Sons, 1989
[11] R. McLay, S. Swift, and G. F. Carey, "Maximizing Sparse Matrix-Vector Product Performance on RISC based MIMD Computers.'' Journal of Parallel and Distributed Computing 37, 1996, 146-158
[12] G.F. Carey, "Rayleigh- Benard- Marangoni flows: theoretical formulation and solution algorithm." postscript version
[13] R. McLay, G.F. Carey, "Coupled heat transfer and viscous flow and magnetic effects in weld pool analysis." IJNMF 9, 1989, 713-730