Scalable Parallel Finite Element Computations of
Rayleigh-Benard-Marangoni Problems
in a Microgravity Environment

Project Director: Dr. G. F. Carey, CFDLab,
The University of Texas at Austin.

Contents:
  1. Rayleigh-Benard-Marangoni Flows
    1. Microgravity Flow Problems
    2. Mathematical Model and Finite element Method
    3. Solution Algorithm
    4. Parallelization
    5. Performance
  2. Evaluation Criteria and Performance
  3. MGFLO Software Package
    1. Code Modules
    2. List of Routines
    3. Sample Input File
    4. Sample Output File
    5. User specified conditions
    6. Usage
  4. Test Problem and Performance Results
  5. Experimental Studies
  6. Conclusions and Recommendations


Authors
Dr G.F. Carey: project director, professor in Aerospace Engineering at the University of Texas at Austin, director of the CFDLab.
Dr R. Mclay: Research Associate at the University of Texas at Austin.
Dr. C. Harlé: Post Doctoral associate, CFDLab.
Dr. B. Davis: Post Doctoral associate, CFDLab.
Dr. H Swinney: Professor at the University of Texas at Austin, director of the Center for Nonlinear Dynamics.
Dr. S. Van Hook: Post Doctoral associate, Center for Nonlinear Dynamics.

Abstract


MGFLO is designed for parallel distributed computation of 3D coupled viscous flow and heat transfer, including thermocapillary surface tension effects. Of specific interest are microgravity fluid problems of interest to NASA for life support fluid processing and storage, and manufacturing applications in the Aerospace Shuttle and Space Station Missions. The project involves development of software for flow simulations on the Cray T3D and Cray T3E distributed MPP systems, as well as fluid flow experiments on fluid films where the Marangoni effect is dominant. The current (first generation) MGFLO code achieved a sustained 16.5 Gflops/s performance level on the 512 node T3D system at NASA Goddard in March 1997, for a representative 3D coupled Rayleigh- Benard- Marangoni benchmark calculation. A brief description of the problem class, methodology, performance measurement criteria, and software package is provided in this document.

1. Rayleigh- Benard- Marangoni Flows
1.1 Microgravity Flow Problems

Benard in 1900 observed that cellular flow structures formed under certain conditions where a fluid layer was heated from below [1]. He attributed the behavior to the effect of thermal expansion and associated buoyancy forces in the fluid, and assumed that there were no significant effects associated with the free surface. Subsequently, Rayleigh in 1916, developed a mathematical model for this problem under the same assumptions and carried out a stability analysis. This class of problems is commonly referred to as the Rayleigh- Benard Problem.

Neither Rayleigh nor Benard realized that, in fact, thermocapillary surface tension at the free surface was the dominant mechanism driving the flow for this class of terrestrial experiments involving thin layers. This problem is termed the Rayleigh- Benard- Marangoni problem [2]. In fact, it is the dominance of surface tension over gravity that makes this problem so important to understanding the behavior of fluid in a microgravity environment [3]. Clearly, in NASA shuttle and space station orbit, gravity is negligible, and surface tension effects are paramount. However, it is generally not possible to carry out terrestrial experiments to test ideas before deployment in space (the one exception being the thin layer type of experiments conducted as part of the present work). Hence a simulation capability for Rayleigh- Benard- Marangoni problems is essential.

Since these problems are nonlinear 3D, coupled and require fine grid resolution for stable, reliable simulations, the problem is both computationally and memory intensive. Table 1 gives an estimate of solution times and memory requirements for a typical Rayleigh- Benard- Marangoni calculation using MGFLO on different Cray vector systems, and on the NASA Goddard MPP systems.


Computer
J90
1 proc
C90
1 proc
T90
1 proc
T3D
512 proc
T3E
512 proc
MGFLO Flops/s 73 M 290 M 600 M 16.5 G 53.0 G
Time: 9000 elements
(Memory: 128 MW)
9 days
2.3 days
1.1 day
1 h
20 mn
Time: 110582 elements
(Memory: 1540 MW)
Out of
Memory
Out of
Memory
Out of
Memory
In
Memory
In
Memory


MGFLO time and memory comparison


1.2 Mathematical Model and Finite Element Method

The model consists of the Navier-Stokes equations for flow in 3D, coupled with the energy transport equation for temperature. Under the Boussinesq approximation, buoyancy enters as a body force in the momentum equations, and a thermocapillary surface tension effect enters through a shear stress condition at the free surface [see 12(postscript hyperlinked), 13].

A weak variational saddle point formulation of the viscous flow and heat transfer problem is constructed, and a Galerkin finite element approximation statement follows with the free surface condition incorporated as a natural boundary condition [4]. The discretization is taken as tensor-product hexahedral elements with triquadratic basis for each velocity component, trilinear basis for the pressure, and triquadratic basis for temperature (see online papers linked to references [5,6]).


1.3 Solution Algorithm


The resulting semidiscrete system is integrated implicitly using a Crank-Nicholson (midstep) rule, and the resulting nonlinear algebraic system for each timestep is solved using block iteration with biconjugate gradient (BCG) iterative solution of each resulting linear algebraic system. Convergence of the inner linear BCG solve is based of the usual residual-type criterion [7,8].


1.4 Parallelization


The problem is solved in parallel over a partition of subdomains using an element-by-element strategy for the BCG solver within each subdomain [9]. That is, the mesh of hexahedral elements is partitioned to subdomains containing contiguous blocks of elements. Each such subdomain is assigned to a processor so this defines the processor partitioning. The subdomain partition is non-overlapping insofar as elements are concerned, but nodes, edges and faces on the subdomain interfaces are shared by adjacent processors. Communication between adjacent processors will be required for calculations involving those elements with common nodes on an interface. By further grouping the processor elements, it is possible to overlap computation and communication. For further details, see the cited references [5 (hyper-linked)], [10,11].


1.5 Performance

The computationally intensive parts are the element calculations, the matrix-vector products, and the dot products. The element calculations can be made in parallel and run on the T3D at 24 Mflops/s per processor. The EBE scheme circumvents global assembly and recasts the global matrix-vector products at the element level. This implies that we make local "almost dense" element matrix-vector products, for the element's interior to any given processor subdomain. Load balance is achieved if the number of elements per processor subdomain is approximately constant.

Clearly, the BLAS2 matrix-vector routine is heavily used (requiring more than 80% of the CPU time for a typical calculation) [11]. In our optimized code, this runs at 34 Mflops/s for a single processor on the T3D. Near linear speedup and scalability on 512 nodes of the T3D at 96.7% were achieved in this manner for the test problem described later.


2. Evaluation Criteria and Performance

The norm for evaluating the speed of the code was specified under the NASA contract in terms of Gigaflops/s with a minimum sustained speed of 10 Gigaflops/s to be attained on the representative 3D coupled problem as the first major performance milestone. This goal was met in March 1997 and verified independently at NASA Goddard, where the performance was assessed to exceed 16 Gigaflops/s on 512 nodes of the NASA Goddard Cray T3D.


3. MGFLO Software Package
3.1 Code Modules


The Code is written in C and Fortran. Communications between processors is achieved using MPI and the Cray shared memory library (shmem). The software consists of the following modules.
(1) A simple 3D mesh generator (IJK type) for hexahedral elements and a similar processor partitioning strategy. That is, the connectivity table is also provided by the module.
(2) A solution manager module which handles the options specified in the input file needed to compute the element matrices and vectors for the discretized problem. At each timestep, or nonlinear iteration cycle, the solution manager calls the routine to form the element matrices and vectors and applies the essential boundary conditions. The solver module is then called.
(3) The solver module consists of element-by-element matrix vector products with overlapped communications and associated dot products, also recast at the element level. A special masked componentwise vector product is introduced in this latter calculation.
The interface between the mesh generator and the other two modules is a "struct" in C which specifies the nodes coordinates, boundary conditions and connectivity tables.
The structure of the main program then is:

  
Standard Headers
Read Title Routine
Read in materials
Flop Rate Routine
main(int argc, char **argv)

  Main Definitions
  Initializations
  Parse Command Line Options
  Find Base Name of Input File
  Create Output File and Open It
  Read Input File into Memory
  
  done = FALSE;
  while (!done)

      Get Next Parsed Line from Input File
      Material Properties
      Mesh 
      Solve 
      Flop Rate
      Write Output Files
      Stop 
      done = TRUE;
      
  Finishing Program

Each line of this structure is a module. We will focus on the "Mesh" and "Solve" modules. The structure of the "Mesh" module is the following:

 Form elements routine
 Set Matrix Type
MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop)

   Initializations
   Parse Input Data for Nodes and Elements
   Check for Valid Input
   Compute Spacing, Array Sizes and Allocate
   Partition Processors
   Local Position on this Processor
   Compute Nodes Coordinates
   Form Elements
   Store Boundary Conditions
   Store in [[MESH]] Context

The [[MESH]] context contains the information which is passed to the solution manager module named "Solve". In the current version of the code, the [[MESH]] struct contains the following information (in the file "mesh.c")
  NODES*     nodes;             /* A 1-D list of nodes on this processor */
  VectorObj* frame_elements;    /* The elements on the processor boundary */
  VectorObj* pict_elements;     /* the interior elements */
  VectorObj* all_elements;      /* all elements */
  VectorObj* group_info;        /* BC group information */
  VectorObj* sndLists;          /* The send lists to all neighbor procs */
  int        nodes_max;         /* The number of nodes on this processor */
  int        numvar;            /* number of variables */
  int        nStfSize;          /* size of the matrix */
  REAL*      gaussp3d;          /* 3D Gauss points coordinates xi,eta,zeta*/
  REAL*      wght3d;            /* 3D weights of the Gauss points */
  REAL*      gaussp2d;          /* 2D Gauss pnts coord */
  REAL*      wght2d;            /* 2D Gauss pnts weights */
  REAL*      phi;               /* Shape functions tri quads at nodes */
  REAL*      psi;               /* Shape func 3 linear at corner nodes*/
  REAL*      dphi;              /* Deriv xi,eta,zeta of shape func phi */
  REAL*      dpsi;              /* Deriv of shape func psi */
The module "Solve" refers to all the members of the [[MESH]] structure. It is therefore important that the correct interface is built if a new mesh generation routine is used. The structure of the solution manager "Solve" module is as follow:

void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
              *input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep)

   local variables
   Initialize
   Parse command line
   Decide How Many Unknowns
   Initialize solution, Gauss, Shape, Deriv
   Setup solver routines
   for (;;) 

       Get Next Parsed Line
       Reset all state variables to initial state
       Parse User Input Line
      
      done = FALSE;
      while (!done)

           Increment solution counters
           Handle decoupled solution
           Save old soln
           Predict solution (or copy [[unold]] to [[unp]])
           Form element stiffness matrix, right hand side and apply
                       boundary conditions's
           Modify solver routines
           A single solve
           Update the solution with new solution if necessary
           Check solution against exact solution
           Iteration control


The interface between the "Solve" routine and the biconjugate gradient routine is via the [[SOLVE]] struct, which contains the following information:
  REAL      eps;          /* tolerance level for the residual norm */
  Vector*   soln;         /* pointer to the solution */  
  int       exact;        /* flag if the exact solution is known */
  int       soln_init;    /* initial solution if known */
  int       itermax;      /* maximum number of iteration in the solver */
  int       iprint;       /* iprint > 1 prints iterations */
  PCG*      pcg;          /* pointer to the solver function: BCG */
  PCG*      pcg_t;        /* pointer to thermal solver if decoupling*/
  PCG*      pcg_f;        /* pointer to the Navier Stokes solver if needed*/

3.2 List of Routines


We list the main routines, by files. The file "mgflo.c" contains the main routine. In this file, the following routines are also included:
void rdtitle(DATAITEM **line, int num_fields): this routine reads the title line of the input file.

void debug_requests(DATAITEM **line, int num_fields): this routine flags the debugging if requested.

void flopRate(DATAITEM **line, int num_fields, PROBLEM* problem, SOLVE* solve, REAL64 diffTime): this routine prints out the number of flops per main routine, and overall Gigaflops/s in the output file.

The file "mesh.c" contains the following routines:

void form_elements(VectorObj* elmv,int xelm, int yelm, int zelm, int i0, int j0, int k0, int nnx, int nny, PROP* prop, int visualizer): provides the node coordinates of the elements for the IJK grid, and connectivity table on one processor. The partitioning was previously done in the main routine in the file "mgflo.c".

MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop): creates the send lists which are the picture and "frame" of the grid on one processor. (The elements at the surface of the grid of one processor, which share nodes with other processors or which have nodes on the boundary of the physical domain are in the "frame". The interior nodes are in the "picture".)

The file "solnMgr.c" contains the routines relevant for the solution manager:

int matvec(MVTYPE type, void* amat, void* p, void* w, REAL64* cnt): sets the matrix vector BLAS2 routines: either normal or transpose.

int average(void* amat, void* p, void* maskv): performs some averaging of the solution.

void zeroPresMask(MESH* mesh, int numvar, Vector* mask): sets the pressure mask value to zero.

int sumRHS(MESH* mesh, Vector* f, REAL64* cnt): sums the right hand side globally over all processors.

int formDiagInv(MESH* mesh, Vector* diagInv, REAL64* cnt): forms the Jacobi preconditioning matrix.

void errorNorm(DATAITEM** user, int numvar, int nodesMax, NODES* nodes, REAL* u, REAL* mask, REAL** unormp, REAL** diffsln): computes the error norm when the exact solution is known.

int stop(void* uservp, int iter, REAL rnorm, REAL fnorm, REAL zeta, int exact, int iprint, void *soln, void *mask): sets the stopping criteria.

int messageSetup(VectorObj* sndLists, int numvar): sets the messages.

Vector* formMask(VectorObj* sndLists, int nodesMax, int numvar, PCG* pcg): forms the masking vector for the masked dot products.

void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep): is the general routine of the solution manager module.

The file "setbc.c" contains the routine which computes the boundary conditions:

void setbc(PROBLEM* problem, ELEMENTS* elm, NODES* nodes, VectorObj* group_info, int nStfSize, int* gp2d,int* gp3d, REAL* wght2d, REAL* gaussp2d, REAL* phi, PROBtype probtype, REAL* xyz, REAL* T_lag, REAL64* rcnt)

The file "pcg.c" contains the routines which set the parallel iterative solver.

int createPCG(PCG **pcg, void* matrix, PCGSolverType solverType): creates the parallel solver pointer "pcg".

int setMatvecPCG(PCG *pcg, int (*matvec)(MVTYPE, void* ,void* ,void* ,REAL64 *)): sets the matrix vector function pointer which is used by solver.

int setAveragePCG(PCG *pcg, int (*average)(void* ,void* ,void* )): sets the average function pointer specified in the solution manager.

int setParmPCG(PCG* pcg, REAL eps, int iterMax, int iprint, int exact, int solnInit): sets the parameters for the solver.

int setMaskPCG(PCG* pcg, void *mask): sets the proper masking parameters for the solver.

int clearMaskPCG(PCG* pcg): clears the masking vector.

int setPrecondPCG(PCG *pcg, int (*pcroutine)(), void* precond): sets the proper preconditioning routine for the solver chosen.

int setStoppingPCG(PCG* pcg, int (*stop)(void*, int, REAL, REAL, REAL, int, int, void*, void*), void* user): sets the stopping criterion for the solver.

int vecGetWork(PCG* pcg, int n): prepares the working vectors for the solver.

int solvePCG(PCG *pcg, void* soln, void* f, int* its, REAL64* cnt): specifies the type of solver (CG or BCG).

The routines for the conjugate gradient solver and biconjugate gradient solver are provided in the file "iter.c".

int cgSetup(PCG *pcg): sets the CG solver.

int cg(PCG* pcg, int* its, REAL64* cnt): CG solver routine.

int bcgSetup(PCG *pcg): sets the BCG solver.

int bcg(PCG* pcg, int* its, REAL64* cnt): BCG solver routine.

The file "serv.c" contains the server routines: message, error message, and warning message.

The file "vec.c" contains the routines which handle vector operations, and blas routines.

The file "user.c" contains routines which are specified by the user for the variable boundary conditions, the initial boundary conditions, and the exact solution. These routines are discussed in section 3.3 below.

The file "formkf.f" contains the following routines written in Fortran. These routines are:

subroutine surften(c,T_lag,iface,gp2d,gp3d, phi,wght2d,gaussp2d, xyz,rhs): computes the boundary integral due to the surface tension boundary condition.

subroutine fluxbc(c,iface,gp2d,gp3d,phi,wght2d,gaussp2d,xyz,rhs): computes the thermal heat flux boundary integral when a heat flux boundary condition is specified.

subroutine shapeq2q1(ngpts,gaussp3d, phi, psi,dphi,dpsi): computes the Q2 (triquadratic) and Q1 (trilinear) shape functions and their derivatives.

subroutine gauss(npoints,gaussp2d, wght2d,gaussp3d,wght3d): computes the coordinates of the Gauss points in the master element coordinate frame.

subroutine formkf_ns(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi,dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount): computes the element stiffness matrix and its corresponding right hand side vector from the finite element discretized formulation of the Navier-Stokes system of equations.

subroutine formkf_t(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi, dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount): computes the element stiffness matrix, and its corresponding right hand side vector from the finite element discretized formulation of the thermal equation.

3.3 Sample Input File


The software requests an input file which uses a set of chosen keywords. The software copies the input file to all processors, which then parse the lines of the input file as the computation progresses. A sample input file is provided below for the current version of the software and this is followed by a detailed explanation of this input data.

 title, " Test case for mgflo"
 material,rho=1.0,k=1.0,cp=1.,nu=1.,beta=0.,iprint=0
 mesh,elem=48,48,48,min=0.0,0.0,0.0, max=1.0,1.0,1.0, partition=8,8,8
   group=6,T=variable,iset=2
   group=0,ux=0.0
   group=0,uy=0.0
   group=0,uz=0.0
   group=1,ux=variable, iset=1 
   group=1,uy=0.0         
   group=1,uz=0.0
   group=2,ux=variable, iset=1
   group=2,uy=0.0
   group=2,uz=0.0
   face=0,group=6
   face=0,group=0
   face=1,group=6
   face=1,group=1
   face=2,group=6
   face=2,group=2
   face=3,group=6
   face=3,group=1
   face=4,group=6
   face=4,group=2
   face=5,group=6
   face=5,group=0
 end,mesh
 solve,decouple,iprint=1,user=dparabolic
   steady,nsteps=1,eps=1.0E-6,itermax=3000
 end,solve
 flop,iprint=3
 stop
 


The above input file is composed of three main sections.
(1) The first section, line 3, gives the "material" data, values of the properties of the fluid which are needed to perform the computation (in cgs units here) (density "rho", thermal conductivity "k", constant pressure specific heat coefficient "cp", kinematic viscosity "nu", thermal expansion coefficient "beta"). If no values is given the default value of 0 is assigned.
(2) The second section, (from line 4 to "end mesh") provides information on the mesh, and the partitioning among processors. The keyword "elem" indicates the number of elements which are set in the "x", "y", and "z" direction of the structured hexahedral grid; the keyword "min" gives the coordinates of the bottom left front corner of the domain, and the keyword "max" gives the coordinates of the upper back right corner of the domain; the keyword "partition" provides the number of processors assigned in each of the "x", "y", "z" directions. The flow domain for this example has six faces, which are assigned different "group" boundary conditions. A "face" may be assigned several "groups" since several boundary conditions may apply to it.
(3) The third section of the input file provides data used for the solver. It indicates which type of solver will be applied: "decouple" if the Navier Stokes and the thermal equations are iteratively decoupled; "flow" if only the Navier Stokes equations are solved, "thermal" if only the thermal equation is solved. We also allow the user to provide a special set of boundary conditions and initial conditions, when the keyword "user" is set to a string. This string refers to sections of the file "user.c", which are discussed below. The keyword "steady" indicates a stationary flow calculation, "transient" indicates that the time dependent terms of the equations are included, with a timestep specified by the keyword "delt". The number of cycles or timesteps is set by "nsteps". The precision of the biconjugate iterative solver is given by "eps", and the maximum number of iterations in this iterative solver is provided by "itermax".
We note that on several lines, the keyword "iprint" is used: this provides various output information. The order of the lines should be respected, yet on a given line, the order of the keywords is not important.


For the example above, we have therefore the following information: the material density, thermal conductivity, viscosity, and heat coefficient are set to 1. The mesh is a uniform 48 by 48 by 48 structured grid, for the unit cube. The number of processors requested is 512 in the form of an 8 by 8 by 8 processor partition of subdomains. The boundary conditions on the unit cube indicate that the bottom faces have a variable temperature (group 6), specified in the routine "user.c" as the set of conditions "iset" = 2, of the section "dparabolic". The bottom and top surfaces (face=0,5) satisfy the noslip condition (provided by the group 0). The lateral faces (face=1,2,3,4) have a variable "x" component of the velocity, specified in the routine user.c, by the set "iset=1" in the section "dparabolic", and have "y" and "z" velocity components set to zero.
Based on this data input, the program solves the Navier Stokes and thermal equations in a decoupled manner, with the specific boundary conditions provided by the section "dparabolic". For the sample data set here, no time dependency is implemented, and only one nonlinear cycle is computed. The precision of the bi-conjugate gradient solver is set to 1.e-06, and a maximum of 3000 iterations is requested. Finally, the output file will print the overall Mflop rate for the code.
We remark that upon completion of a flow simulation, the software provides an output file, and, if chosen so, data files to allow visualization of the mesh and the solution.

3.4 Sample Output File


The output file repeats a copy of the input file, and provides information corresponding to the integer value assigned to the "iprint" keyword in the input files. If the statement "iprint=0" is given, or if the keyword "iprint" is omitted, then no information is printed.
On the "material" input line, "iprint=1" reprints the material properties.
On the "mesh" input line, "iprint=1" prints the mesh information (coordinates, connectivity table) in the output file.
On the "solve" input line, "iprint=1" gives the number of iterations used in the iterative solver to converge to the tolerance "eps"; "iprint=2" prints the residual at each solver iteration.
On the "flop" input line, "iprint=1" prints the overall Gigaflops rate, and wallclock time spent in the execution of the program; "iprint=2" gives also the Gigaflops rates and wallclock times of the solver, the formation of the elements stiffness matrix, and boundary conditions routine; finally, setting "iprint=3" shows the Gigaflops rate of the thermal equation solve, and of the Navier Stokes equations solve. An example of the output data for the input instruction flop, iprint=3 is given in section 3.6.

3.5 User specified conditions.


The file "user.c" allows the user to specify a special set of initial conditions, variable boundary conditions, or an exact solution, in various routines. The conditions are called by a string, specified in the input file. The following shows how the boundary conditions with a variable x component of the velocity, corresponding to the string "dparabolic" are coded in the routine "varibc" of the file "user.c":

 void varibc(DATAITEM **user, REAL curtime, int iset, REAL x, REAL y, REAL z,
	    REAL *value)
...
  else if (STRCASECMP(user[0]->cdata, ==, "dparabolic"))
    {
      *value = z*(1.-z);
    }
...

3.6 Usage


Steps for compiling MGFLO:
The tar file mgflo.tar contains all necessary files to compile and run the program. Once the tar file has been expanded, the steps to compile it are:
a) cd Csupes
b) /bin/make
c) cd ..
d) /bin/make
The warning messages can safely be ignored.
Steps To Run MGFLO:
The tar file includes an input file `ns.dat". The qsub script file mgflo.qsub can be submitted after changing line 3 to give the appropriate directory that the tar file is in. The output will be written to the file `ns.out".
The last 30 lines of the output file report the performance of the code. Below is a copy of the end of the output file.

_____________________________________________________________________________
Time in solve = 651.354 sec
Overall GigaFlop  Rate:  16.543, GFLOP Cnt:  1.078e+04,      Time:        652
1 Proc Solve  MFLOPS/s:  32.385,  solveCnt:  2.103e+10, solveTime:        649
1 Proc Fkf    MFLOPS/s:  19.670,    fkfCnt:  3.268e+07,   fkfTime:       1.66

FLOW:
1 Proc MatVec MFLOPS/s:  33.293,     mvCnt:  2.033e+10,    mvTime:        611
1 Proc Precon MFLOPS/s:  13.195,     pcCnt:  5.107e+07,    pcTime:       3.87
1 Proc Dot    MFLOPS/s:  11.598,    dotCnt:  2.298e+08,   dotTime:       19.8
1 Proc AXPY   MFLOPS/s:  31.556,   axpyCnt:  2.297e+08,  axpyTime:       7.28
1 Proc AXPBY  MFLOPS/s:  40.080,  axpbyCnt:  7.653e+07, axpbyTime:       1.91

THERMAL:
1 Proc MatVec MFLOPS/s:  19.756,     mvCnt:  1.015e+08,    mvTime:       5.14
1 Proc Precon MFLOPS/s:  12.165,     pcCnt:  6.613e+05,    pcTime:     0.0544
1 Proc Dot    MFLOPS/s:   9.847,    dotCnt:  2.979e+06,   dotTime:      0.303
1 Proc AXPY   MFLOPS/s:  29.274,   axpyCnt:  2.955e+06,  axpyTime:      0.101
1 Proc AXPBY  MFLOPS/s:  36.211,  axpbyCnt:  9.821e+05, axpbyTime:     0.0271

Time in flop = 0.00450909 sec

Time in stop = 1.99974E-05 sec
Time in PROGRAM: 651.802173
_____________________________________________________________________________

The "Overall GigaFlop Rate" line will report the total performance of the code summed across all processors and then reports that rate, in this case 16.543 GFLOPS.

4. Test Problems and Performance Results
Two test examples are provided below to demonstrate the performance results of the code on the NASA Goddard T3D, and T3E systems.
The first test is stationary solution of the Navier Stokes equations and decoupled solution of the thermal equation. The flow solution corresponds to a simple Poiseuille flow calculation, and the temperature solution is simple linear heat conduction. The exact solutions lie in the approximation subspace and are obtained by MGFLO. The input file is provided above in section 3.3. Essentially linear speedup is obtained for configurations up to 512 nodes tested on the T3D, with 96.7% scaling (see Figure 1).


Figure1: Speedup scaling.


The second test case is a Rayleigh- Benard- Marangoni type of flow including the surface tension and buoyancy contribution. The bottom temperature is set to 62.64 C, the reference temperature is 60C. The free surface is flat. The thickness of the layer is 0.45 mm, the surface section of the fluid is 1 cm by 1 cm. The velocity at the walls is set to 0. The side walls are insulated. The temperature field is specified on the free surface as shown on Figure 2. The code solves the Navier Stokes equations and the heat transfer equation. The velocity vectors at the free flat surface are shown in Figure 3. The surface velocity vectors indicate that vortices are established due to the variations of the surface temperature fields, and surface tension.


Figure 2: Temperature variations on free surface.

Figure 3: Velocity vectors on free surface.

5. Experimental Studies
Laboratory experiments are now in progress to investigate several aspects of surface-tension-driven convection. We have identified the primary instability in high Prandtl number liquid bridge convection and have used advanced, nonlinear control algorithms to prevent onset of instability. Liquid bridge convection is being studied now in low Prandtl number liquids, where the industrial interest is greatest, and onset to instability has been observed. Pattern formation within the ``dry spots'' of long-wavelength Benard convection is being examined for the appearance of hexagonal symmetries. Finally, we are beginning liquid-liquid convection experiments, where a rich variety of instabilities--long- and short-wavelength, stationary and oscillatory--are predicted for both heating from below and heating from above.

6. Conclusions and Recommendations


This research project was initiated in Summer 1996 under the NASA ESS CAN program and software development began in Fall 1996. The prototype parallel code MGFLO is now available for applications and testing on microgravity flow problems. The modular structure of the code permits expansion of its capabilities and several enhancements are planned as the research program continues. For example, we will continue to focus on code optimization and performance milestones in the next year. At the same time, we will develop a fully-coupled formulation and investigate techniques for treating more general boundary conditions and geometry. Now that the prototype code is complete, we plan to work more closely with the flow experiment component of the program and focus on free surface physics. The basic approach of parallelization via domain decomposition, overlap of communications and computations, use of high performance BLAS1 and BLAS2 kernels for the EBE computations will be carried forward as the code capabilities are enhanced. Although the benchmark tests described to date have focussed on a rectangular solid geometry, and 3D cartesian grid, the code is not limited in this manner. It is designed to permit easy extension to more general geometries and meshes, and incorporation of the more general strategies such as those obtained using METIS and similar programs.

Acknowledgments This project was funded by NASA ESS CAN NCCS5 154.

Glossary


thermocapillary: surface tension effects associated with temperature variations on the fluid free surface.


R-B-M: Rayleigh- Benard- Marangoni; thermally driven flows due to both buoyancy and thermocapillary effects.


finite elements: approximation technique involving weighted integral formulation of equations on approximation subspaces.


BCG: Biconjugate gradient method, one of a class of generalised gradient iterative solvers suited for nonsymmetric algebraic systems.

MGFLO: Microgravity Flow Software Package.

EBE: Element-by-element formulation that circumvent system assembly.

BLAS: Basic Linear Algebra Subroutine, assembly coded to optimized performance.

References