Scalable Parallel Finite Element Computations of
Rayleigh-Benard-Marangoni Problems
in a Microgravity Environment

Project Director: Dr. G. F. Carey, CFDLab,
The University of Texas at Austin.


Contents:
  1. Rayleigh-Benard-Marangoni Flows
    1. Microgravity Flow Problems
    2. Mathematical Model and Finite element Method
    3. Solution Algorithm
    4. Parallelization
    5. Performance
  2. Evaluation Criteria and Performance
  3. MGFLO Software Package
    1. Code Modules
    2. List of Routines
    3. Sample Input File
    4. Sample Output File
    5. User specified conditions
    6. Usage
  4. Test Problem and Performance Results
  5. Experimental Studies
  6. Conclusions and Recommendations


Authors
Dr G.F. Carey: project director, professor in Aerospace Engineering at the University of Texas at Austin, director of the CFDLab.
Dr R. McLay: Research Associate at the University of Texas at Austin.
Dr. Gurcan Bicken: Post Doctoral associate, CFDLab.
Dr. H Swinney: Professor at the University of Texas at Austin, director of the Center for Nonlinear Dynamics.
Dr. S. Van Hook: Post Doctoral associate, Center for Nonlinear Dynamics.

Abstract


MGFLO is designed for parallel distributed computation of 3D coupled viscous flow and heat transfer, including thermocapillary surface tension effects. Of specific interest are microgravity fluid problems of interest to NASA for life support fluid processing and storage, and manufacturing applications in the Aerospace Shuttle and Space Station Missions. The project involves development of software for flow simulations on the Cray T3D and Cray T3E distributed MPP systems, as well as fluid flow experiments on fluid films where the Marangoni effect is dominant. The first generation MGFLO code achieved a sustained 16.5 Gflops/s performance level on the 512 node T3D system at NASA Goddard in March 1997, for a representative 3D coupled Rayleigh- Benard- Marangoni benchmark calculation. An improved version of the code achieved 51.66 Gflops/s on the 512 T3E-600 system at NASA Goddard in July 1997; the first generation code achieved 115 Gflops/s on a 1280 T3E-900 system at Cray Research Inc in July 1997. A brief description of the problem class, methodology, performance measurement criteria, and software package is provided in this document. References and publications from our team relevant to the research done with this software are linked to this web page.

1. Rayleigh- Benard- Marangoni Flows
1.1 Microgravity Flow Problems

Benard in 1900 observed that cellular flow structures formed under certain conditions where a fluid layer was heated from below [1]. He attributed the behavior to the effect of thermal expansion and associated buoyancy forces in the fluid, and assumed that there were no significant effects associated with the free surface. Subsequently, Rayleigh in 1916, developed a mathematical model for this problem under the same assumptions and carried out a stability analysis. This class of problems is commonly referred to as the Rayleigh- Benard Problem.

Neither Rayleigh nor Benard realized that, in fact, thermocapillary surface tension at the free surface was the dominant mechanism driving the flow for this class of terrestrial experiments involving thin layers. This problem is termed the Rayleigh- Benard- Marangoni problem [2]. In fact, it is the dominance of surface tension over gravity that makes this problem so important to understanding the behavior of fluid in a microgravity environment [3]. Clearly, in NASA shuttle and space station orbit, gravity is negligible, and surface tension effects are paramount. However, it is generally not possible to carry out terrestrial experiments to test ideas before deployment in space (the one exception being the thin layer type of experiments conducted as part of the present work). Hence a simulation capability for Rayleigh- Benard- Marangoni problems is essential.

Since these problems are nonlinear 3D, coupled and require fine grid resolution for stable, reliable simulations, the problem is both computationally and memory intensive. Table 1 gives an estimate of solution times and memory requirements for a typical Rayleigh- Benard- Marangoni calculation using MGFLO on different Cray vector systems, and on the NASA Goddard MPP systems.


Computer
J90
1 proc
C90
1 proc
T90
1 proc
T3D
512 proc
T3E
512 proc
MGFLO Flops/s 73 M 290 M 600 M 16.5 G 51.6 G
Time: 9000 elements
(Memory: 128 MW)
9 days
2.3 days
1.1 day
1 h
20 mn
Size: 110582 elements
(Memory: 1540 MW)
Out of
Memory
Out of
Memory
Out of
Memory
In
Memory
In
Memory


Table: MGFLO time and memory comparison and estimates.


1.2 Mathematical Model and Finite Element Method

The model consists of the Navier-Stokes equations for flow in 3D, coupled with the energy transport equation for temperature. Under the Boussinesq approximation, buoyancy enters as a body force in the momentum equations, and a thermocapillary surface tension effect enters through a shear stress condition at the free surface [see 12(postscript hyperlinked), 13].

A weak variational saddle point formulation of the viscous flow and heat transfer problem is constructed, and a Galerkin finite element approximation statement follows with the free surface condition incorporated as a natural boundary condition [4]. The discretization is taken as tensor-product hexahedral elements with triquadratic basis for each velocity component, trilinear basis for the pressure, and triquadratic basis for temperature (see online papers linked to references [5,6]).


1.3 Solution Algorithm


Two algorithms have been implemented: 1) the first approach consisted of an iteratively decoupled treatment of the respective fluid flow and heat transfer equations and successive approximation. At any given iterate the computed velocity field from the previous iteration is used in the convective heat transfer contributions and the subsequent temperature solution is used to compute the buoyancy and free surface contribution to the discretized momentum equations. Biconjugate gradient iteration is used to solve each of the respective linear subsystems. The respective element matrices are of size 27 27 for the temperature equation and 89 89 for the momentum equations. 2) The second approach corresponds to a fully-coupled formulation for velocity components, pressure and temperature. The nonlinear system at each step is solved using Newton iteration with biconjugate gradient solution of the associated jacobian systems. In this coupled formulation the resulting element matrix contributions to the jacobian matrices are of size 116 116. This leads to a slight improvement in matrix-vector product efficiency. Further details are provided in [7,8].


1.4 Parallelization


The problem is solved in parallel over a partition of subdomains using an element-by-element strategy for the BCG solver within each subdomain [9]. That is, the mesh of hexahedral elements is partitioned to subdomains containing contiguous blocks of elements. Each such subdomain is assigned to a processor so this defines the processor partitioning. The subdomain partition is non-overlapping insofar as elements are concerned, but nodes, edges and faces on the subdomain interfaces are shared by adjacent processors. Communication between adjacent processors will be required for calculations involving those elements with common nodes on an interface. By further grouping the processor elements, it is possible to overlap computation and communication. For further details, see the cited references [5 (hyper-linked)], [10,11].


1.5 Performance

The computationally intensive parts are the element calculations, the matrix-vector products, and the dot products. The element calculations can be made in parallel and run on the T3E at 108 Mflops/s per processor. The EBE scheme circumvents global assembly and recasts the global matrix-vector products at the element level. This implies that we make local "almost dense" element matrix-vector products, for the elements interior to any given processor subdomain. Load balance is achieved if the number of elements per processor subdomain is approximately constant.

Clearly, the BLAS2 matrix-vector routine is heavily used (requiring more than 80% of the CPU time for a typical calculation) [11]. In our optimized code, this runs at 108 Mflops/s for a single processor on the T3E. Near linear speedup and scalability on 512 nodes of the T3E at 93.4% were achieved in this manner for the test problem described later.


2. Evaluation Criteria and Performance

The norm for evaluating the speed of the code was specified under the NASA contract in terms of Gigaflops/s with a minimum sustained speed of 50 Gigaflops/s to be attained on the representative 3D coupled problem as the performance milestone for February 1998. This goal was met in July 1997 where the performance was assessed to exceed 51.6 Gigaflops/s on 512 nodes of the NASA Goddard Cray T3E, and in excess of 115 Gflops/s on a 1280 nodes T3E at Cray Research Inc in July of 1997.


3. MGFLO Software Package
3.1 Code Modules


The Code is written in C and Fortran. Communications between processors is achieved using MPI and the Cray shared memory library (shmem). (Shmem is used in situations where communication speed critically impacts performance). The software consists of the following modules:
(1) A simple 3D mesh generator (IJK type) for hexahedral elements and a similar processor partitioning strategy. That is, the connectivity table is also provided by the module. (Note that the implementation is not limited to cartesian meshes and is designed to handle more general configurations.
(2) A solution manager module which handles the options specified in the input file needed to compute the element matrices and vectors for the discretized problem. At each timestep, or nonlinear iteration cycle, the solution manager calls the routine to form the element matrices and vectors and applies the essential boundary conditions. The solver module is then called.
(3) The solver module consists of element-by-element matrix vector products with overlapped communications and associated dot products, also recast at the element level. A special masked componentwise vector product is introduced in this latter calculation.
The interface between the mesh generator and the other two modules is a "struct" in C which specifies the nodes coordinates, boundary conditions and connectivity tables.
The structure of the main program then is:

  
Standard Headers
Read Title Routine
Read in materials
Flop Rate Routine
main(int argc, char **argv)

  Main Definitions
  Initializations
  Parse Command Line Options
  Find Base Name of Input File
  Create Output File and Open It
  Read Input File into Memory
  
  done = FALSE;
  while (!done)

      Get Next Parsed Line from Input File
      Material Properties
      Mesh 
      Solve 
      Flop Rate
      Write Output Files
      Stop 
      done = TRUE;
      
  Finishing Program

Each line of this structure is a module. We will focus on the "Mesh" and "Solve" modules. The structure of the "Mesh" module is the following:

Form elements routine
Set Matrix Type
MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop)

   Initializations
   Parse Input Data for Nodes and Elements
   Check for Valid Input
   Compute Spacing, Array Sizes and Allocate
   Partition Processors
   Local Position on this Processor
   Compute Nodes Coordinates
   Form Elements
   Store Boundary Conditions
   Store in [[MESH]] Context

The [[MESH]] context contains the information which is passed to the solution manager module named "Solve". In the current version of the code, the [[MESH]] struct contains the following information (in the file "mesh.c"):
  NODES*     nodes;             /* A 1-D list of nodes on this processor */
  VectorObj* frame_elements;    /* The elements on the processor boundary */
  VectorObj* pict_elements;     /* the interior elements */
  VectorObj* all_elements;      /* all elements */
  VectorObj* group_info;        /* BC group information */
  VectorObj* sndLists;          /* The send lists to all neighbor procs */

  int        elm_max;           /* The number of elements on this processor */
  int        elm_max_g;         /* The maximum global number of elements */
  int        maxElmPP;          /* The maximum nr. of elements per processor */
  int        constNPP;          /* if true then the number nodes per
                                   processor are the same. */
  int        constEPP;
  int*       elmSize;
  int        nodes_max;         /* The number of nodes on this processor */
  int        nodes_max_g;       /* The maximum global number of nodes */
  int        maxNodesPP;        /* The maximum nodes per processor */
  int*       nodesSize;         /* Array of nodes on each processor */
  int        baseNum;           /* base number associating the local
                                   proc nodes numbering to
                                   global node numbering */
  int        numvar;            /* number of variables */
  int        nStfSize;          /* size of the matrix */
  int        gp1d;
  int        gp2d;
  int        gp3d;
  REAL*      gauss3d;           /* 3D Gauss points coordinates xi,eta,zeta*/
  REAL*      wght3d;            /* 3D weights of the Gauss points */
  REAL*      gauss2d;           /* 2D Gauss pnts coord */
  REAL*      wght2d;            /* 2D Gauss pnts weights */
  REAL*      phi;               /* Shape functions tri quads at nodes */
  REAL*      psi;               /* Shape func 3 linear at corner nodes*/
  REAL*      dphi;              /* Deriv xi,eta,zeta of shape func phi */
  REAL*      dpsi;              /* Deriv of shape func psi */
  REAL*      en;                /* 2D shape function */
  REAL*      dn;                /* 2D Shape function deriv */
The module "Solve" refers to all the members of the [[MESH]] structure. It is therefore important that the correct interface is built if a new mesh generation routine is used. The structure of the solution manager "Solve" module is as follows:

void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX
            *input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep)

  Local Variables
  Initialize
  Parse command line
  Decide How Many Unknowns
  Initialize solution, Gauss, Shape, Deriv
  Setup solver routines

  for (;;)  /* Loop over User Requests (Until "END" is found)*/
  {
    Get Next Parsed Line, [[break]] if [["END"]] is found
    Reset all state variables to initial state
    Parse User Input Line
    /* Loop over number of solution request from current user line.*/
    done = FALSE;
    while (!done) 
    {
      Increment Solution counters
      Save old soln
      Predict solution (or copy old to predicted)
      /* Normally one-pass loop except for decoupled case */
      while (!whole) 
      {
        Handle decoupled solution
        Form element [[stf]], [[rhs]] and apply BCs
        Modify solver routines
        A Single Solve
        Update [[U]] with new solution if necessary
        Toggle between [[FLOW]] and [[THERMAL]] when necessary
      }
      Check Solution against Exact Solution
      Acceleration, Iteration and Time Step Control
    }
  }
  Deallocations

The interface between the "Solve" routine and the biconjugate gradient routine is via the [[SOLVE]] struct, which contains the following information:
  REAL      time;         /* time of the solution for transient cases */ 
  Vector*   soln;         /* pointer containing the solution   */
  int       kstep;        /* the solution count */
  int       numvar;       /* number of unknowns */
  char*     varPool;      /* pointer containing names of variables */
  int       sizePool;     /* size of the varPool pointer */
  PCG*      pcg;          /* pointer to the solver function: BCG */
  PCG*      pcg_t;        /* pointer to thermal solver if decoupling*/
  PCG*      pcg_f;        /* pointer to the Navier Stokes solver if needed*/

3.2 List of Routines


We list the main routines, by files. The file "mgflo.c" contains the main routine. In this file, the following routines are also included:
void rdtitle(DATAITEM **line, int num_fields): this routine reads the title line of the input file.

void debug_requests(DATAITEM **line, int num_fields): this routine flags the debugging if requested.

void flopRate(DATAITEM **line, int num_fields, PROBLEM* problem, SOLVE* solve, REAL64 diffTime): this routine prints out the number of flops per main routine, and overall Gigaflops/s in the output file.

The file "mesh.c" contains the following routines:

void form_elements(VectorObj* elmv,int xelm, int yelm, int zelm, int i0, int j0, int k0, int nnx, int nny, PROP* prop, int visualizer): provides the node coordinates of the elements for the IJK grid, and connectivity table on one processor. The partitioning was previously done in the main routine in the file "mgflo.c".

MESH *rdmesh(INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, PROP *prop): creates the send lists which are the picture and "frame" of the grid on one processor. (The elements at the surface of the grid of one processor, which share nodes with other processors or which have nodes on the boundary of the physical domain are in the "frame". The interior nodes are in the "picture".)

The file "solnMgr.c" contains the routines relevant for the solution manager:

int matvec(MVTYPE type, void* amat, void* p, void* w, REAL64* cnt): sets the matrix vector BLAS2 routines: either normal or transpose.

void zeroPresMask(MESH* mesh, int numvar, Vector* mask): sets the pressure mask value to zero.

int sumRHS(MESH* mesh, Vector* f, REAL64* cnt): sums the right hand side globally over all processors.

int formDiagInv(MESH* mesh, Vector* diagInv, REAL64* cnt): forms the Jacobi preconditioning matrix.

void errorNorm(DATAITEM** user, int numvar, int nodesMax, NODES* nodes, REAL* u, REAL* mask, REAL** unormp, REAL** diffsln): computes the error norm when the exact solution is known.

int stop(void* uservp, int iter, REAL rnorm, REAL fnorm, REAL zeta, int exact, int iprint, void *soln, void *mask): sets the stopping criteria.

int messageSetup(VectorObj* sndLists, int numvar): sets the messages.

Vector* formMask(VectorObj* sndLists, int nodesMax, int numvar, PCG* pcg): forms the masking vector for the masked dot products.

void solnMgr(MESH *mesh, PROP *prop, PROBLEM *problem, INPUT_CTX *input_ctx, DATAITEM **line, int num_fields, SOLVE** solvep): is the general routine of the solution manager module.

The file "setbc.c" contains the routine which computes the boundary conditions:

void setbc(PROBLEM* problem, ELEMENTS* elm, NODES* nodes, VectorObj* group_info, int nStfSize, int* gp2d,int* gp3d, REAL* wght2d, REAL* gaussp2d, REAL* phi, PROBtype probtype, REAL* xyz, REAL* T_lag, REAL64* rcnt)

The file "pcg.c" contains the routines which set the parallel iterative solver.

int createPCG(PCG **pcg, void* matrix, PCGSolverType solverType): creates the parallel solver pointer "pcg".

int setMatvecPCG(PCG *pcg, int (*matvec)(MVTYPE, void* ,void* ,void* ,REAL64 *)): sets the matrix vector function pointer which is used by solver.

int setAveragePCG(PCG *pcg, int (*average)(void* ,void* ,void* )): sets the average function pointer specified in the solution manager.

int setParmPCG(PCG* pcg, REAL eps, int iterMax, int iprint, int exact, int solnInit): sets the parameters for the solver.

int setMaskPCG(PCG* pcg, void *mask): sets the proper masking parameters for the solver.

int clearMaskPCG(PCG* pcg): clears the masking vector.

int setPrecondPCG(PCG *pcg, int (*pcroutine)(), void* precond): sets the proper preconditioning routine for the solver chosen.

int setStoppingPCG(PCG* pcg, int (*stop)(void*, int, REAL, REAL, REAL, int, int, void*, void*), void* user): sets the stopping criterion for the solver.

int vecGetWork(PCG* pcg, int n): prepares the working vectors for the solver.

int solvePCG(PCG *pcg, void* soln, void* f, int* its, REAL64* cnt): specifies the type of solver (CG or BCG).

The routines for the conjugate gradient solver and biconjugate gradient solver are provided in the file "iter.c".

int cgSetup(PCG *pcg): sets the CG solver.

int cg(PCG* pcg, int* its, REAL64* cnt): CG solver routine.

int bcgSetup(PCG *pcg): sets the BCG solver.

int bcg(PCG* pcg, int* its, REAL64* cnt): BCG solver routine.

The file "serv.c" contains the server routines: message, error message, and warning message.

The file "vec.c" contains the routines which handle vector operations, and blas routines.

The file "user.c" contains routines which are specified by the user for the variable boundary conditions, the initial boundary conditions, and the exact solution. These routines are discussed in section 3.3 below.

The file "formkf.f" contains the following routines written in Fortran. These routines are:

subroutine surften(c,T_lag,iface,gp2d,gp3d, phi,wght2d,gaussp2d, xyz,rhs): computes the boundary integral due to the surface tension boundary condition.

subroutine fluxbc(c,iface,gp2d,gp3d,phi,wght2d,gaussp2d,xyz,rhs): computes the thermal heat flux boundary integral when a heat flux boundary condition is specified.

subroutine shapeq2q1(ngpts,gaussp3d, phi, psi,dphi,dpsi): computes the Q2 (triquadratic) and Q1 (trilinear) shape functions and their derivatives.

subroutine gauss(npoints,gaussp2d, wght2d,gaussp3d,wght3d): computes the coordinates of the Gauss points in the master element coordinate frame.

subroutine formkf_ns(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi,dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount): computes the element stiffness matrix and its corresponding right hand side vector from the finite element discretized formulation of the Navier-Stokes system of equations.

subroutine formkf_t(discrtype,ngpts,gaussp3d,wght3d,phi,psi, dphi, dpsi,xyz,u_lag,T_lag,prop,delt,solntyp,stf,rhs,rcount): computes the element stiffness matrix, and its corresponding right hand side vector from the finite element discretized formulation of the thermal equation.

3.3 Sample Input File


The software requests an input file which uses a set of chosen keywords. The software copies the input file to all processors, which then parse the lines of the input file as the computation progresses. A sample input file is provided below for the current version of the software and this is followed by a detailed explanation of this input data.

! This is how a comment is written
title, "Test Case for Mgflo"
material,rho=1.0,k=1.0,mu=1.0,cp=0.0,iprint=0
mesh,elem=64,64,64,min=0.0,0.0,0.0, max=1.0,1.0,1.0, partition=8,8,8,iprint=0
  group=6,T=variable,iset=2
  group=0,ux=0.0
  group=0,uy=0.0
  group=0,uz=0.0
  group=1,ux=variable, iset=1 
  group=1,uy=0.0         
  group=1,uz=0.0
  group=2,ux=variable, iset=1
  group=2,uy=0.0
  group=2,uz=0.0
  group=7,p=0.0
  node=0,0,group=7
  face=0,group=6
  face=0,group=0
  face=1,group=6
  face=1,group=1
  face=2,group=6
  face=2,group=2
  face=3,group=6
  face=3,group=1
  face=4,group=6
  face=4,group=2
  face=5,group=6
  face=5,group=0
end,mesh
solve,coupled, iprint=0,user=dparabolic
  steady,nsteps=1,eps=1.E-10,itermax=3000
end,solve
flop,iprint=3
stop
 


The above input file is composed of three main sections.
(1) The first section, line 3, gives the "material" data, values of the properties of the fluid which are needed to perform the computation (in cgs units here) (density "rho", thermal conductivity "k", constant pressure specific heat coefficient "cp", dynamic viscosity "mu", thermal expansion coefficient "beta"). If no values is given the default value of 0 is assigned.
(2) The second section, (from line 4 to "end mesh") provides information on the mesh, and the partitioning among processors. The keyword "elem" indicates the number of elements which are set in the "x", "y", and "z" direction of the structured hexahedral grid; the keyword "min" gives the coordinates of the bottom left front corner of the domain, and the keyword "max" gives the coordinates of the upper back right corner of the domain; the keyword "partition" provides the number of processors assigned in each of the "x", "y", "z" directions. The flow domain for this example has six faces, which are assigned different "group" boundary conditions. A "face" may be assigned several "groups" since several boundary conditions may apply to it. Surface tension is given as surface boundary condition.
(3) The third section of the input file provides data used for the solver. It indicates which type of solver will be applied: "decouple" if the Navier Stokes and the thermal equations are iteratively decoupled; "coupled" if the Navier Stokes and thermal equations are solved in a coupled manner; "flow" if only the Navier Stokes equations are solved, "thermal" if only the thermal equation is solved. We also allow the user to provide a special set of boundary conditions and initial conditions, when the keyword "user" is set to a string. This string refers to sections of the file "user.c", which are discussed below. The keyword "steady" indicates a stationary flow calculation, "transient" indicates that the time dependent terms of the equations are included, with a timestep specified by the keyword "delt". The number of cycles or timesteps is set by "nstep". The precision of the biconjugate iterative solver is given by "eps", and the maximum number of iterations in this iterative solver is provided by "itermax".
We note that on several lines, the keyword "iprint" is used: this provides various output information. In general, the order of the lines should be respected, yet on a given line, the order of the keywords is not important after the first word. The only exception to this rule is the keyword "user" on the solve line, it followed by an arbitrary number of parameters must be the last one on the solve line.


For the example above, we have therefore the following information: the material density, thermal conductivity, viscosity, and heat coefficient are set to 1. The mesh is a uniform 64 by 64 by 64 structured grid, for the unit cube. The number of processors requested is 512 in the form of an 8 by 8 by 8 processor partition of subdomains. The boundary conditions on the unit cube indicate that the bottom faces have a variable temperature (group 6), specified in the routine "user.c" as the set of conditions "iset" = 2, of the section "dparabolic". The bottom and top surfaces (face=0,5) satisfy the noslip condition (provided by the group 0). The lateral faces (face=1,2,3,4) have a variable "x" component of the velocity, specified in the routine user.c, by the set "iset=1" in the section "dparabolic", and have "y" and "z" velocity components set to zero.
Based on this data input, the program solves the Navier Stokes and thermal equations in a coupled manner, with the specific boundary conditions provided by the section "dparabolic". For the sample data set here, no time dependency is implemented, and only one nonlinear cycle is computed. The precision of the bi-conjugate gradient solver is set to 1.e-10, and a maximum of 3000 iterations is specified. Finally, the output file will print the overall Mflop rate for the code.
We remark that upon completion of a flow simulation, the software provides an output file, and, if chosen so, data files to allow visualization of the mesh and the solution.

3.4 Sample Output File


The output file repeats a copy of the input file, and provides information corresponding to the integer value assigned to the "iprint" keyword in the input files. If the statement "iprint=0" is given, or if the keyword "iprint" is omitted, then no information is printed.
On the "material" input line, "iprint=1" reprints the material properties.
On the "mesh" input line, "iprint=1" prints the mesh information (coordinates, connectivity table) in the output file.
On the "solve" input line, "iprint=1" gives the number of iterations used in the iterative solver to converge to the tolerance "eps"; "iprint=2" prints the residual at each solver iteration.
On the "flop" input line, "iprint=1" prints the overall Gigaflop rate, and wallclock time spent in the execution of the program; "iprint=2" gives also the Gigaflop rates and wallclock times of the solver, the formation of the element stiffness matrices, and boundary conditions routine; finally, setting "iprint=3" shows the Gigaflop rate of the thermal equation solve, and of the Navier Stokes equations solve. An example of the output data for the input instruction flop, iprint=3 is given in section 3.6.

3.5 User specified conditions.


The file "user.c" allows the user to specify a special set of initial conditions, variable boundary conditions, or an exact solution, in various routines. The conditions are called by a string, specified in the input file. The following shows how the boundary conditions with a variable x component of the velocity, corresponding to the string "dparabolic" are coded in the routine "varibc" of the file "user.c":

 void varibc(DATAITEM **user, REAL curtime, int iset, REAL x, REAL y, REAL z,
	    REAL *value)
...
  else if (STRCASECMP(user[0]->cdata, ==, "dparabolic"))
    {
      *value = z*(1.-z);
    }
...

3.6 Usage


Steps for compiling MGFLO:
The tar file mgflo.tar contains all necessary files to compile and run the program. Once the tar file has been expanded, the steps to compile it are:
a) cd Csupes
b) /bin/make
c) cd ..
d) /bin/make
The warning messages can safely be ignored.
Steps To Run MGFLO:
The tar file includes an input file "ns.dat". The qsub script file mgflo.qsub can be submitted after changing line 3 to give the appropriate directory that the tar file is in. The output will be written to the file "ns.out".
The last 15 lines of the output file report the performance of the code. Below is a copy of the end of the output file.


Sample of output for a run on the 512 nodes T3E.
_____________________________________________________________________________

Time in solve = 858.947 sec
Overall GigaFlop  Rate:  51.279, GFLOP Cnt:  4.406e+04,      Time:        859
1 Proc Solve  MFLOPS/s: 103.688,  solveCnt:  8.579e+10, solveTime:        827
1 Proc Fkf    MFLOPS/s:  81.646,    fkfCnt:  2.612e+08,   fkfTime:        3.2
1 Proc MatVec MFLOPS/s: 108.020,     mvCnt:  8.410e+10,    mvTime:        779
1 Proc Precon MFLOPS/s:  18.095,     pcCnt:  1.474e+08,    pcTime:       8.15
1 Proc Dot    MFLOPS/s:  41.345,    dotCnt:  6.635e+08,   dotTime:         16
1 Proc AXPY   MFLOPS/s:  34.848,   axpyCnt:  6.633e+08,  axpyTime:         19
1 Proc AXPBY  MFLOPS/s:  43.245,  axpbyCnt:  2.211e+08, axpbyTime:       5.11

Time in flop = 0.000596881 sec

Time in stop = 6.64592E-06 sec
Time in PROGRAM: 859.237233
_____________________________________________________________________________

The "Overall GigaFlop Rate" line will report the total performance of the code summed across all processors and then reports that rate, in this case 51.279 GFLOPS/s.

The first test is stationary solution of the Navier Stokes equations and coupled solution of the thermal equation. The flow solution corresponds to a simple Poiseuille flow calculation, and the temperature solution is simple linear heat conduction. The exact solutions lie in the approximation subspace and are obtained by MGFLO using the coupled algorthm. The input file is provided above in section 3.3. Essentially linear speedup is obtained for configurations up to 512 nodes tested on the T3E, with 96.7% scaling (see Figure 1).

Figure1: Speedup scaling.

The second test case is a stationary Rayleigh-Benard-Marangoni type of flow including the surface tension and buoyancy contribution. The bottom temperature is set to 62.64 C, the reference temperature is 60C. The free surface is flat. The thickness of the layer is 0.45 mm, the surface section of the fluid is 1 cm by 1 cm. The velocity at the walls is set to 0. The side walls are insulated. The temperature field is specified on the free surface as shown on Figure 2. The code solves the Navier Stokes equations and the heat transfer equation using the coupled algorithm. The velocity vectors at the free flat surface are shown in Figure 3. The surface velocity vectors indicate that vortices are established due to the variations of the surface temperature fields, and surface tension.

Figure 2: Temperature variations on free surface. Figure 3: RBM Velocity vectors on free surface.

The third test problem is transient natural convection (NC) in a cube with heated side walls Two cases are considered:  a) free surface with no thermocapillary surface tension, and  b) with thermocapillary surface tension.  The sidewall temperatures are normalized to 1.0 in front and zero on the back surface .  The effect of suface tension is to augment the convective cell formation.  The nature of the flow and thermal fields is indicated in Figure 4.

Figure 4:  NC with surface tension.

5. Experimental Studies
Laboratory experiments are being conducted to investigate several aspects of surface-tension-driven convection. We have identified the primary instability in high Prandtl number liquid bridge convection and have used control techniques to prevent onset of instability. Liquid bridge convection is being studied now in low Prandtl number liquids, where the industrial interest is greatest, and onset to instability has been observed. Pattern formation within the ``dry spots'' of long-wavelength Benard convection is being examined for the appearance of hexagonal symmetries. We are also investigating liquid-liquid convection experiments, where a rich variety of instabilities--long- and short-wavelength, stationary and oscillatory--are predicted for both heating from below and heating from above.

6. Conclusions and Recommendations
This research project was initiated in Summer 1996 under the NASA ESS CAN program and software development began in Fall 1996. The initial prototype parallel code was based on a decoupled formulation and completed in July 1997.  A new version that includes both decoupled and coupled algorithms is now available for applications and testing on microgravity flow problems. The modular structure of the code permits expansion of its capabilities and several enhancements are planned as the research program continues. At this stage, we have focused on code optimization and performance milestones.  The thermocapillary effect has been inluded in both formulations.  Now that the prototype code is complete, we plan to work more closely with the flow experiment component of the program and focus on free surface physics. The basic approach of parallelization via domain decomposition, overlap of communications and computations, use of high performance BLAS1 and BLAS2 kernels for the EBE computations will be carried forward as the code capabilities are enhanced. Although the benchmark tests described to date have focussed on a rectangular solid geometry, and 3D cartesian grid, the code is not limited in this manner. It is designed to permit easy extension to more general geometries and meshes, and incorporation of more general partitioning strategies such as those obtained using METIS and similar programs.

Acknowledgments This project was funded by NASA ESS CAN NCCS5 154.

Glossary
 
BCG: Biconjugate gradient method, one of a class of generalised gradient iterative solvers suited for nonsymmetric algebraic systems.

BLAS: Basic Linear Algebra Subroutine, assembly coded to optimized performance.

EBE: Element-by-element formulation that circumvent system assembly.

finite element method: approximation technique involving weighted integral formulation of equations on approximation subspaces defined using discretization of domain to "elements" such as hexahedra or tetrahedra.
 
MGFLO: Microgravity Flow Software Package.

R-B-M: Rayleigh- Benard- Marangoni; thermally driven flows due to both buoyancy and thermocapillary effects.

thermocapillary: surface tension effects associated with temperature variations on the fluid free surface.
 

References