Scalable Parallel Finite Element Computations of
Rayleigh-Benard-Marangoni Problems
in a Microgravity Environment

Project Director: Graham F. Carey

NASA ESS Cooperative Agreement Number: NCCS5-154

PERIOD COVERED BY THE REPORT: July 15, 1996 - August 15, 1996

OBJECTIVE

To develop and implement efficient parallel algorithms for the solution of the incompressible Navier-Stokes and heat transfer equations, including surface tension effects, in a microgravity environment, and to carry out supporting experimental studies of nonlinear instabilities and flow phenomena.

APPROACH

There are two main thrusts to the research work. The most significant is the development of a numerical solution scheme and supporting algorithms capable of sustained multi-gigaflop performance on the CRAY T3E and T3D parallel architectures. This component of the work involves development of new parallel algorithms and their implementation in an efficient, scalable manner. The application target is coupled viscous flow and heat transfer described by the incompressible Navier-Stokes equations with the Boussinesq approximation. Of particular interest are the nonlinear interactions associated with Marangoni effects in a microgravity environment, where surface tension effects dominate buoyancy. Representative applications of interest to NASA include manufacturing processes in space, the performance of fluid mechanical systems in space vehicles, and fluid flow and transport in life support systems.

The second component of the work involves fluid mechanics experiments in thin layers that can reproduce the high Marangoni number effects of interest in space applications. The basic science contribution will stem from both the new experimental results and the ability to explore these flows in greater detail computationally.

SCIENTIFIC AND TECHNOLOGY ACCOMPLISHMENTS

During the period of this report, work was initiated by Dr. Carey on the development and analysis of algorithms for the simulation methodology. Several algorithms are being studied at the theoretical level to examine scalability and efficiency. At present we are focusing on high-performance gradient-type iterative methods and domain decomposition to enable the parallel implementation. A second class of algorithms involves a new variant of the stabilized time-stepping Runge-Kutta schemes that we have named CPRK methods. A technical report describing the methodology is being prepared. In the experimental work two basic problems are being studied. The first is the Rayleigh-Benard-Marangoni problem and the second is the liquid bridge problem. Results from these studies will be provided in the next report.

STATUS AND PLANS

Milestone 1: Establish project team of programmers, graduate students, post docs, research staff scientists and faculty. Completed update of agreement including negotiated milestones and software deliverables. Exploratory parallel algorithm studies and flow experiments. July 15, 1996.

The work for Milestone 1 was completed by July 15. The project team, including programmers and an experimentalist, was established, and contract negotiations were discussed with the NASA site visit group. The revised proposal and budget, consistent with the negotiated milestones, milestone dates and milestone payments, was mailed to Jim Fischer on June 27. Several additional minor revisions to the milestones were submitted as requested by Jim Fischer. Dr. Carey met with Tom Formhals in Washington, D.C. to discuss the contract and collaboration with the vendor. In addition to technical aspects of the work, such as performance and scalability, we discussed access to the testbed site and to other Cray installations. Finally, we discussed the possibility of a training program to be convened by Cray to expedite the work. Subsequently, Cray has set up two workshops to implement these ideas. Dr. McLay and Mr. Bose will attend the August 6-8 workshop and Dr. Harle and Mr. Davis will attend the August 13-15 workshop.

Milestone 2: Cache mirror benchmark tests for computational kernels with scaling analysis to 10 Gigaflops for proposed GSFC testbed. Experimental and theoretical studies. Submitted FY '96 annual report to sponsor via WWW. August 15, 1996.

The research activity scheduled for Milestone 2 is on track despite the fact that negotiations have been delayed and access to the computer platform is not yet available. We have carried out theoretical studies that demonstrate that the kernels will perform at multi-gigaflop rates on the testbed T3D and T3E configurations. The experimental fluid studies are under way. Dr. Carey has arranged for Mr. Bose to spend July through September as an intern at Cray, Eagan, to work on numerical simulations related to the project. He will have direct access to the T3E and the T3D. Dr. McLay is developing a software performance tester to benchmark the computational kernels, and a technical report on this work is being prepared. Benchmark tests on the DEC Alpha node have been made, and Dr. McLay will make performance studies at the Cray Workshop. A technical report on the MATVEC kernels and Cache mirror concept is in preparation for submission to a journal. Spencer Swift is assisting in this work.

There are numerous detailed algorithm issues to be resolved. For example: is it better to store the unknowns on a processor as a single vector or by elements? The advantage of a single vector is that the dot products are simpler, and the fully summed value from the EBE matvec does not have to be found and then "broadcast" to all the locations that share the same node. The disadvantage of the single vector approach is that the values for the current element have to be extracted from the vector to compute the matvec. Storing the unknowns in element format obviously makes the matvec easier, but at the expense of the "broadcast". It is better in the saxpy sense, since there are more flops in what should be an efficient operation. Dot products are more difficult, because a MASKED dot product is needed since almost all nodes are shared. Since no MASKED dot product is available in the BLAS, it is not clear that this will be effective on the Alpha. Even if a masked dot product were available, it would be more likely to run afoul of the direct-mapped cache on the Alpha.
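The trade-off between the two storage layouts can be made concrete with a small sketch. The following is an illustration only, not the project code: it assumes a one-dimensional chain of two-node elements that all share the same local matrix, uses NumPy in place of hand-tuned kernels, and all function names are hypothetical.

```python
import numpy as np

def make_mesh(n_el):
    """Assumed toy mesh: element e joins nodes e and e+1."""
    conn = np.stack([np.arange(n_el), np.arange(1, n_el + 1)], axis=1)
    ke = np.array([[1.0, -1.0], [-1.0, 1.0]])  # same local matrix for every element
    return conn, ke

def matvec_single_vector(u, conn, ke):
    """Single-vector storage: gather per element, multiply, scatter-add."""
    u_e = u[conn]                  # extract each element's values from the vector
    y_e = u_e @ ke.T               # local element-by-element products
    y = np.zeros_like(u)
    np.add.at(y, conn, y_e)        # accumulate shared-node contributions
    return y

def matvec_element_storage(u_elem, conn, ke, n_nodes):
    """Element storage: no gather, but the summed value must be sent back."""
    y_e = u_elem @ ke.T            # operate directly on the element copies
    y = np.zeros(n_nodes)
    np.add.at(y, conn, y_e)        # form the fully summed nodal values
    return y[conn]                 # "broadcast" to every copy of a shared node

def ownership_mask(conn):
    """Mark exactly one copy of each node so it is counted once in a dot product."""
    flat = conn.ravel()
    _, first = np.unique(flat, return_index=True)
    mask = np.zeros(flat.size, dtype=bool)
    mask[first] = True
    return mask.reshape(conn.shape)

def masked_dot(a_elem, b_elem, mask):
    """Dot product of element-stored vectors, skipping duplicate node copies."""
    return float(np.sum(a_elem[mask] * b_elem[mask]))

conn, ke = make_mesh(5)
u = np.arange(6, dtype=float)
y = matvec_single_vector(u, conn, ke)
y_elem = matvec_element_storage(u[conn], conn, ke, u.size)
d = masked_dot(u[conn], u[conn], ownership_mask(conn))
```

In this sketch the single-vector layout pays for the gather `u[conn]`, while the element layout pays for the final `y[conn]` and for the mask in every dot product, which mirrors the costs discussed above.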

Dr. Doug Cline is coordinating the UT HPCF activities and arrangements for the new Cray T3E. The T3E is scheduled to arrive September 20 - serial # 7 of the Air Cool Chassis. It will have 40 compute nodes and 4 service nodes, 128 megabytes of memory per node, 100 gigabytes of disk and 1 gigaring I/O connector. The plan is to connect via FDDI to the Cray J90s, with home directories resident on the J90 system. We have recently acquired an STK 9710 robot for tape archival storage using digital linear tape media. It currently has an off-line storage capacity of 15 terabytes, and there are plans to upgrade the archiving system in the future. We plan to begin kernel tests on the T3E as soon as it is installed. This effort will be concurrent with our T3D activities.

Laboratory experiments on surface-tension-driven convection phenomena are now in progress to investigate high Marangoni number phenomena. We are examining spatial patterns in surface-tension-driven convection using thin liquid layers heated from below. The layers are sufficiently thin so that surface tension rather than buoyancy provides the dominant driving mechanism, even for large temperature gradients (large Marangoni number). A transition has been found from hexagonal patterns to square convection cells. At higher Marangoni number, the squares lose stability to a disordered state. In the disordered state the cells grow in size with increasing Marangoni number. We are investigating an analogy between the disordered state and disordered patterns in soap bubbles and magnetic bubbles.
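The role of the layer thickness can be seen from the standard definitions of the Rayleigh and Marangoni numbers. The symbols below are our own notation and are not taken from the experimental work: d is the layer depth, \Delta T the temperature difference across the layer, g the gravitational acceleration, \alpha the thermal expansion coefficient, \nu the kinematic viscosity, \kappa the thermal diffusivity, \rho the density, and \gamma = -d\sigma/dT the rate of decrease of surface tension with temperature.

```latex
\[
  Ra = \frac{g\,\alpha\,\Delta T\,d^{3}}{\nu\,\kappa},
  \qquad
  Ma = \frac{\gamma\,\Delta T\,d}{\rho\,\nu\,\kappa},
  \qquad
  \frac{Ra}{Ma} = \frac{\rho\,g\,\alpha\,d^{2}}{\gamma}.
\]
```

The ratio is independent of \Delta T and scales as g d^2, so it becomes small either for a sufficiently thin layer in the laboratory or for reduced g in a microgravity environment.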

As part of his studies at Cray this summer, Mr. Bose is working on a least-squares formulation of the incompressible Navier-Stokes equations that will be ported to MPP platforms such as the Cray T3D and T3E. We will use both MPI and F--, Cray's proprietary shared-memory language extension (currently under development), to parallelize the code. A benchmarking study will then be initiated to test the performance and scalability of this code. Once the parallel communication and solver issues have been resolved, we will incorporate the energy equation into our model to simulate thermo-capillary flows on distributed-memory multiprocessing machines. A more detailed plan describing the schedule is available on request.

Dr. Young is working on the development of rapidly converging methods for solving large systems of linear algebraic equations with nonsymmetric matrices such as arise in the flow and transport problems of interest to NASA. He and Mr. Chen, a PhD student in Mathematics, have developed and are now testing two new procedures. One of the procedures is LANGMRES which combines the Lanczos method with GMRES. The other procedure is LANSYMMQR which combines the Lanczos method with the SYMMQR procedure of Paige and Saunders. The object is to construct software which is both rapidly convergent and numerically stable.
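As background for readers unfamiliar with this class of solvers, the following is a minimal sketch of plain (unrestarted) GMRES for a nonsymmetric system. It is not LANGMRES or LANSYMMQR, whose details are not given here; it only shows the Arnoldi-plus-least-squares structure of the GMRES building block. The function name and the test matrix are our own assumptions.

```python
import numpy as np

def gmres_sketch(A, b, m=50, tol=1e-10):
    """Plain GMRES with zero initial guess (illustrative, dense NumPy arrays)."""
    n = b.size
    m = min(m, n)                          # Krylov space cannot exceed dimension n
    beta = np.linalg.norm(b)
    x = np.zeros(n)
    if beta == 0.0:
        return x
    Q = np.zeros((n, m + 1))               # orthonormal Krylov basis
    H = np.zeros((m + 1, m))               # upper Hessenberg projection of A
    Q[:, 0] = b / beta
    for k in range(m):
        w = A @ Q[:, k]
        for j in range(k + 1):             # modified Gram-Schmidt (Arnoldi step)
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        # Minimize || beta*e1 - H y || over the current Krylov subspace.
        e1 = np.zeros(k + 2)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        x = Q[:, :k + 1] @ y
        if np.linalg.norm(b - A @ x) <= tol * beta or H[k + 1, k] < 1e-14:
            break                          # converged, or the basis cannot be extended
        Q[:, k + 1] = w / H[k + 1, k]
    return x

# Assumed test problem: a nonsymmetric tridiagonal matrix of the kind produced
# by a simple discretization of a convection-diffusion operator.
n = 30
A = 2.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.ones(n)
x = gmres_sketch(A, b)
```

The cost of storing and orthogonalizing against the whole basis `Q` grows with the iteration count, which is one motivation for combining GMRES with short-recurrence Lanczos-type methods.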

Dr. Harle is working with Dr. Carey on parallel adaptive Navier-Stokes and transport computations. The performance of test code components in the structured grid setting has been investigated for comparison purposes using the multi-processor shared-memory Cray J-90, C-90 and T-90. Sustained performance on a 16-processor C-90 reached 2 Gflops (1 Gflops was achieved on 4 processors of a C-90, and 0.6 Gflops on one processor of a T-90) for an industry benchmark test problem. We plan next to test the new Chebychev iteration polynomial-based Runge-Kutta recursion method developed by Carey and Lorber. Results of this study will be delivered in a later report. Adaptive refinement criteria for local error estimation are also being studied by Dr. Carey in collaboration with Dr. Harle.
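The scaling implied by the two C-90 figures quoted above can be worked out directly. The short calculation below uses only those two reported rates; the function names are our own, and the T-90 figure is left out because it refers to a different machine.

```python
# Sustained rates quoted above for the C-90: processors -> Gflops.
c90_gflops = {4: 1.0, 16: 2.0}

def per_processor_rate(procs, gflops):
    """Average sustained Gflops delivered by each processor."""
    return gflops / procs

def relative_efficiency(base, scaled, rates):
    """Measured speedup from `base` to `scaled` processors over the ideal speedup."""
    speedup = rates[scaled] / rates[base]
    ideal = scaled / base
    return speedup / ideal

for procs, rate in c90_gflops.items():
    print(procs, "processors:", per_processor_rate(procs, rate), "Gflops per processor")
print("efficiency, 4 -> 16 processors:", relative_efficiency(4, 16, c90_gflops))
```

Going from 4 to 16 processors doubles the sustained rate where ideal scaling would quadruple it, so the relative parallel efficiency for this test problem is 50 percent (0.25 versus 0.125 Gflops per processor).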

POINT OF CONTACT

Graham F. Carey
CFDLab
ASE/EM Department
The University of Texas at Austin
Austin, TX 78712
carey@cfdlab.ae.utexas.edu

[Figure: MATVEC performance for a single-processor DEC Alpha workstation at 225 MHz.]

PUBLICATIONS AND PRESENTATIONS:

None

OTHER MEDIA:

None

PATENTS:

None

GRADUATE STUDENTS

FACULTY ASSOCIATE

POST DOCS

FACULTY
