Scalable Parallel Finite Element Computations of
Rayleigh-Benard-Marangoni Problems
in a Microgravity Environment

Project Director: Graham F. Carey

e-mail: carey@cfdlab.ae.utexas.edu

NASA ESS Cooperative Agreement Number: NCCS5-154

SCIENTIFIC AND TECHNOLOGY ACCOMPLISHMENTS

We have developed and analysed algorithms for the simulation of coupled heat transfer and 3D incompressible Navier-Stokes flows. We implemented high-performance gradient-type iterative methods and domain decomposition to enable parallel execution. These methods have recently been used successfully for 3D Rayleigh-Benard type computations and showed linear processor scaling on the T3D, achieving very high performance (approximately 16 Gflops on the 512-node system at NASA Goddard).

Sample results for the fluid temperature are shown in the figure below for buoyancy-driven flow with a 10 degree temperature difference between the bottom and top surfaces of a 3D box. The box is tilted at an angle of 10 degrees. Scaled speedup results for a representative performance study are shown in the next figure. Points on the curve correspond to performance on 1, 8, 64, and 512 nodes of the T3D.


Temperature at t = 5 sec, in the plane z = 0.5, for Ra = 2500, Tcold = 10, Thot = 20. The box is tilted 10 degrees with respect to the vertical axis. In this problem, the hot fluid moves up the left wall, and the cold fluid moves down the right wall.

Speedup curve for Rayleigh-Benard type flows: Poiseuille flow coupled with heat transfer (peak performance of 16.5 Gflops on 512 processor nodes of the T3D). The computations were run on 1, 8, 64, and 512 processors, with a fixed problem size per processor.
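As a rough check on the quoted figures, the aggregate rate can be converted to an average per-processor rate. The short Python sketch below uses only the two numbers given in the caption (16.5 Gflops and 512 processors); the function name is illustrative and not part of the project code.

```python
def per_processor_mflops(total_gflops, processors):
    """Average rate per processor in Mflops, given an aggregate rate in Gflops."""
    return total_gflops * 1000.0 / processors


# Figures quoted in the caption: 16.5 Gflops on 512 processors.
rate = per_processor_mflops(16.5, 512)
print(f"{rate:.1f} Mflops per processor")  # prints "32.2 Mflops per processor"
```

Under scaled speedup, where the problem size per processor is held fixed, linear scaling means that this per-processor rate stays roughly constant as processors are added.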

The partition assigned to one processor for a structured 3D mesh is a brick, or one of its possible degenerate forms: a slice or a column. For a brick partition, the processor communicates with at most 26 other processors (2 processors for a slice partition, and 8 processors for a column partition). Each brick is composed of elements at its surface and elements in its interior. For the EBE scheme used in this algorithm, the processor communicates the information for the elements at the surface of the brick while it is computing the interior elements. This is done to hide the time cost of communication. For a slice, the communications are not overlapped with the computation of the interior elements, which degrades the overall performance of the code. No scheme was implemented here to partially hide the communication of the already computed surface elements while the remaining surface elements are being computed.
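The neighbour counts quoted above (26, 8 and 2) can be checked by brute force. The following is a minimal Python sketch, not the project's code: it assumes a non-periodic px x py x pz processor grid in which two partitions are neighbours when they share a face, an edge, or a corner. The function name worst_case_neighbors is hypothetical.

```python
from itertools import product


def worst_case_neighbors(px, py, pz):
    """Largest number of neighbouring processors of any partition in a
    non-periodic px x py x pz processor grid (face, edge, or corner contact)."""
    dims = (px, py, pz)
    worst = 0
    for coord in product(*(range(d) for d in dims)):
        count = 0
        for offset in product((-1, 0, 1), repeat=3):
            if offset == (0, 0, 0):
                continue  # skip the partition itself
            neighbor = tuple(c + o for c, o in zip(coord, offset))
            # Count the neighbour only if it lies inside the processor grid.
            if all(0 <= n < d for n, d in zip(neighbor, dims)):
                count += 1
        worst = max(worst, count)
    return worst


print(worst_case_neighbors(8, 8, 8))  # brick partition  -> 26
print(worst_case_neighbors(8, 8, 1))  # column partition -> 8
print(worst_case_neighbors(8, 1, 1))  # slice partition  -> 2
```

The grid sizes in the usage lines are arbitrary examples; any grid with at least three processors along each decomposed direction gives the same worst-case counts.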

The following speedup curve shows performance for a fixed-size problem: there are N elements, so with p processors each processor solves for N/p elements. In this particular case, for p = 512, the partition degenerates from bricks to slices (2x4x1 partition). This is the worst case for the performance of the code, as explained briefly above. Even so, the code still passes the 10 Gflops milestone level.
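For reference, the quantities usually reported in a fixed-size study can be written down in a few lines. This Python sketch uses the standard definitions of speedup, T(1)/T(p), and parallel efficiency, speedup/p; the function names are illustrative, and the timings and element count in the usage lines are placeholder values chosen for easy arithmetic, not measurements from this project.

```python
def elements_per_processor(n_elements, p):
    """Work per processor for a fixed-size problem of n_elements on p processors."""
    return n_elements / p


def fixed_size_speedup(t1, tp):
    """Speedup for a fixed-size problem: time on 1 processor over time on p."""
    return t1 / tp


def parallel_efficiency(t1, tp, p):
    """Fraction of ideal (linear) speedup achieved on p processors."""
    return fixed_size_speedup(t1, tp) / p


# Placeholder values for illustration only (not measured data).
t1, tp, p = 100.0, 25.0, 8
print(elements_per_processor(4096, p))   # 512.0 elements per processor
print(fixed_size_speedup(t1, tp))        # 4.0
print(parallel_efficiency(t1, tp, p))    # 0.5
```

Because N/p shrinks as p grows, the surface-to-interior ratio of each partition increases, which is one reason fixed-size speedup is a harder test than scaled speedup.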


Related Experimental work.

In related experimental work, laboratory studies of surface-tension-driven convection have used flow visualization at high Marangoni number in thin liquid layers heated from below.

A four-slide version of this HTML page is available:

A PostScript version of this HTML page is available: Compressed version for Sun4.

POINT OF CONTACT

carey@cfdlab.ae.utexas.edu