SUBCONTRACTS: None
PRINCIPAL INVESTIGATOR: Graham F. Carey, carey@cfdlab.ae.utexas.edu (512) 471-4207
TITLE OF EFFORT: Distributed Parallel Gradient Iterative Solution
OBJECTIVE: The present research work involves the formulation, development and implementation of a class of sparse iterative solvers of generalized conjugate gradient type for parallel distributed architectures. The algorithms and software are to be designed to maintain efficiency and be portable across different architectures. Benchmark and large scale PDE application studies will be made. Both structured and unstructured grids are of interest. The research is developed to meet US need for parallel scalable solver software. Supporting performance studies will be made on different architectures.
APPROACH: We are developing parallel iterative techniques and accelerators for very large sparse algebraic systems arising from finite difference, finite volume or finite element methods for solving systems of PDE's in engineering and science. A significant part of the work involves parallel software for distributed systems. The software is to be portable across different architectures and this goal is being achieved by means of extensive use of macros. Furthermore, schemes should be robust, efficient, and scalable to meet the design objectives.
Generalized gradient iterative methods offer the possibility of efficient parallel performance for large sparse systems. The basic algorithms can be decomposed to fundamental kernels and these can be coded to achieve high parallel performance. We have been involved in parallel algorithm research in this area for some time and there is significant interest in this topic. Our present approach deals with both structured and unstructured grids with the focus to date on the structured case using parallelization over processor subdomains. The unstructured case will be treated using the element-by-element parallel subdomain strategies we have pioneered (e.g. see Carey, 1989). Supporting theoretical and algorithm work concerns the concurrency inherent in the associated BLAS and the basic iterative kernels since this is important to library design and code transferability across architectures. Stencil data structures, element-by-element schemes, domain decomposition and parallel preconditioning using block and multigrid methods are important components of our approach. Issues concerning the software implementation, use of macros, efficiency, portability and maintainability are being addressed.
Questions of parallel resource allocation and scalability are also being studied. The research will provide new knowledge on the utility of generalized gradient algorithms in distributed parallel environments as well as a practical capability which will be demonstrated both for benchmark problems and for large-scale applications. Related research on parallel isoefficiency for these algorithms will provide insight into projected performance. Applications will include the two focal areas of Computational Fluid Dynamics and Semiconductor Transport Simulation.
PROGRESS: The research project was initiated in September 1992 and significant progress has been made. Some of the recent work was presented at the Breckenridge Conference on Iterative Methods and at the ARPA review meeting in Albuquerque. We will also be presenting results at the forthcoming SIAM meeting.
At the 'heart' of the generalized gradient iterative computational schemes comprising the iterative library are "iterative kernels" involving repeated matrix-vector products and dot-products. We are exploiting this idea to develop parallel algorithms that are transportable across architectures and exploit the same software infrastructure. Work during the past year has focused on several key issues: first, we have continued our work on the kernels and extended these to accommodate several new machines including the Intel Paragon, NCUBE, CM5 and C90. Recently we began work involving the T3D and SP2 which will be part of the effort this year. Secondly, we have significantly expanded the algorithm collection to include additional methods. Simultaneously, several students have been carrying out performance tests on 2D and 3D benchmark problems. We are also applying the software to larger scale applications for Navier-Stokes problems in CFD and carrier transport in semiconductor device modeling. Both applications areas were designated as prime "test beds" by ARPA when the program was established. A subdomain strategy to achieve high parallelism for sparse finite difference or finite element type systems and vector field applications has been developed.
The design philosophy for the parallel portable software has been influenced by the diversity of the machines and the current evolution of parallel computers. Portability is being achieved through extensive use of macros. In particular we have utilized the M4 macros because of their wide availability and have also developed CS macros to maintain code clarity and structure so that software can be easily maintained. We have installed the revision update system CVS to permit researchers to concurrently work on the software and this has been effective. A prototype version of the package with several gradient type algorithms has been run on several different architectures. Recently we have invested substantial effort in optimizing the kernels. In particular we have developed a cache mirror for an assembly-language routine to treat the basic matrix vector product. This has been tested on the Intel i860 and Paragon machines successfully. Papers describing recent results were presented at the Scalable High Performance Computing Conference in Knoxville, Tennessee in May 1994 and at the IMACS World Congress, July 1994 in Atlanta, Georgia. Papers have been accepted for presentation at Supercomputing '94 and the 7th SIAM Conf. on Parallel Processing for Scientific Computing
ACCOMPLISHMENTS:
(2) Carey, G. F. and R. McLay, Maximizing Sparse Matrix Vector Product Performances in MIMD Computers, Conf. on Iterative Methods, April 1-9, 1994, Breckenridge, CO.
(3) Lorber, A. and G. F. Carey, On the Relationship Between ODE Solvers and Iterative Solvers for Linear Equations, Conf. on Iterative Methods, April 1-9, 1994, Breckenridge, CO.
(4) Bova, S. and G. F. Carey, Iterative Methods for Stationary Convection, Conf. on Iterative Methods, April 1-9, 1994, Breckenridge, CO.
(5) Lorber, A. and G. F. Carey, Time-Iterative Recursion with Domain Decomposition for Viscous Flows, IMACS Conference, July 11-15, 1994, Atlanta.
(6) Joubert, W. and G. F. Carey, PCG: A Software Package for the Iterative Solution of Linear Systems on Scalar, Vector and Parallel Computers, IMACS Conf., July 11-15, 1994, Atlanta.
(7) Barragy, E. and G. F. Carey, Parallel Element-by-Element Performance for Navier-Stokes Computations, To be Presented, 7th SIAM Conf. on Parallel Processing for Scientific Computing, Feb. 15-17, 1995, San Francisco, CA.
(8) Carey, G. F., A. K. Stagg and D. D. Cline, Implementing a Parabolized Navier-Stokes Flow Solver on Massively Parallel, Scalable Computer Systems, 7th SIAM Conf. on Parallel Processing for Scientific Computing, Feb. 15-17, 1995, San Francisco, CA.
aal@cfdlab.ae.utexas.edu