Fine-Grain Parallel Computing: the Next Frontier in High
The Boulder HPC Facility: Exploring New Computing Technologies for NOAA
Major Activities in 2012 / 2013
- Fortran to CUDA (or C) Compiler: The F2C-ACC compiler was released to the public in June 2009. While there are limitations, development of this compiler has proven useful for parallelizing the NIM and other weather models.
- OpenACC GPU Compiler Evaluation: We would like to use the commercial compilers once they are ready. In a recent evaluation, we compared the performance of the Cray, PGI and CAPS compilers to that of our F2C-ACC compiler. The results show that the commercial compilers are significantly slower than F2C-ACC, but further optimizations may be possible. To help improve their performance, we plan to share the code used in our evaluation, which is drawn from the FIM, NIM and WRF models and packaged into simple standalone tests.
The standalone tests, set up for ORNL Titan, can be downloaded, configured and run using the appropriate makefile, runscript and module scripts for each compiler. The code examples are clean and efficient, and they have the highly parallelizable loop structure found in most weather and climate codes.
- NIM Model Dynamics: Parallelization using F2C-ACC for the GPU and OpenMP for the Intel MIC. Parallelization efforts focused on (1) maintaining a single source code for CPU, GPU and MIC, and (2) running efficiently on all architectures. Parallelizing the model with OpenMP for the MIC (native mode) required about the same effort as parallelizing it with F2C-ACC for the GPU. Most of that effort went into adapting the Fortran code to expose fine-grain parallelism while maintaining performance portability. We have run the NIM on over 1000 GPUs on ORNL's Titan machine, and on hundreds of nodes on NSF's TACC MIC cluster. Further work is ongoing to optimize inter-GPU / inter-MIC performance on these systems to improve scalability.
- NIM Model Physics: Early work extracted a standalone test of the WRF PBL (Planetary Boundary Layer) physics, which yielded encouraging results on the GPU in 2011. This has led to a more general approach to parallelization, focused mostly on small changes that expose fine-grain parallelism and handle alignment issues (for the MIC). Parallelization of select routines is expected to begin in November 2013.
- Coming ...
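
To illustrate the kind of loop structure described in the items above, here is a minimal sketch in C with OpenMP directives. It is not taken from NIM, FIM or WRF: the function name column_update, its arguments and the update formula are all hypothetical, chosen only to show an independent horizontal loop wrapped around a contiguous vertical loop. The same source compiles as ordinary serial code when OpenMP is not enabled, which is one simple way to keep a single source across architectures.

```c
#include <stddef.h>

/* Hypothetical kernel (illustrative only, not NIM code).
 * state and tend are laid out as [ncols][nlev], with the vertical
 * index k contiguous in memory. Each column i is independent of the
 * others, so the outer loop can be split across threads. */
void column_update(int ncols, int nlev,
                   const double *tend, double *state, double dt)
{
    /* Fine-grain parallelism: one independent unit of work per column.
     * Compilers typically ignore these pragmas when OpenMP is not
     * enabled, leaving a plain serial loop nest. */
    #pragma omp parallel for
    for (int i = 0; i < ncols; ++i) {
        /* Contiguous inner loop over levels: a vectorization candidate. */
        #pragma omp simd
        for (int k = 0; k < nlev; ++k) {
            size_t idx = (size_t)i * (size_t)nlev + (size_t)k;
            state[idx] += dt * tend[idx];
        }
    }
}
```

Calling column_update(ncols, nlev, tend, state, dt) advances every point of state by dt times the matching entry of tend. With GCC or Clang, adding -fopenmp enables the threaded version; leaving the flag off builds the serial one from the same file.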
Prepared by Mark Govett, Mark.W.Govett@noaa.gov
Date of last update: November 1, 2013