High-performance computing: challenges of performance tuning and scaling of finite element models

In my PhD project, which focused on computational modeling of the biodegradation of metallic biomaterials, parallelization of the models was one of the main objectives. Parallelization was crucial for reducing the time needed to obtain predictions from large-scale simulations in high-performance computing (HPC) environments. Pursuing this goal exposed me to a variety of challenges throughout the project, which can be divided into two main categories: implementation issues and performance-tuning issues. The main implementation strategy was based on high-performance mesh decomposition, partitioning and distributing the mesh among the available computing resources, followed by the use of high-performance preconditioners and iterative solvers tailored to the different systems and physics involved. This was done mostly using the parallel computing features of the PETSc toolkit.
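To give a flavor of the distribution step, the sketch below computes contiguous block-row ownership ranges, similar in spirit to how PETSc assigns matrix and vector rows to MPI ranks when sizes are left to the library to decide. This is a simplified illustration in plain Python, not PETSc's actual API; the function name and the "larger blocks first" rule are assumptions for the sketch.

```python
# Hypothetical sketch: split n_rows into n_ranks contiguous blocks,
# giving the first (n_rows % n_ranks) ranks one extra row each.
# PETSc itself is not used; this only illustrates the idea of
# distributing degrees of freedom among computing resources.

def ownership_ranges(n_rows: int, n_ranks: int) -> list[tuple[int, int]]:
    """Return (start, end) row ranges, one per rank, covering all rows."""
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)  # first ranks absorb the remainder
        ranges.append((start, start + size))
        start += size
    return ranges

print(ownership_ranges(10, 3))  # → [(0, 4), (4, 7), (7, 10)]
```

Each rank then owns, assembles, and solves for its own contiguous slice of the global system, which is the basic pattern behind distributed finite element assembly.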

Although it may not seem so at first, performance tuning can be as complicated as the implementation itself. Running a model on 10 CPU cores with acceptable performance and speedup does not mean one can increase the count to 100 cores and still get the same speedup. The same problem reappears when moving from hundreds to thousands of cores, and so on: each new order of magnitude in the number of CPU cores brings a new set of issues to deal with.
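A classic way to see why speedup does not extrapolate across orders of magnitude is Amdahl's law. The sketch below uses an assumed 5% serial fraction (a made-up number, not measured from my models) to show how speedup saturates as cores are added.

```python
# Amdahl's law: ideal speedup when a fixed fraction of the work is serial.
# The 5% serial fraction below is an illustrative assumption.

def amdahl_speedup(n_cores: int, serial_fraction: float) -> float:
    """Speedup on n_cores when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

for n in (10, 100, 1000):
    print(f"{n:5d} cores: speedup {amdahl_speedup(n, 0.05):.1f}x")
# →    10 cores: speedup 6.9x
# →   100 cores: speedup 16.8x
# →  1000 cores: speedup 19.6x
```

Even this idealized model caps the speedup near 20x, and real runs are further limited by communication, load imbalance, and I/O, which Amdahl's law does not capture.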

This post briefly summarizes various issues one can face while tackling HPC and performance-tuning challenges. These experiences were gained by working in HPC environments on the VSC supercomputer in Belgium, the Snellius supercomputer in the Netherlands, and the ARCHER2 supercomputer in the UK.