This work is a part of the global tendency to use modern computing systems for modeling the phase-field phenomena. The main goal of this article is to improve the performance of a parallel application for the solidification modeling, assuming the dynamic intensity of computations in successive time steps when calculations are performed using a carefully selected group of nodes in the grid. A two-step method is proposed to optimize the application for multi-/manycore architectures. In the first step, the loop fusion is used to execute all kernels in a single nested loop and reduce the number of conditional operators. These modifications are vital to implementing the second step, which includes an algorithm for the dynamic workload prediction and load balancing across cores of a computing platform. Two versions of the algorithm are proposed—with the 1D and 2D maps used for predicting the computational domain within the grid. The proposed optimizations allow increasing the application performance significantly for all tested configurations of computing resources. The highest performance gain is achieved for two Intel Xeon Platinum 8180 CPUs, where the new code based on the 2D map yields the speedup of up to 2.74 times, while the usage of the proposed method with the 2D map for a single KNL accelerator permits reducing the execution time up to 1.91 times.
Dynamic workload prediction and distribution in numerical modeling of solidification on multi-/manycore architectures / Halbiniak, K.; Olas, T.; Szustak, L.; Kulawik, A.; Lapegna, M.. - In: CONCURRENCY AND COMPUTATION. - ISSN 1532-0626. - 33:11(2021), pp. 1-16. [10.1002/cpe.5905]
Dynamic workload prediction and distribution in numerical modeling of solidification on multi-/manycore architectures
Lapegna M.
2021
Abstract
This work is a part of the global tendency to use modern computing systems for modeling the phase-field phenomena. The main goal of this article is to improve the performance of a parallel application for the solidification modeling, assuming the dynamic intensity of computations in successive time steps when calculations are performed using a carefully selected group of nodes in the grid. A two-step method is proposed to optimize the application for multi-/manycore architectures. In the first step, the loop fusion is used to execute all kernels in a single nested loop and reduce the number of conditional operators. These modifications are vital to implementing the second step, which includes an algorithm for the dynamic workload prediction and load balancing across cores of a computing platform. Two versions of the algorithm are proposed—with the 1D and 2D maps used for predicting the computational domain within the grid. The proposed optimizations allow increasing the application performance significantly for all tested configurations of computing resources. The highest performance gain is achieved for two Intel Xeon Platinum 8180 CPUs, where the new code based on the 2D map yields the speedup of up to 2.74 times, while the usage of the proposed method with the 2D map for a single KNL accelerator permits reducing the execution time up to 1.91 times.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.