The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity; however, there are still problems of scalability and limits on the amount of work that a GPU can perform at a time. To minimize the overhead associated with the launch of GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including the CUDA math libraries) and with the OpenACC programming model. As test cases we use two well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient), we focus on the integration of Static Graphs with CUDA. Here we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation, which benefits from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, again achieving accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference, optimized CUDA code. The combination of Static Graphs with two of the most important current GPU programming models (CUDA and OpenACC) considerably reduces execution time with respect to using CUDA and OpenACC alone, achieving accelerations of more than one order of magnitude. Our main target is a higher-productivity coding model for GPU programming that, through Static Graphs and in a very transparent way, provides better exploitation of the GPU capacity.
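The launch-overhead reduction described above can be illustrated with the standard CUDA Graph stream-capture pattern: a fixed sequence of kernel launches is recorded once into a graph, instantiated, and then replayed with a single launch call per iteration. This is a minimal sketch of that pattern, not the paper's implementation; the `axpy` kernel, problem size, and iteration counts are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Illustrative kernel; stands in for any fixed sequence of GPU work.
__global__ void axpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record a short chain of dependent kernel launches into a graph.
    cudaGraph_t graph;
    cudaGraphExec_t graphExec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < 8; ++k)  // 8 kernels captured, none executed yet
        axpy<<<(n + 255) / 256, 256, 0, stream>>>(n, 0.5f, x, y);
    cudaStreamEndCapture(stream, &graph);
    cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);

    // Replay the instantiated graph: one cudaGraphLaunch call per
    // iteration instead of 8 individual kernel launches, amortizing
    // the per-launch CPU-side overhead.
    for (int iter = 0; iter < 1000; ++iter)
        cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(graphExec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The benefit grows with the number of kernels per iteration and shrinks with kernel runtime: graphs pay off most when many short kernels are launched repeatedly, which is exactly the regime of iterative solvers such as Conjugate Gradient.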
Multi-core systems are present in virtually every computing device nowadays, and stream processing applications are becoming recurrent workloads, demanding parallelism to achieve the desired quality of service: as soon as data, tasks, or requests arrive, they must be computed, analyzed, or processed. Since building such applications is not a trivial task, the software industry must adopt parallel APIs (Application Programming Interfaces) that simplify the exploitation of parallelism in hardware and accelerate time-to-market. In recent years, research efforts in academia and industry have provided a set of parallel APIs that increase productivity for software developers. However, few studies have sought to evaluate the usability of these interfaces. In this work, we present a parallel programming assessment of the usability of parallel APIs for expressing parallelism in the stream processing application domain on multi-core systems. To this end, we conducted an empirical study with beginners in parallel application development. The study covered three parallel APIs, reporting several quantitative and qualitative indicators involving the developers. Our contribution also comprises a parallel programming assessment methodology, which can be replicated in future assessments. The study revealed important insights, such as recurrent compile-time and programming-logic errors made by beginners in parallel programming, as well as the programming effort, challenges, and learning curve. Moreover, we collected the participants' opinions about their experience in this study to understand the results achieved in more depth.