Modern GPU Programming with CUDA and Thrust

By Gilles Civario

Irish Centre for High-End Computing

Abstract

Today's GPUs are massively parallel devices that offer programmers teraflops of supercomputing performance. But programming these devices and exploiting their fantastic potential is not always easy, and this can discourage application developers. CUDA, for example, is too often seen as a very low-level and complicated language, although its performance is widely recognised. In this lecture, we present a more modern, higher-level approach to GPU computing with CUDA, using the Thrust library. A quick trajectory from a first "hello world" program to usable, real-world teraflop-scale computation will be provided, showing that exploiting the full potential of modern GPUs is far less complicated than it seems.
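To make the contrast concrete, here is a minimal sketch of the kind of high-level Thrust code the abstract alludes to: a complete program that fills a vector on the host, copies it to the GPU, and sorts it there, with no hand-written kernel. This example is illustrative rather than taken from the lecture slides; it assumes a CUDA toolkit with Thrust and compilation with nvcc.

```cuda
// Illustrative sketch (not the lecture's code): GPU sort via Thrust,
// with no explicit kernel. Compile with: nvcc -O2 sort_demo.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>
#include <iostream>

int main() {
    thrust::host_vector<int> h(1 << 20);          // 1M integers on the host
    for (size_t i = 0; i < h.size(); ++i)
        h[i] = std::rand();

    thrust::device_vector<int> d = h;             // host-to-device copy
    thrust::sort(d.begin(), d.end());             // sorted on the GPU
    thrust::copy(d.begin(), d.end(), h.begin());  // device-to-host copy

    std::cout << "smallest: " << h.front()
              << ", largest: " << h.back() << std::endl;
    return 0;
}
```

The point is the one the abstract makes: the code reads like STL-style C++, while memory transfers and the parallel sort are handled by the library.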

Bio

After completing two Master's degrees, in Scientific Computing and in Mathematics, Gilles joined the R&D team of Électricité de France in February 1998 to develop and maintain nuclear power plant simulation codes. In 2001 he became a support scientist at CEA/CCRT, one of the largest HPC centres in Europe, where he was involved in installing, developing, debugging and optimising codes across many scientific fields. Gilles then joined Bull's HPC benchmarking team in 2004, where he contributed to the design and deployment of HPC systems of all scales, including some of the most powerful machines in the Top500. He joined ICHEC in June 2008, where his first role was to manage all activities in support of users on ICHEC's IBM BlueGene/P machine. In May 2009, Gilles was appointed Head of the newly created Capability Computing and Novel Architectures group, with an extended remit including the management of ICHEC's rapidly growing GPU computing activities and a technology watch report monitoring cutting-edge hardware and software on the HPC market. As such, Gilles is also Principal Investigator of the NVIDIA CUDA Research Center, a title awarded in June 2010.

In his present role, Gilles is particularly involved in all aspects related to novel architectures and their programming languages, such as CUDA, OpenCL, HMPP and OpenACC.

Cite this work

Researchers should cite this work as follows:

  • Gilles Civario (2013), "Modern GPU Programming with CUDA and Thrust," https://nanohub.org/resources/19373.

Location

New York University, New York, NY

Submitter

NanoBio Node

University of Illinois at Urbana-Champaign

Outline

Slide titles with start times in the recording:

  1. Modern GPU Programming With CUDA and Thrust (0:00)
  2. Plan (0:16)
  3. What is GPU computing? (0:19)
  4. Why GPU computing? (6:15)
  5. Is that a silver bullet? (8:12)
  6. What does the HW look like? (10:28)
  7. But what's inside? (12:31)
  8. Could we make it simpler? (15:26)
  9. But what's the programming model? (16:31)
  10. Ok, how do I code for that? (20:44)
  11. A CUDA "Hello world!" (25:00)
  12. A CUDA "Hello world!" (26:17)
  13. A CUDA "Hello world!" (26:36)
  14. A CUDA "Hello world!" (26:41)
  15. A CUDA "Hello world!" (26:55)
  16. A CUDA "Hello world!" (27:24)
  17. A CUDA "Hello world!" (27:44)
  18. Let's step back... (28:19)
  19. Let's step back... (28:38)
  20. Let's step back... (28:59)
  21. Let's step back... (30:23)
  22. Could we do the same for CUDA? (37:27)
  23. Let's revisit our example (39:12)
  24. Let's revisit our example (41:45)
  25. What is thrust good at? (57:26)
  26. But is it sufficient? (1:00:49)
  27. Philosophy (1:04:42)
  28. Example: saxpy (1:06:38)
  29. Example: saxpy (1:10:38)
  30. Performance Remarks (1:12:59)
  31. Performance comparison (1:14:55)
  32. Thrust main features (1:17:07)
  33. Let's get our hands dirty (1:21:48)
  34. What you've got to do (1:24:25)
  35. But that's not all (1:27:11)
  36. What more? (1:30:03)
  37. And in real life? (1:32:38)
  38. So... (1:35:13)
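The "Example: saxpy" slides refer to the classic Thrust illustration of the BLAS operation y = a*x + y. A minimal sketch along those lines (a reconstruction for illustration, not necessarily the lecture's exact code) expresses the whole computation as one thrust::transform call with a user-defined functor:

```cuda
// Illustrative sketch of saxpy (y = a*x + y) with Thrust.
// Compile with: nvcc -O2 saxpy_demo.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <iostream>

struct saxpy_functor {
    const float a;
    saxpy_functor(float a_) : a(a_) {}
    __host__ __device__
    float operator()(const float& x, const float& y) const {
        return a * x + y;                  // one multiply-add per element
    }
};

int main() {
    const int n = 1 << 20;
    thrust::device_vector<float> x(n, 1.0f);   // x = [1, 1, ..., 1]
    thrust::device_vector<float> y(n, 2.0f);   // y = [2, 2, ..., 2]

    // y <- 2*x + y, computed entirely on the device
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(),
                      saxpy_functor(2.0f));

    float y0 = y[0];                           // implicit device-to-host read
    std::cout << "y[0] = " << y0 << std::endl; // 2*1 + 2 = 4
    return 0;
}
```

The functor's __host__ __device__ qualifier lets the same scalar code run on either the CPU or the GPU; Thrust decides how to launch and size the underlying kernel.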