Illinois ECE 498AL: Programming Massively Parallel Processors

By Wen-Mei W Hwu

University of Illinois at Urbana-Champaign

Category: Courses

Published on: Spring 2009

Abstract

Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number of applications that traditionally used Application Specific Integrated Circuits (ASICs) are now implemented with concurrent processors in order to improve functionality and reduce engineering cost. The real challenge is to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals.

The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Today NVIDIA processors already exhibit this capability. Processors from Intel, AMD, and IBM will begin to qualify as massively parallel in the next several years. Effectively programming these processors will require in-depth knowledge about parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors. The target audiences of the course are students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors.

We will be using NVIDIA processors and the CUDA programming tools in the lab section of the course. Many have reported success in performing non-graphics parallel computation as well as traditional graphics rendering computation on these processors. You will go through structured programming assignments before being turned loose on the final project. Each programming assignment will involve successively more sophisticated programming skills. The final project will be of your own design, with the requirement that the project must involve a demanding application such as mathematics- or physics-intensive simulation or other data-intensive computation, followed by some form of visualization and display of results.
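
To give a concrete sense of what the structured lab assignments involve, here is a minimal CUDA sketch of element-wise vector addition. It is not code from the course materials; the kernel name, array size, and block size are illustrative assumptions.

    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    // Kernel: each thread computes one element of the output vector.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
        if (i < n)                                       // guard the tail block
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;                  // 1M elements (illustrative size)
        const size_t bytes = n * sizeof(float);

        // Allocate and initialize host arrays.
        float *h_a = (float *)malloc(bytes);
        float *h_b = (float *)malloc(bytes);
        float *h_c = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

        // Allocate device arrays and copy the inputs over.
        float *d_a, *d_b, *d_c;
        cudaMalloc(&d_a, bytes);
        cudaMalloc(&d_b, bytes);
        cudaMalloc(&d_c, bytes);
        cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

        // Launch one thread per element, 256 threads per block.
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);

        // Copy the result back and spot-check one element.
        cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f (expected 3.0)\n", h_c[0]);

        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
        free(h_a); free(h_b); free(h_c);
        return 0;
    }

Compiled with nvcc, this is the general shape of the early assignments: allocate on the host and the device, copy data in, launch a grid of thread blocks, and copy the result back.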

This is a course in programming massively parallel processors for general computation. We are fortunate to have the support and presence of David Kirk, the Chief Scientist of NVIDIA and one of the main driving forces behind the new NVIDIA CUDA technology. Building on computer architecture knowledge from ECE 411 and general C programming knowledge, we will expose you to the tools and techniques you will need to attack a real-world application for the final project. The final projects will be supported by real application groups at UIUC and around the country, working in areas such as biomedical imaging and physical simulation.

Course Website

Programming Massively Parallel Processors

Topics:

  • Introduction
  • GPU Computing and CUDA Programming Model Intro
  • CUDA Example and CUDA Threads
  • CUDA Threads Part 2 and API Details
  • CUDA Memory
  • CUDA Memory Example
  • GPU as Part of the PC Architecture
  • CUDA Threading Hardware
  • CUDA Memory Hardware
  • Control Flow in CUDA
  • Floating Point Performance, Precision, and Accuracy
  • Parallel Programming Basics
  • Parallel Algorithm Basics

Credits

These lectures were breezed by Carl Pearson and Daniel Borup and then reviewed, edited, and uploaded by Omar Sobh.

References

Timothy G. Mattson, Beverly A. Sanders, and Berna L. Massingill, Patterns for Parallel Programming, Addison-Wesley.

Cite this work

Researchers should cite this work as follows:

  • Wen-Mei W Hwu (2009), "Illinois ECE 498AL: Programming Massively Parallel Processors," https://nanohub.org/resources/7225.


Lectures

Each lecture below is available as a Flash video; most also include lecture notes (PDF), and several have supplemental material, noted in parentheses after the lecture title.

Lecture 1: Introduction (Flash video, notes PDF)
Topics: Introduction, Grading, Outline Lab Equipment UIUC/NCSA QP Cluster UIUC/NCSA AP Cluster ECE498AL Development History Why Program...

Lecture 2: The CUDA Programming Model (Flash video, notes PDF)
Topics: What is GPGPU? CUDA An Example of Physical Reality Behind CUDA Parallel computing on a GPU CUDA - C With no shader limitations CUDA Devices and...

Lecture 3: CUDA Threads, Tools, Simple Examples (Flash video, notes PDF)
Topics: A Running example of Matrix Multiplication Memory Layout of a Matrix in C Compiling a CUDA Program Device Emulation Mode Pitfalls Floating...

Lecture 4: CUDA Threads - Part 2 (Flash video, notes PDF)
Topics: CUDA Thread Block Transparent Scalability G80 CUDA Mode, A Review Executing Thread Blocks Thread Scheduling Block Granularity Considerations More Details...

Lecture 5: CUDA Memories (Flash video, notes PDF, audio: Lecture5-CUDA-Memories.mp3)
Topics: G80 Implementation of CUDA Memories CUDA Variable Type Qualifiers Where to Declare Variables Variable Type Restrictions A Common Programming Strategy GPU Atomic...

Lecture 6: CUDA Memories - Part 2 (Flash video, notes PDF, audio: Lecture6-CUDA-Memories-Part2.mp3)
Topics: Tiled Multiply Breaking Md and Nd into Tiles Tiled Matrix Multiplication Kernel CUDA Code - Kernel Execution Configuration First Order Size considerations...
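
The tiled multiply covered in this lecture follows a well-known pattern. Below is a minimal kernel sketch of that pattern, assuming square row-major matrices whose width is a multiple of the tile size; the names M, N, P (Md and Nd in the lecture slides) and TILE_WIDTH are illustrative, not necessarily the lecture's exact code.

    #define TILE_WIDTH 16   // illustrative tile size

    // P = M * N for square width-by-width matrices in row-major order.
    // Assumes width is a multiple of TILE_WIDTH (the simple first-order case).
    // Launch with dim3 grid(width/TILE_WIDTH, width/TILE_WIDTH) and
    // dim3 block(TILE_WIDTH, TILE_WIDTH).
    __global__ void matrixMulTiled(const float *M, const float *N, float *P, int width)
    {
        __shared__ float Ms[TILE_WIDTH][TILE_WIDTH];   // tile of M in shared memory
        __shared__ float Ns[TILE_WIDTH][TILE_WIDTH];   // tile of N in shared memory

        int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
        int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
        float acc = 0.0f;

        // Walk the tiles across a row of M and down a column of N.
        for (int t = 0; t < width / TILE_WIDTH; ++t) {
            Ms[threadIdx.y][threadIdx.x] = M[row * width + t * TILE_WIDTH + threadIdx.x];
            Ns[threadIdx.y][threadIdx.x] = N[(t * TILE_WIDTH + threadIdx.y) * width + col];
            __syncthreads();                           // both tiles fully loaded

            for (int k = 0; k < TILE_WIDTH; ++k)
                acc += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
            __syncthreads();                           // done reading these tiles
        }
        P[row * width + col] = acc;
    }

Each element of the two tiles is loaded from global memory once and reused TILE_WIDTH times, which is the point of the tiling.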

Lecture 7: GPU as Part of the PC Architecture (Flash video, notes PDF, supplemental: Lecture7-GPU-in-PC)
Topics: Typical Structure of a CUDA Program Bandwidth: Gravity of Modern computer Systems (Original) PCI Bus Specification PCI as Memory Mapped I/O...

Lecture 8: Threading Hardware in G80 (Flash video, notes PDF, supplemental: Lecture8-Threading Hardware in G80)
Topics: Single Program Multiple Data (SPMD) Grids and Blocks CUDA Thread Block: Review Geforce-8 Series Hardware Overview CUDA Processor Terminology Stream...

Lecture 9: Memory Hardware in G80 (Flash video, notes PDF, supplemental: Lecture9-Memory Hardware in G80)
Topics: CUDA Device Memory Space Parallel Memory Sharing SM Memory Architecture SM Register File Programmer view of Register File Matrix Multiplication...

Lecture 10: Control Flow (Flash video, notes PDF)
Topics: Terminology Review How Thread Blocks are Partitioned Control Flow Instructions Parallel Reduction A Vector Reduction Example A simple Implementation Vector...
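
As a companion to the vector reduction and divergence topics above, here is a minimal shared-memory sum-reduction sketch using stride-halving addressing so active threads stay contiguous; the kernel name and the per-block partial-sum framing are illustrative assumptions rather than the lecture's exact code.

    // Reduces blockDim.x input elements to one partial sum per thread block.
    // Sequential (stride-halving) addressing keeps active threads contiguous,
    // avoiding the warp divergence of the naive interleaved version.
    // Assumes blockDim.x is a power of two; launch with dynamic shared memory
    // of blockDim.x * sizeof(float).
    __global__ void reduceSum(const float *in, float *blockSums, int n)
    {
        extern __shared__ float sdata[];                 // one float per thread
        unsigned int tid = threadIdx.x;
        unsigned int i = blockIdx.x * blockDim.x + tid;

        sdata[tid] = (i < n) ? in[i] : 0.0f;             // load, padding with zero
        __syncthreads();

        // Halve the number of active threads at each step.
        for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (tid < stride)
                sdata[tid] += sdata[tid + stride];
            __syncthreads();
        }

        if (tid == 0)                                    // thread 0 holds the block's sum
            blockSums[blockIdx.x] = sdata[0];
    }

The per-block partial sums are then reduced again with another kernel launch, or summed on the host.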

Lecture 11: Floating Point Considerations (Flash video, notes PDF)
Topics: GPU Floating Point Features Normalized Representation Exponent Representation Representable Numbers Flush to Zero Denormalization Runtime Math...

Lecture 12: Structuring Parallel Algorithms (Flash video, notes PDF)
Topics: Key Parallel Programming Steps Algorithms Choosing Algorithm Structure Mapping a Divide and Conquer algorithm Tiled Algorithms Increased work...

Lecture 13: Reductions and their Implementation (Flash video, notes PDF)
Topics: Parallel Reductions Parallel Prefix Sum Relevance of Scan Application of Scan Scan on the CPU First attempt Parallel Scan Algorithm Work...
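
For the prefix-sum topics, a minimal single-block inclusive scan in the simple Hillis-Steele style is sketched below, using a double-buffered shared-memory array. The restriction to a single block and the kernel name are illustrative simplifications, not the lecture's work-efficient algorithm.

    // Single-block inclusive scan (Hillis-Steele style, double-buffered).
    // Assumes n <= blockDim.x; launch as
    // inclusiveScan<<<1, numThreads, 2 * n * sizeof(float)>>>(in, out, n)
    // with numThreads >= n.
    __global__ void inclusiveScan(const float *in, float *out, int n)
    {
        extern __shared__ float temp[];     // 2 * n floats: two ping-pong buffers
        int tid = threadIdx.x;
        int pout = 0, pin = 1;

        if (tid < n)
            temp[pout * n + tid] = in[tid]; // load input into buffer 0
        __syncthreads();

        for (int offset = 1; offset < n; offset *= 2) {
            pout = 1 - pout;                // swap which buffer is read and written
            pin  = 1 - pout;
            if (tid < n) {
                if (tid >= offset)
                    temp[pout * n + tid] = temp[pin * n + tid] + temp[pin * n + tid - offset];
                else
                    temp[pout * n + tid] = temp[pin * n + tid];
            }
            __syncthreads();
        }

        if (tid < n)
            out[tid] = temp[pout * n + tid];
    }

This simple version performs O(n log n) additions; a work-efficient scan brings that down to O(n).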

Lecture 14: Application Case Study - Quantitative MRI Reconstruction (Flash video)
Topics: Reconstructing MR Images An exciting revolution: Sodium Map of the Brain Least Squares reconstruction Q vs. FhD Algorithms to Accelerate From...

Lecture 15: Kernel and Algorithm Patterns for CUDA (Flash video)
Topics: Reductions and Memory Patterns Reduction Patterns in CUDA Mapping Data into CUDA's Memories Input/Output Convolution Generic Algorithm...