Illinois ECE 498AL: Programming Massively Parallel Processors

By Wen-mei W. Hwu

University of Illinois at Urbana-Champaign

Lecture Number/Topic | Online Lecture Video | Lecture Notes | Supplemental Material | Suggested Exercises
Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 1: Introduction | View Flash | View Notes (pdf)
Programming Massively Parallel Processors Topics: Introduction, Grading, Outline; Lab Equipment; UIUC/NCSA QP Cluster; UIUC/NCSA AP Cluster; ECE498AL Development History; Why Program...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 2: The CUDA Programming Model | View Flash | View Notes (pdf)
CUDA Programming Model Topics: What is GPGPU?; CUDA; An Example of Physical Reality Behind CUDA; Parallel computing on a GPU; CUDA - C With no shader limitations; CUDA Devices and...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 3: CUDA Threads, Tools, Simple Examples | View Flash | View Notes (pdf)
CUDA Threads, Tools, Simple Examples Topics: A Running example of Matrix Multiplication; Memory Layout of a Matrix in C; Compiling a CUDA Program; Device Emulation Mode Pitfalls; Floating...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 4: CUDA Threads - Part 2 | View Flash | View Notes (pdf)
CUDA Threads Part 2 Topics: CUDA Thread Block; Transparent Scalability; G80 CUDA Mode, A Review; Executing Thread Blocks; Thread Scheduling; Block Granularity Considerations; More Details...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories | View Flash | Notes (pdf) | Lecture5-CUDA-Memories.mp3
CUDA Memories Topics: G80 Implementation of CUDA Memories; CUDA Variable Type Qualifiers; Where to Declare Variables; Variable Type Restrictions; A Common Programming Strategy; GPU Atomic...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 6: CUDA Memories - Part 2 | View Flash | Notes (pdf) | Lecture6-CUDA-Memories-Part2.mp3
CUDA Memories Part 2 Topics: Tiled Multiply; Breaking Md and Nd into Tiles; Tiled Matrix Multiplication Kernel; CUDA Code - Kernel Execution Configuration; First Order Size considerations...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 7: GPU as part of the PC Architecture | View Flash | Notes (pdf) | Lecture7-GPU-in-PC
GPU as part of the PC Architecture Topics: Typical Structure of a CUDA Program; Bandwidth: Gravity of Modern computer Systems; (Original) PCI Bus Specification; PCI as Memory Mapped I/O ...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 8: Threading Hardware in G80 | View Flash | Notes (pdf) | Lecture8-Threading Hardware in G80
Threading Hardware in G80 Topics: Single Program Multiple Data (SPMD); Grids and Blocks; CUDA Thread Block: Review; GeForce-8 Series Hardware Overview; CUDA Processor Terminology; Stream...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 9: Memory Hardware in G80 | View Flash | Notes (pdf) | Lecture9-Memory Hardware in G80
Memory Hardware in G80 Topics: CUDA Device Memory Space; Parallel Memory Sharing; SM Memory Architecture; SM Register File; Programmer view of Register File; Matrix Multiplication...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 10: Control Flow | View Flash | View Notes (pdf)
Control Flow Topics: Terminology Review; How Thread Blocks are Partitioned; Control Flow Instructions; Parallel Reduction; A Vector Reduction Example; A simple Implementation; Vector...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 11: Floating Point Considerations | View Flash | View Notes (pdf)
Floating Point Considerations Topics: GPU Floating Point Features; Normalized Representation; Exponent Representation; Representable Numbers; Flush to Zero; Denormalization; Runtime Math...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 12: Structuring Parallel Algorithms | View Flash | View Notes (pdf)
Structuring Parallel Algorithms Topics: Key Parallel Programming Steps; Algorithms; Choosing Algorithm Structure; Mapping a Divide and Conquer algorithm; Tiled Algorithms; Increased work...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 13: Reductions and their Implementation | View Flash | View Notes (pdf)
Structuring Parallel Algorithms Topics: Parallel Reductions; Parallel Prefix Sum; Relevance of Scan; Application of Scan; Scan on the CPU; First attempt Parallel Scan Algorithm; Work...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 14: Application Case Study - Quantitative MRI Reconstruction | View Flash
Quantitative MRI Reconstruction Topics: Reconstructing MR Images; An exciting revolution: Sodium Map of the Brain; Least Squares reconstruction; Q vs. FhD; Algorithms to Accelerate; From...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 15: Kernel and Algorithm Patterns for CUDA | View Flash
Kernel and Algorithm Patterns for CUDA Topics: Reductions and Memory Patterns; Reduction Patterns in CUDA; Mapping Data into CUDA's Memories; Input/Output Convolution; Generic Algorithm...