Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 1: Introduction
Notes (pdf)
Programming Massively Parallel Processors
Topics:
Introduction, Grading, Outline
Lab Equipment
UIUC/NCSA QP Cluster
UIUC/NCSA AC Cluster
ECE498AL Development History
Why Program...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 2: The CUDA Programming Model
Notes (pdf)
CUDA Programming Model
Topics:
What is GPGPU?
CUDA
An Example of Physical Reality Behind CUDA
Parallel computing on a GPU
CUDA - C with No Shader Limitations
CUDA Devices and...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 3: CUDA Threads, Tools, Simple Examples
Notes (pdf)
CUDA Threads, Tools, Simple Examples
Topics:
A Running Example of Matrix Multiplication
Memory Layout of a Matrix in C
Compiling a CUDA Program
Device Emulation Mode Pitfalls
Floating...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 4: CUDA Threads - Part 2
Notes (pdf)
CUDA Threads Part 2
Topics:
CUDA Thread Block
Transparent Scalability
G80 CUDA Mode, A Review
Executing Thread Blocks
Thread Scheduling
Block Granularity Considerations
More Details...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories
Notes (pdf)
Lecture5-CUDA-Memories.mp3
CUDA Memories
Topics:
G80 Implementation of CUDA Memories
CUDA Variable Type Qualifiers
Where to Declare Variables
Variable Type Restrictions
A Common Programming Strategy
GPU Atomic...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 6: CUDA Memories - Part 2
Notes (pdf)
Lecture6-CUDA-Memories-Part2.mp3
CUDA Memories Part 2
Topics:
Tiled Multiply
Breaking Md and Nd into Tiles
Tiled Matrix Multiplication Kernel
CUDA Code - Kernel Execution Configuration
First-Order Size Considerations...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 7: GPU as part of the PC Architecture
Notes (pdf)
Lecture7-GPU-in-PC
GPU as part of the PC Architecture
Topics:
Typical Structure of a CUDA Program
Bandwidth: Gravity of Modern Computer Systems
(Original) PCI Bus Specification
PCI as Memory Mapped I/O
...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 8: Threading Hardware in G80
Notes (pdf)
Lecture8-Threading Hardware in G80
Threading Hardware in G80
Topics:
Single Program Multiple Data (SPMD)
Grids and Blocks
CUDA Thread Block: Review
GeForce 8 Series Hardware Overview
CUDA Processor Terminology
Stream...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 9: Memory Hardware in G80
Notes (pdf)
Lecture9-Memory Hardware in G80
Memory Hardware in G80
Topics:
CUDA Device Memory Space
Parallel Memory Sharing
SM Memory Architecture
SM Register File
Programmer View of Register File
Matrix Multiplication...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 10: Control Flow
Notes (pdf)
Control Flow
Topics:
Terminology Review
How Thread Blocks are Partitioned
Control Flow Instructions
Parallel Reduction
A Vector Reduction Example
A Simple Implementation
Vector...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 11: Floating Point Considerations
Notes (pdf)
Floating Point Considerations
Topics:
GPU Floating Point Features
Normalized Representation
Exponent Representation
Representable Numbers
Flush to Zero
Denormalization
Runtime Math...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 12: Structuring Parallel Algorithms
Notes (pdf)
Structuring Parallel Algorithms
Topics:
Key Parallel Programming Steps
Algorithms
Choosing Algorithm Structure
Mapping a Divide and Conquer Algorithm
Tiled Algorithms
Increased work...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 13: Reductions and their Implementation
Notes (pdf)
Reductions and their Implementation
Topics:
Parallel Reductions
Parallel Prefix Sum
Relevance of Scan
Application of Scan
Scan on the CPU
First-Attempt Parallel Scan Algorithm
Work...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 14: Application Case Study - Quantitative MRI Reconstruction
Quantitative MRI Reconstruction
Topics:
Reconstructing MR Images
An Exciting Revolution: Sodium Map of the Brain
Least Squares Reconstruction
Q vs. FhD
Algorithms to Accelerate
From...

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 15: Kernel and Algorithm Patterns for CUDA
Kernel and Algorithm Patterns for CUDA
Topics:
Reductions and Memory Patterns
Reduction Patterns in CUDA
Mapping Data into CUDA's Memories
Input/Output Convolution
Generic Algorithm...