Structuring Parallel Algorithms
Topics:
- Parallel Reductions
- Parallel Prefix Sum
- Relevance of Scan
- Application of Scan
- Scan on the CPU
- First attempt Parallel Scan Algorithm
- Work efficiency considerations
- Improving Efficiency
- Use Padding to reduce conflicts
- Global Synchronization in CUDA
These lectures were breezed by Carl Pearson and Daniel Borup and then reviewed, edited, and uploaded by Omar Sobh.
Researchers should cite this work as follows:
- Wen-Mei W Hwu (2009), "Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 13: Reductions and their Implementation," https://nanohub.org/resources/7376.