

Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories

By Wen-Mei W Hwu

University of Illinois at Urbana-Champaign


CUDA Memories


  • G80 Implementation of CUDA Memories
  • CUDA Variable Type Qualifiers
  • Where to Declare Variables
  • Variable Type Restrictions
  • A Common Programming Strategy
  • GPU Atomic Integer Operations
  • Matrix Multiplication Using Shared Memory
  • How About Performance on G80?
  • Idea: Use Shared Memory to Reuse Global Memory Data
  • Tiled Multiply
  • CUDA Code - Kernel Execution Configuration
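The shared-memory tiling strategy covered in the lecture can be sketched as a CUDA kernel. This is an illustrative sketch, not the lecture's exact code: the names `MatrixMulKernel`, `Md`, `Nd`, `Pd`, and `TILE_WIDTH` are assumptions, and it assumes square `Width x Width` matrices with `Width` a multiple of `TILE_WIDTH`.

```cuda
#define TILE_WIDTH 16

// Tiled matrix multiply: Pd = Md * Nd. Each thread block computes one
// TILE_WIDTH x TILE_WIDTH tile of Pd, staging tiles of Md and Nd in
// shared memory so each global-memory element is reused TILE_WIDTH times.
__global__ void MatrixMulKernel(float* Md, float* Nd, float* Pd, int Width)
{
    // Shared-memory tiles, visible to all threads in the block
    __shared__ float Mds[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Nds[TILE_WIDTH][TILE_WIDTH];

    int tx = threadIdx.x, ty = threadIdx.y;
    int Row = blockIdx.y * TILE_WIDTH + ty;   // row of Pd this thread computes
    int Col = blockIdx.x * TILE_WIDTH + tx;   // column of Pd this thread computes

    float Pvalue = 0.0f;
    // Loop over the tiles of Md and Nd needed to compute one Pd element
    for (int m = 0; m < Width / TILE_WIDTH; ++m) {
        // Each thread loads one element of each tile from global memory
        Mds[ty][tx] = Md[Row * Width + (m * TILE_WIDTH + tx)];
        Nds[ty][tx] = Nd[(m * TILE_WIDTH + ty) * Width + Col];
        __syncthreads();   // wait until the whole tile is loaded

        for (int k = 0; k < TILE_WIDTH; ++k)
            Pvalue += Mds[ty][k] * Nds[k][tx];
        __syncthreads();   // wait before the tile is overwritten
    }
    Pd[Row * Width + Col] = Pvalue;
}
```

A matching execution configuration launches one block per output tile, e.g. `dim3 dimGrid(Width / TILE_WIDTH, Width / TILE_WIDTH); dim3 dimBlock(TILE_WIDTH, TILE_WIDTH); MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd, Width);`.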


These lectures were breezed by Carl Pearson and Daniel Borup, then reviewed, edited, and uploaded by Omar Sobh.



Cite this work

Researchers should cite this work as follows:

  • Wen-Mei W Hwu (2009), "Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories,"


nanoHUB, a resource for nanoscience and nanotechnology, is supported by the National Science Foundation and other funding agencies. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.