Tags: GPU

Resources (1-20 of 20)

  1. Recursive algorithm for NEGF in Python GPU version

    02 Feb 2021 | | Contributor(s):: Ning Yang, Tong Wu, Jing Guo

    This folder contains two Python functions for GPU-accelerated simulation, which implement the recursive algorithm in the non-equilibrium Green’s function (NEGF) formalism. Compared to the MATLAB implementation [1], the GPU version allows massively parallel execution over many GPU cores...

  2. Atomic Resolution Brownian Dynamics

    09 Nov 2016 | | Contributor(s):: Chris Maffeo

    GPU-accelerated Brownian dynamics simulation tool for biomolecular and nanotechnological systems

  3. A Performance Comparison of Algebraic Multigrid Preconditioners on GPUs and MIC

    02 Feb 2016 | | Contributor(s):: Karl Rupp

    Algebraic multigrid (AMG) preconditioners for accelerators such as graphics processing units (GPUs) and Intel's many-integrated core (MIC) architecture typically require a careful, problem-dependent trade-off between efficient hardware use, robustness, and convergence rate in order to...

  4. 3D Topological Insulator Nanowire NEGF Simulation on GPU

    23 May 2015 | | Contributor(s):: Gaurav Gupta

    This code, developed in C and CUDA, simulates carrier transport in a three-dimensional (3D) topological insulator (TI) nanowire, with Bi2Se3 as the exemplar material, with or without impurities, edge defects, acoustic phonons, and vacancies, for semi-infinite or metallic...

  5. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 14: Application Case Study - Quantitative MRI Reconstruction

    21 Sep 2009 | | Contributor(s):: Wen-Mei W Hwu

    Quantitative MRI Reconstruction. Topics: Reconstructing MR Images; An exciting revolution: Sodium Map of the Brain; Least Squares reconstruction; Q vs. FhD; Algorithms to Accelerate; From C to CUDA: What Unit of Work is Assigned to each Thread?; Code Motion; A Second option for the cmpFhD kernel; Loop...

  6. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 15: Kernel and Algorithm Patterns for CUDA

    25 Sep 2009 | | Contributor(s):: Wen-Mei W Hwu

    Kernel and Algorithm Patterns for CUDA. Topics: Reductions and Memory Patterns; Reduction Patterns in CUDA; Mapping Data into CUDA's Memories; Input/Output Convolution; Generic Algorithm Description; What could each thread be assigned?; Thread Assignment Trade-offs; What Memory Space does the Data use?...
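
    As a rough illustration of the convolution pattern named above, the following is a minimal 1D convolution kernel sketch. It is not code from the lecture; the names (convolve1D, d_mask, MASK_WIDTH) are illustrative. Each thread computes one output element and reads a small mask from constant memory.

        #include <cuda_runtime.h>

        #define MASK_WIDTH 5                      // illustrative mask size
        __constant__ float d_mask[MASK_WIDTH];    // small read-only mask kept in constant memory

        // One thread per output element; out-of-range neighbours are treated as zero.
        __global__ void convolve1D(const float *in, float *out, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;

            float acc = 0.0f;
            int start = i - MASK_WIDTH / 2;
            for (int j = 0; j < MASK_WIDTH; ++j) {
                int idx = start + j;
                if (idx >= 0 && idx < n)
                    acc += in[idx] * d_mask[j];
            }
            out[i] = acc;
        }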

  7. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 13: Reductions and their Implementation

    15 Sep 2009 | | Contributor(s):: Wen-Mei W Hwu

    Structuring Parallel Algorithms. Topics: Parallel Reductions; Parallel Prefix Sum; Relevance of Scan; Application of Scan; Scan on the CPU; First attempt; Parallel Scan Algorithm; Work efficiency considerations; Improving Efficiency; Use Padding to reduce conflicts; Global Synchronization in CUDA. These...
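
    As a rough illustration of the reduction topics listed above (not the lecture's own code; kernel and variable names are assumptions), the sketch below is a shared-memory tree reduction: each block reduces blockDim.x inputs to one partial sum, and a second pass or a short host loop combines the per-block results. Sequential addressing keeps the active threads contiguous, which avoids branch divergence and shared-memory bank conflicts.

        #include <cuda_runtime.h>

        // Per-block sum of 'in'; one partial result per block is written to 'blockSums'.
        __global__ void blockSum(const float *in, float *blockSums, int n)
        {
            extern __shared__ float sdata[];               // dynamically sized shared memory
            unsigned int tid = threadIdx.x;
            unsigned int i   = blockIdx.x * blockDim.x + threadIdx.x;

            sdata[tid] = (i < n) ? in[i] : 0.0f;           // pad out-of-range elements with zero
            __syncthreads();

            // Tree reduction: halve the number of active threads each step.
            for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
                if (tid < s)
                    sdata[tid] += sdata[tid + s];
                __syncthreads();
            }
            if (tid == 0)
                blockSums[blockIdx.x] = sdata[0];
        }

        // Example launch, sizing the shared memory to the block:
        //   blockSum<<<numBlocks, 256, 256 * sizeof(float)>>>(d_in, d_partial, n);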

  8. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 12: Structuring Parallel Algorithms

    15 Sep 2009 | | Contributor(s):: Wen-Mei W Hwu

    Structuring Parallel Algorithms. Topics: Key Parallel Programming Steps; Algorithms; Choosing Algorithm Structure; Mapping a Divide and Conquer algorithm; Tiled Algorithms; Increased work per thread; Double Buffering; Loop Fusion and Memory Privatization; Pipeline or "Spatial Computing Model". These lectures...

  9. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 11: Floating Point Considerations

    08 Sep 2009 | | Contributor(s):: Wen-Mei W Hwu

    Floating Point Considerations. Topics: GPU Floating Point Features; Normalized Representation; Exponent Representation; Representable Numbers; Flush to Zero; Denormalization; Runtime Math Library; Make Your Programs Float Safe! These lectures were breezed by Carl Pearson and Daniel Borup and then reviewed,...
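
    One reason these floating-point topics matter for parallel code is that float addition is not associative, so reordering a sum (as any parallel reduction does) can change the result. A minimal host-side illustration (not from the lecture; the values are chosen only to make the rounding visible):

        #include <stdio.h>

        int main(void)
        {
            float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
            float left  = (a + b) + c;   // 0.0f + 1.0f -> 1.0f
            float right = a + (b + c);   // b + c rounds back to b, so the sum is 0.0f
            printf("(a+b)+c = %f, a+(b+c) = %f\n", left, right);
            return 0;
        }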

  10. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 10: Control Flow

    31 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    Control Flow. Topics: Terminology Review; How Thread Blocks are Partitioned; Control Flow Instructions; Parallel Reduction; A Vector Reduction Example; A Simple Implementation; Vector Reduction With Bank Conflicts; Vector Reduction With Branch Divergence; Predicated Execution Concept; Instruction Predication...
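
    As a rough sketch of the branch-divergence idea covered above (illustrative only; kernel names are assumptions): threads of a warp execute in lockstep, so a condition that splits a warp forces both paths to run one after the other, while a condition that is uniform within each warp does not.

        // Splits every warp: even and odd lanes take different paths, which serializes them.
        __global__ void divergent(float *x)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (threadIdx.x % 2 == 0)
                x[i] *= 2.0f;
            else
                x[i] += 1.0f;
        }

        // Warp-uniform branch: every thread in a warp takes the same path, so no divergence.
        __global__ void warpUniform(float *x)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if ((threadIdx.x / warpSize) % 2 == 0)
                x[i] *= 2.0f;
            else
                x[i] += 1.0f;
        }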

  11. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 7: GPU as part of the PC Architecture

    21 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    GPU as part of the PC Architecture. Topics: Typical Structure of a CUDA Program; Bandwidth: Gravity of Modern Computer Systems; (Original) PCI Bus Specification; PCI as Memory Mapped I/O; PCI Express (PCI-E); PCI-E Links and Lanes; PCI-E PC Architecture; Intel Single Core System; Intel Dual Core System; AMD...
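
    A hedged sketch of the "Typical Structure of a CUDA Program" topic listed above (names and sizes are illustrative, not taken from the lecture): allocate device memory, copy the input across the PCI-E bus, launch a kernel, copy the result back, and free.

        #include <cuda_runtime.h>
        #include <stdio.h>
        #include <stdlib.h>

        __global__ void scale(float *x, float s, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) x[i] *= s;
        }

        int main(void)
        {
            const int n = 1 << 20;
            size_t bytes = n * sizeof(float);
            float *h = (float *)malloc(bytes);
            for (int i = 0; i < n; ++i) h[i] = 1.0f;

            float *d;
            cudaMalloc(&d, bytes);
            cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // host -> device over PCI-E

            scale<<<(n + 255) / 256, 256>>>(d, 3.0f, n);        // kernel launch

            cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // device -> host
            printf("h[0] = %f\n", h[0]);

            cudaFree(d);
            free(h);
            return 0;
        }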

  12. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 8: Threading Hardware in G80

    24 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    Threading Hardware in G80. Topics: Single Program Multiple Data (SPMD); Grids and Blocks; CUDA Thread Block: Review; GeForce-8 Series Hardware Overview; CUDA Processor Terminology; Streaming Multiprocessor (SM); G80 Thread Computing Pipeline; Thread Lifecycle in Hardware; SM Executes Blocks; Thread Scheduling...

  13. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 9: Memory Hardware in G80

    26 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    Memory Hardware in G80. Topics: CUDA Device Memory Space; Parallel Memory Sharing; SM Memory Architecture; SM Register File; Programmer View of Register File; Matrix Multiplication Example; More on Dynamic Partitioning; ILP vs. TLP; Memory Layout of a Matrix in C; Constants; Shared Memory; Parallel Memory...

  14. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 6: CUDA Memories - Part 2

    18 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    CUDA Memories, Part 2. Topics: Tiled Multiply; Breaking Md and Nd into Tiles; Tiled Matrix Multiplication Kernel; CUDA Code - Kernel Execution Configuration; First-Order Size Considerations in G80; G80 Shared Memory and Threading; Tiling Size Effects; Typical Structure of a CUDA Program. These lectures were...
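
    A minimal sketch in the spirit of the tiled Md/Nd multiply described above (not the lecture's exact code; it assumes square matrices whose width is a multiple of the tile size): each block stages TILE x TILE sub-matrices in shared memory so that every element fetched from global memory is reused TILE times.

        #define TILE 16

        __global__ void matMulTiled(const float *M, const float *N, float *P, int width)
        {
            __shared__ float Ms[TILE][TILE];
            __shared__ float Ns[TILE][TILE];

            int row = blockIdx.y * TILE + threadIdx.y;
            int col = blockIdx.x * TILE + threadIdx.x;
            float acc = 0.0f;

            for (int t = 0; t < width / TILE; ++t) {
                // Cooperatively load one tile of M and one tile of N into shared memory.
                Ms[threadIdx.y][threadIdx.x] = M[row * width + t * TILE + threadIdx.x];
                Ns[threadIdx.y][threadIdx.x] = N[(t * TILE + threadIdx.y) * width + col];
                __syncthreads();

                for (int k = 0; k < TILE; ++k)
                    acc += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
                __syncthreads();
            }
            P[row * width + col] = acc;
        }

        // Example launch: dim3 block(TILE, TILE); dim3 grid(width / TILE, width / TILE);
        //                 matMulTiled<<<grid, block>>>(d_M, d_N, d_P, width);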

  15. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 5: CUDA Memories

    18 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    CUDA Memories. Topics: G80 Implementation of CUDA Memories; CUDA Variable Type Qualifiers; Where to Declare Variables; Variable Type Restrictions; A Common Programming Strategy; GPU Atomic Integer Operations; Matrix Multiplication Using Shared Memory; How About Performance on G80?; IDEA: Use Shared Memory...
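
    A short sketch of the CUDA variable type qualifiers and an atomic integer operation named among the topics above (illustrative; the kernel, the symbol names, and the 256-thread block size are assumptions, not lecture code):

        __constant__ float threshold;      // constant memory: read-only on the device, set with cudaMemcpyToSymbol
        __device__   int   countAbove;     // global device memory, lives for the whole application

        __global__ void countAboveThreshold(const float *x, int n)
        {
            __shared__ float tile[256];    // shared memory: one copy per block (assumes blockDim.x == 256)

            int i = blockIdx.x * blockDim.x + threadIdx.x;   // automatic scalars live in registers
            tile[threadIdx.x] = (i < n) ? x[i] : 0.0f;
            __syncthreads();

            if (i < n && tile[threadIdx.x] > threshold)
                atomicAdd(&countAbove, 1);                   // atomic integer add on a global counter
        }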

  16. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 4: CUDA Threads - Part 2

    12 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    CUDA Threads, Part 2. Topics: CUDA Thread Block; Transparent Scalability; G80 CUDA Mode, A Review; Executing Thread Blocks; Thread Scheduling; Block Granularity Considerations; More Details of API Features; Application Programming Interface; Language Extensions: Built-in Variables; Common Runtime Component:...

  17. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 3: CUDA Threads, Tools, Simple Examples

    12 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    CUDA Threads, Tools, Simple Examples. Topics: A Running Example of Matrix Multiplication; Memory Layout of a Matrix in C; Compiling a CUDA Program; Device Emulation Mode Pitfalls; Floating Point; CUDA Threads; Matrix Multiplication Using Multiple Blocks; Transparent Scalability. These lectures were breezed...
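
    As a rough illustration of "Matrix Multiplication Using Multiple Blocks" (not the lecture's code; names are illustrative): matrices are stored row-major as in the "Memory Layout of a Matrix in C" topic, and each thread computes one element of P, with a 2D grid of 2D blocks covering the whole output.

        __global__ void matMulSimple(const float *M, const float *N, float *P, int width)
        {
            int row = blockIdx.y * blockDim.y + threadIdx.y;
            int col = blockIdx.x * blockDim.x + threadIdx.x;
            if (row < width && col < width) {
                float acc = 0.0f;
                for (int k = 0; k < width; ++k)
                    acc += M[row * width + k] * N[k * width + col];   // row of M times column of N
                P[row * width + col] = acc;
            }
        }

        // Example launch: dim3 block(16, 16); dim3 grid((width + 15) / 16, (width + 15) / 16);
        //                 matMulSimple<<<grid, block>>>(d_M, d_N, d_P, width);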

  18. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 2: The CUDA Programming Model

    08 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    CUDA Programming Model. Topics: What is GPGPU?; CUDA; An Example of Physical Reality Behind CUDA; Parallel Computing on a GPU; CUDA - C with no shader limitations; CUDA Devices and Threads; G80 Graphics Mode; G80 CUDA Mode; Arrays of Parallel Threads; Thread Blocks, Scalable Cooperation; Block IDs and...

  19. Illinois ECE 498AL: Programming Massively Parallel Processors, Lecture 1: Introduction

    11 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    Programming Massively Parallel Processors. Topics: Introduction, Grading, Outline; Lab Equipment; UIUC/NCSA QP Cluster; UIUC/NCSA AP Cluster; ECE498AL Development History; Why Program Massively Parallel Processors?; GeForce 8800; G80 Characteristics; Future Apps Reflect a Concurrent World; Stretching...

  20. Illinois ECE 498AL: Programming Massively Parallel Processors

    11 Aug 2009 | | Contributor(s):: Wen-Mei W Hwu

    Spring 2009. Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more...