Papers for the course:

These papers are a stratified sample of work in GPU algorithms. The space of parallel algorithms covered is based on the so-called “Berkeley dwarfs” [1], an attempt to describe the key algorithmic components from which most computationally intensive applications can be built. The list is biased toward more recent work, although some early GPU papers, now almost a decade old, are also included.

[1] Asanović, et al., “The Landscape of Parallel Computing Research: A View from Berkeley”, EECS Department, University of California, Berkeley, Dec. 2006, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html.

List of Dwarfs

Dense Linear Algebra
Sparse Linear Algebra
Spectral Methods
N-Body Methods
Structured Grids
Unstructured Grids
MapReduce
Combinational Logic
Graph Traversal
Dynamic Programming
Backtrack and Branch-and-Bound
Graphical Models
Finite State Machines

Papers for each Dwarf

Dense Linear Algebra
- Dense linear algebra solvers for multicore with GPU accelerators
  - Author(s): Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra
  - 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010
  - Presenter:
- Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs
  - Author(s): D. Mukunoki, T. Imamura and D. Takahashi
  - 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turku, 2015
  - Presenter:
Sparse Linear Algebra
- Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices
  - Author(s): M. Daga and J. L. Greathouse
  - 22nd International Conference on High Performance Computing (HiPC), Bangalore, 2015
  - Presenter:
- How naive is naive SpMV on the GPU?
  - Author(s): M. Steinberger, A. Derler, R. Zayer and H. P. Seidel
  - 2016 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, 2016
  - Presenter:
Spectral Methods
- Mixed Precision Method for GPU-based FFT
  - Author(s): Shuhan Qi, Xuan Wang, Shaohuai Shi
  - 2011 IEEE 14th International Conference on Computational Science and Engineering (CSE), 2011
  - Presenter:
N-Body Methods
- Scaling Hierarchical N-body Simulations on GPU Clusters
  - Author(s): P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kalé and T. R. Quinn
  - 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, 2010
  - Presenter: Connor Scully-Allison
  - Paper d
Structured Grids
- For solving partial differential equations (PDEs). If you are interested in PDE solving, consult with Lee Barford to choose one or two papers to present.
Unstructured Grids
- For solving partial differential equations (PDEs). If you are interested in PDE solving, consult with Lee Barford to choose one or two papers to present.
MapReduce
- Multi-GPU MapReduce on GPU Clusters
  - Author(s): J. A. Stuart and J. D. Owens
  - 2011 IEEE International Parallel & Distributed Processing Symposium, Anchorage, AK, 2011
  - Presenter: Om Mishra
  - Paper a
Combinational Logic
- (Don't use a GPU; use an FPGA.)
Graph Traversal
- An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures
  - Author(s): F. Busato and N. Bombieri
  - IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 8
  - Presenter: Steven Fisher
  - Paper f
- GBTL-CUDA: Graph Algorithms and Primitives for GPUs
  - Author(s): P. Zhang, M. Zalewski, A. Lumsdaine, S. Misurda and S. McMillan
  - 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago, IL, 2016
  - Presenter:
- Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU
  - Author(s): Yangzihao Wang, Sean Baxter and John D. Owens
  - 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL, 2017
  - Presenter: Hannah Munoz
  - Paper h
Dynamic Programming
- An Efficient Parallelization Strategy for Dynamic Programming on GPU
  - Author(s): Karl-Eduard Berger, Francois Galea
  - 2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum
  - Presenter: Syed Zawad
  - Paper g
- Optimized GPU implementation for dynamic programming in image data processing
  - Author(s): J. Ke, T. Bednarz and A. Sowmya
  - 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC), Las Vegas, NV, 2016
  - Presenter: Janelle Blankenburg
  - Paper e
Backtrack and Branch-and-Bound
- GPU Implementation of the Branch and Bound Method for Knapsack Problems
  - Author(s): M. E. Lalami and D. El-Baz
  - 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Shanghai, 2012
  - Presenter: Blanca Miller
  - Paper c
- Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
  - Author(s): I. Chakroun, M. Mezmaz, N. Melab, and A. Bendjoudi
  - Concurrency and Computation: Practice and Experience, vol. 25, no. 8, June 2013
  - Presenter:
Graphical Models
- Deep learning with COTS HPC systems
  - Author(s): Adam Coates et al.
  - Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28
  - Presenter: Daniel Lopez
  - Paper b
- cuDNN: Efficient Primitives for Deep Learning
  - Author(s): Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer
  - arXiv preprint, 2014
  - Presenter: Aavaas Gajurel
  - Paper k
- Optimization of GPU and CPU acceleration for neural networks layers implemented in Python
  - Author(s): R. Dogaru and I. Dogaru
  - 2017 5th International Symposium on Electrical and Electronics Engineering (ISEEE)
  - Presenter: Chad Adams
  - Paper i
Finite State Machines
- GPEP: Graphics Processing Enhanced Pattern-Matching for High-Performance Deep Packet Inspection
  - Author(s): L. J. Vespa and N. Weng
  - 2011 International Conference on Internet of Things and 4th International Conference on Cyber, Physical and Social Computing, Dalian, 2011
  - Presenter: Eric Li
  - Paper j
- Parallelizing small finite state machines, with application to pulsed signal analysis
  - Author(s): Lee A. Barford
  - 2012 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 13-16 May 2012, pp. 1957-1962
  - Presenter:

Presentation Order

a - Om Mishra - Dwarf 7-1
b - Daniel Lopez - Dwarf 12-1
c - Blanca Miller - Dwarf 11-1
d - Connor Scully-Allison - Dwarf 4-1
e - Janelle Blankenburg - Dwarf 10-2
f - Steven Fisher - Dwarf 9-1
g - Syed Zawad - Dwarf 10-1
h - Hannah Munoz - Dwarf 9-3
i - Chad Adams - Dwarf 12-3
j - Eric Li - Dwarf 13-1
k - Aavaas Gajurel - Dwarf 12-2

Department of
Computer Science and Engineering
