Papers for the course:
These papers are a stratified sample of work in GPU algorithms. The space of parallel algorithms covered is based on the so-called “Berkeley dwarfs” [1], an attempt to describe the key algorithmic components from which most computationally intensive applications can be built. The list is biased toward more recent work, although some early GPU papers, now almost a decade old, are also included.
[1] Asanović, et al., “The Landscape of Parallel Computing Research: A View from Berkeley”, EECS Department, University of California, Berkeley, Dec. 2006, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html.
List of Dwarfs
- Dense Linear Algebra
- Sparse Linear Algebra
- Spectral Methods
- N-Body Methods
- Structured Grids
- Unstructured Grids
- MapReduce
- Combinational Logic
- Graph Traversal
- Dynamic Programming
- Backtrack and Branch-and-Bound
- Graphical Models
- Finite State Machines
Papers for each Dwarf
- Dense Linear Algebra
- Dense linear algebra solvers for multicore with GPU accelerators
- Author(s): Stanimire Tomov, Rajib Nath, Hatem Ltaief, and Jack Dongarra
- 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010
- Presenter:
- Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs
- Author(s): D. Mukunoki, T. Imamura and D. Takahashi
- 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Turku, 2015
- Presenter:
- Sparse Linear Algebra
- Structural Agnostic SpMV: Adapting CSR-Adaptive for Irregular Matrices
- Author(s): M. Daga and J. L. Greathouse
- 22nd International Conference on High Performance Computing (HiPC), Bangalore, 2015
- Presenter:
- How naive is naive SpMV on the GPU?
- Author(s): M. Steinberger, A. Derlery, R. Zayer and H. P. Seidel
- 2016 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, 2016
- Presenter:
- Spectral Methods
- Mixed Precision Method for GPU-based FFT
- Author(s): Shuhan Qi, Xuan Wang, Shaohuai Shi
- 2011 IEEE 14th International Conference on Computational Science and Engineering (CSE)
- Presenter:
- N-Body
- Scaling Hierarchical N-body Simulations on GPU Clusters
- Author(s): P. Jetley, L. Wesolowski, F. Gioachin, L. V. Kalé and T. R. Quinn
- 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, 2010
- Presenter: Connor Scully-Allison
- Paper d
- Structured Grids
- For solving partial differential equations (PDEs). If you are interested in PDE solving, consult with Lee Barford to choose one or two papers to present.
- Unstructured Grids
- For solving partial differential equations (PDEs). If you are interested in PDE solving, consult with Lee Barford to choose one or two papers to present.
- MapReduce
- Multi-GPU MapReduce on GPU Clusters
- Author(s): J. A. Stuart and J. D. Owens
- 2011 IEEE International Parallel & Distributed Processing Symposium, Anchorage, AK, 2011
- Presenter: Om Mishra
- Paper a
- Combinational Logic
- (don't use a GPU: use an FPGA)
- Graph traversal
- An Efficient Implementation of the Bellman-Ford Algorithm for Kepler GPU Architectures
- Author(s): F. Busato and N. Bombieri
- IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 8
- Presenter: Steven Fisher
- Paper f
- GBTL-CUDA: Graph Algorithms and Primitives for GPUs
- Author(s): P. Zhang, M. Zalewski, A. Lumsdaine, S. Misurda and S. McMillan
- 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Chicago, IL, 2016
- Presenter:
- Mini-Gunrock: A Lightweight Graph Analytics Framework on the GPU
- Author(s): Wang, Yangzihao; Baxter, Sean; Owens, John D.
- 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Orlando, FL
- Presenter: Hannah Munoz
- Paper h
- Dynamic Programming
- An Efficient Parallelization Strategy for Dynamic Programming on GPU
- Author(s): Karl-Eduard Berger, Francois Galea
- 2013 IEEE 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum
- Presenter: Syed Zawad
- Paper g
- Optimized GPU implementation for dynamic programming in image data processing
- Author(s): J. Ke, T. Bednarz and A. Sowmya
- 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC), Las Vegas, NV, 2016
- Presenter: Janelle Blankenburg
- Paper e
- Backtrack and branch-and-bound
- GPU Implementation of the Branch and Bound Method for Knapsack Problems
- Author(s): M. E. Lalami and D. El-Baz
- 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Shanghai, 2012
- Presenter: Blanca Miller
- Paper c
- Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
- Author(s): I. Chakroun, M. Mezmaz, N. Melab, and A. Bendjoudi
- Concurrency and Computation: Practice and Experience, 06/2013, Volume 25, Issue 8
- Presenter:
- Graphical Models
- Deep learning with COTS HPC systems
- Author(s): Adam Coates et al.
- Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28
- Presenter: Daniel Lopez
- Paper b
- cuDNN: Efficient Primitives for Deep Learning
- Author(s): Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer
- arXiv.org
- Presenter: Aavaas Gajurel
- Paper k
- Optimization of GPU and CPU acceleration for neural networks layers implemented in Python
- Author(s): R. Dogaru and I. Dogaru
- 2017 5th International Symposium on Electrical and Electronics Engineering (ISEEE)
- Presenter: Chad Adams
- Paper i
- Finite State Machines
- GPEP: Graphics Processing Enhanced Pattern-Matching for High-Performance Deep Packet Inspection
- Author(s): L. J. Vespa and N. Weng
- 2011 International Conference on Internet of Things and 4th International Conference on Cyber, Physical and Social Computing, Dalian, 2011
- Presenter: Eric Li
- Paper j
- Parallelizing small finite state machines, with application to pulsed signal analysis
- Author(s): Lee A. Barford
- 2012 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 13-16 May 2012, pp. 1957-1962
- Presenter:
Presentation Order
- a - Om Mishra - Dwarf 7-1
- b - Daniel Lopez - Dwarf 12-1
- c - Blanca Miller - Dwarf 11-1
- d - Connor Scully-Allison - Dwarf 4-1
- e - Janelle Blankenburg - Dwarf 10-2
- f - Steven Fisher - Dwarf 9-1
- g - Syed Zawad - Dwarf 10-1
- h - Hannah Munoz - Dwarf 9-3
- i - Chad Adams - Dwarf 12-3
- j - Eric Li - Dwarf 13-1
- k - Aavaas Gajurel - Dwarf 12-2