OpenACC Tutorial + Workshop

Thank you for taking the time to read my blog post on UDEL’s GPU Hackathon and to check out the article 🙂

After the success of the GPU Hackathon (see the video) in early May this year, NVIDIA and PGI decided to come back to the University of Delaware campus to host lectures and workshops on OpenACC from June 7 through June 9, 2016.

Yay!!!! Thank you NVIDIA/PGI! 🙂

The Day 1 tutorial was open to all registrants.

Presented by NVIDIA team members Mathew Colgrove, Abel Brown and Barton Fiske, the workshop introduced programming techniques using OpenACC and included topics such as optimization and profiling methods for GPU programming. The teams used PGI OpenACC compilers.


Barton Fiske of NVIDIA kicked off the 3-day workshop by sharing how NVIDIA has emerged as the world leader in visual computing, and by describing the programs NVIDIA offers to academia, including its teaching kit.


The lectures and hands-on exercises were given by Mathew Colgrove of NVIDIA’s PGI compiler team and covered the use of OpenACC on GPU-accelerated systems. GPUs, as you know, are the most pervasive parallel computing platform, used by over 300,000 developers worldwide.

Attendees included faculty, undergraduate and post-graduate students, and research scientists from the departments of Computer & Information Sciences, Electrical and Computer Engineering, Chemical and Biomolecular Engineering, and Mechanical Engineering at the University of Delaware, as well as faculty and students from Prof. Tomasz Smolinski’s team at Delaware State University and from Prof. Haklin Kimm at East Stroudsburg University, Pennsylvania.

When you are teaching CS to non-CS students, don’t you feel you are on top of the world? 🙂 I do! Interdisciplinary research is so important!


OpenACC compilers can also target multicore platforms. Yes, they can!

So if you want to try the OpenACC programming model on your quad-core or dual-core laptop and do not have a PGI compiler license yet, simply download the OpenACC Toolkit (free for academia), which includes the popular PGI Accelerator Fortran/C compiler and developer tools for acceleration with OpenACC. More useful resources, along with online course materials, are also available.
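To make "the same directives, GPU or multicore" concrete, here is a minimal SAXPY sketch in C. It is our own illustration, not workshop material, and the function name `saxpy` is just a conventional choice. The loop is written once; which hardware runs it is decided when you compile (in the PGI releases of that time the multicore target was selected with `-ta=multicore`; check your compiler's documentation for the exact option). A compiler without OpenACC support simply ignores the pragma and runs the loop serially.

```c
#include <assert.h>
#include <stddef.h>

/* y = a * x + y, element by element.
 * "parallel loop" asks an OpenACC compiler to spread the iterations over
 * the target it was built for (GPU threads or CPU cores). A compiler
 * without OpenACC support ignores the pragma and runs the loop serially,
 * producing the same result. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    /* copyin: x is only read; copy: y is read and written back */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

On a multicore build the data clauses cost essentially nothing, since host and "device" share the same memory, so the same source stays portable between your laptop and a GPU node.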


Mini-workshop on Day 2 and Day 3: 

A team of 4 from East Stroudsburg University (ESU) and a team of 6 from Delaware State University (DSU) worked on parallelizing their codes, which implemented an Evolutionary Algorithm, a Dynamic Programming algorithm and a Satellite Image Processing algorithm.

The team from ESU was using MATLAB for image processing and had been trying to use OpenMP and OpenACC directives. To them it seemed impossible to use directives for their image-processing code, which is not surprising: OpenMP and OpenACC directives are written into C, C++ and Fortran code, not MATLAB.

Aakashdeep Goyal, a CS master’s student from ESU, says: “The workshop was not only limited to discuss the OpenACC framework but also provided a background study of the various existing parallel processing alternatives through open discussions.”

This is the part I enjoy the most about the Hackathon as well as the workshop. It’s just a brilliant forum to brainstorm ideas on the whiteboard with mentors and a bunch of participants eager to learn. This group learnt that “libraries” are the way to go!! NVdians helped the team use the OpenCV libraries instead of MATLAB, and the team was able to integrate OpenCV with OpenMP on Ubuntu 14.04, using Eclipse as their IDE. Since the code was in MATLAB to begin with, the team spent both days converting it to OpenCV.

Now that the team has undergone intensive training on OpenACC and knows how to use OpenCV, they plan to apply OpenACC directives to C++-enabled OpenCV, and later on to use CUDA. The aim is to test the non-parametric regression model along with other filtering algorithms for edge detection and linkage using OpenACC directives. The team is confident it will have working OpenACC code within the next several weeks. (This sounds positive, so stay tuned for updates :-)!)
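The team’s edge-detection code is not shown here, so as a generic illustration of why filters like this suit directives, here is a sketch of a Sobel edge filter in plain C (not OpenCV). The assumptions are ours: a row-major 8-bit grayscale image, border pixels set to 0, and the common |Gx| + |Gy| approximation of the gradient magnitude. Every output pixel depends only on the input image, so both loops can be collapsed into one parallel iteration space.

```c
#include <assert.h>

/* Absolute value that can also be called inside an accelerator kernel. */
#pragma acc routine seq
static int iabs(int v) { return v < 0 ? -v : v; }

/* Sobel edge magnitude, approximated as |Gx| + |Gy|, for a row-major 8-bit
 * grayscale image of width w and height h. Border pixels are set to 0.
 * Each output pixel depends only on the input image, so every pixel can be
 * computed independently; collapse(2) turns the two loops into a single
 * parallel iteration space. */
void sobel(const unsigned char *restrict in, int *restrict out, int w, int h)
{
    assert(w >= 3 && h >= 3);
    #pragma acc parallel loop collapse(2) copyin(in[0:w*h]) copyout(out[0:w*h])
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int mag = 0;
            if (x > 0 && y > 0 && x < w - 1 && y < h - 1) {
                /* p points at the top-left pixel of the 3x3 window */
                const unsigned char *p = in + (y - 1) * w + (x - 1);
                int gx = -p[0] + p[2] - 2 * p[w] + 2 * p[w + 2] - p[2 * w] + p[2 * w + 2];
                int gy = -p[0] - 2 * p[1] - p[2] + p[2 * w] + 2 * p[2 * w + 1] + p[2 * w + 2];
                mag = iabs(gx) + iabs(gy);
            }
            out[y * w + x] = mag;
        }
    }
}
```

For a vertical step from 0 to 100 in pixel value, the pixels on either side of the step get a magnitude of 400 while flat regions stay at 0, which is exactly the "edge map" a later linking stage would consume.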

Another algorithm, presented by team member Zuqing Z Li, was a Dynamic Programming algorithm. This is a classic wavefront-based problem! Every cell depends on neighboring cells that must be computed before it (typically the cells above it and to its left), which makes it a very interesting problem: the cells along one anti-diagonal cannot be computed until every anti-diagonal before it is complete, and so on and so forth, while the cells within a single anti-diagonal are independent of each other and can be computed in parallel. Other research groups have used CUDA to exploit wavefront parallelism, so we discussed with the team some of the CUDA strategies that could be transformed to OpenACC. The team is looking forward to implementing some of those strategies, and possibly even using MPI + OpenACC across nodes.
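We don’t know the exact recurrence in the team’s code, so here is a stand-in: the classic longest-common-subsequence (LCS) table, filled in wavefront order. This is a minimal sketch under that assumption. The outer loop walks the anti-diagonals one after another, and the inner loop over the cells of one anti-diagonal is the part handed to OpenACC.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common subsequence (LCS) of strings a and b, with
 * the DP table filled one anti-diagonal at a time. Cell (i, j) reads
 * (i-1, j), (i, j-1) and (i-1, j-1); all three lie on the previous two
 * anti-diagonals, so the cells of one anti-diagonal are independent. */
int lcs_wavefront(const char *a, const char *b)
{
    int n = (int)strlen(a), m = (int)strlen(b);
    if (n == 0 || m == 0)
        return 0;
    int cols = m + 1;
    size_t cells = (size_t)(n + 1) * (size_t)cols;
    int *t = calloc(cells, sizeof *t);   /* row 0 and column 0 stay 0 */
    assert(t != NULL);

    #pragma acc data copyin(a[0:n], b[0:m]) copy(t[0:cells])
    {
        /* Anti-diagonal d holds the cells with i + j == d. The sweep over d
         * is sequential: diagonal d needs diagonals d-1 and d-2. */
        for (int d = 2; d <= n + m; ++d) {
            int i_lo = d - m > 1 ? d - m : 1;
            int i_hi = d - 1 < n ? d - 1 : n;
            /* Cells on the same anti-diagonal run in parallel. */
            #pragma acc parallel loop present(a[0:n], b[0:m], t[0:cells])
            for (int i = i_lo; i <= i_hi; ++i) {
                int j = d - i;
                int up = t[(i - 1) * cols + j];
                int left = t[i * cols + j - 1];
                t[i * cols + j] = (a[i - 1] == b[j - 1])
                                      ? t[(i - 1) * cols + (j - 1)] + 1
                                      : (up > left ? up : left);
            }
        }
    }
    int result = t[n * cols + m];
    free(t);
    return result;
}
```

The data region keeps the table on the device for the whole sweep, so only the per-diagonal kernel launches remain. Short diagonals near the corners give the GPU very little work per launch, which is why wavefront codes commonly group cells into tiles; that is one of the CUDA ideas worth carrying over to OpenACC.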


The team from DSU brainstormed the parallelization of an Evolutionary Algorithm. These are algorithms inspired by the biological model of evolution; the Genetic Algorithm (GA) is the most common type. The team brought the bulk of their evolutionary library to the workshop, and their goal was to learn ways to parallelize the algorithm. As the slide presented by Prof. Tomasz Smolinski showed, the C++ library has been in development since 1997 (lots of legacy code!!!)

The team’s goal was to transplant their Multi-Objective Evolutionary Algorithms (MOEA) library onto the GPU platform. The library is application-agnostic and has been successfully used in various domains, including computational modeling of neurons, signal decomposition, and mining for association rules in large data sets. Ultimately, the library will be the engine behind their open-source application, NeRvolver, which will allow users from all over the world to generate and analyze neuronal models through a web interface.

The team spent most of Day 2 brainstorming with Tristan Vanderbruggen and Robert Searles, mentors from the University of Delaware and expert programmers of accelerators, about how to manage moving data between the host and the device.

Aha moment!! After several whiteboard sessions, the conclusion was that the new code would create the initial genotypes on the GPU, after which crossover and mutation would occur. These individuals would then be sent to the simulator, which returns the fitness values of these models to the GPU. The team also hoped to store their archive of elite models on the GPU, updating it throughout the simulation.
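Here is a minimal sketch of that data flow in C with OpenACC. It is our own toy illustration, not the team’s MOEA code: the sizes `POP` and `GENES`, the hash-based `rnd`, the host-side `simulate` (fitness = minus the sum of squared genes), tournament selection, uniform crossover, the 10% mutation rate and keeping one elite child are all assumptions. What it is meant to show is the data directives: the genotypes live on the device for the whole run (`create`), and only `update host` / `update device` move data across for the simulator step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes: POP genotypes of GENES real-valued genes each. */
#define POP 8
#define GENES 4

/* Stateless integer hash mapped to [0, 1). Having no shared state, it can
 * be called from many device threads at once. */
#pragma acc routine seq
static double rnd(uint32_t x)
{
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x / 4294967296.0;
}

/* Hypothetical stand-in for the external simulator, which runs on the
 * host: fitness = -(sum of squared genes), so larger is better. */
static void simulate(const double *genes, double *fitness)
{
    for (int k = 0; k < POP; ++k) {
        double s = 0.0;
        for (int g = 0; g < GENES; ++g)
            s += genes[k * GENES + g] * genes[k * GENES + g];
        fitness[k] = -s;
    }
}

/* Runs `generations` rounds; on return, the host arrays genes (POP*GENES)
 * and fitness (POP) hold the final, evaluated population. */
void evolve(double *genes, double *fitness, int generations)
{
    assert(generations >= 0);
    double next[POP * GENES];
    /* create: device copies only; data moves solely through the updates */
    #pragma acc data create(genes[0:POP*GENES], next[0:POP*GENES], fitness[0:POP])
    {
        /* 1. Create the initial genotypes directly on the device. */
        #pragma acc parallel loop present(genes[0:POP*GENES])
        for (int i = 0; i < POP * GENES; ++i)
            genes[i] = 2.0 * rnd((uint32_t)i + 1u) - 1.0;

        for (int gen = 0; ; ++gen) {
            /* 2. Send the individuals to the host-side simulator... */
            #pragma acc update host(genes[0:POP*GENES])
            simulate(genes, fitness);
            if (gen == generations)
                break;                       /* final population evaluated */
            int best = 0;                    /* elite, chosen on the host */
            for (int k = 1; k < POP; ++k)
                if (fitness[k] > fitness[best])
                    best = k;
            /* 3. ...and return the fitness values to the device. */
            #pragma acc update device(fitness[0:POP])

            /* 4. Tournament selection, uniform crossover and mutation. */
            #pragma acc parallel loop present(genes[0:POP*GENES], next[0:POP*GENES], fitness[0:POP])
            for (int k = 0; k < POP; ++k) {
                uint32_t s = (uint32_t)gen * 1000003u + (uint32_t)k * 1009u;
                int a = (int)(rnd(s + 1) * POP), b = (int)(rnd(s + 2) * POP);
                int c = (int)(rnd(s + 3) * POP), d = (int)(rnd(s + 4) * POP);
                int p1 = fitness[a] > fitness[b] ? a : b;
                int p2 = fitness[c] > fitness[d] ? c : d;
                for (int g = 0; g < GENES; ++g) {
                    double v = genes[(rnd(s + 10 + g) < 0.5 ? p1 : p2) * GENES + g];
                    if (rnd(s + 20 + g) < 0.1)          /* 10% mutation rate */
                        v += 0.2 * rnd(s + 30 + g) - 0.1;
                    /* child 0 is an unmodified copy of the elite */
                    next[k * GENES + g] = (k == 0) ? genes[best * GENES + g] : v;
                }
            }
            #pragma acc parallel loop present(genes[0:POP*GENES], next[0:POP*GENES])
            for (int i = 0; i < POP * GENES; ++i)
                genes[i] = next[i];
        }
    }
}
```

Because the simulator runs on the host, each generation still pays for one round trip of the population and its fitness values; keeping everything else resident on the device is what makes that the only traffic.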

But wait a minute, that was not all of it: there was yet another challenge. The size of the archive would change over time and become larger than the population (i.e., the size of each generation), so how do you allocate the appropriate space on the device???

Well, I guess they were glad to have identified the challenge! 🙂 Sometimes finding the problem is the challenge (now, how many of you have experienced that!! ;-))
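One common answer (not necessarily the one the team chose) is the geometric-growth strategy behind C++’s `std::vector`: keep a device buffer whose capacity is larger than the current number of entries and, when it runs out, allocate a bigger one and move the live entries over. Below is a sketch in C with OpenACC unstructured data directives. The `Archive` type, its functions, the initial capacity of 4 and the doubling factor are all hypothetical, and each entry is a single `double` for brevity. If a hard upper bound on the archive size is known in advance, allocating that once is simpler still.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical elite archive: a host array mirrored on the device.
 * For brevity each entry is a single double (e.g. a model's fitness). */
typedef struct {
    double *items;
    size_t count;   /* entries in use */
    size_t cap;     /* entries allocated on host and device */
} Archive;

/* Ensure room for `needed` entries. Capacity doubles, so the costly
 * device reallocation happens only O(log n) times as the archive grows.
 * Returns 0 on success, -1 if host allocation fails (archive unchanged). */
int archive_reserve(Archive *ar, size_t needed)
{
    if (needed <= ar->cap)
        return 0;
    size_t new_cap = ar->cap ? ar->cap : 4;
    while (new_cap < needed)
        new_cap *= 2;
    double *fresh = malloc(new_cap * sizeof *fresh);
    if (fresh == NULL)
        return -1;

    double *old = ar->items;
    size_t count = ar->count, old_cap = ar->cap;
    if (old_cap > 0) {
        if (count > 0) {
            /* the device copy may be newer: bring live entries home first */
            #pragma acc update host(old[0:count])
        }
        #pragma acc exit data delete(old[0:old_cap])
    }
    if (count > 0)
        memcpy(fresh, old, count * sizeof *fresh);
    free(old);

    /* allocate the larger device buffer and upload the live entries */
    #pragma acc enter data create(fresh[0:new_cap])
    if (count > 0) {
        #pragma acc update device(fresh[0:count])
    }
    ar->items = fresh;
    ar->cap = new_cap;
    return 0;
}

/* Append one entry (from the host, for simplicity) and mirror it on the device. */
int archive_push(Archive *ar, double value)
{
    if (archive_reserve(ar, ar->count + 1) != 0)
        return -1;
    double *it = ar->items;
    size_t i = ar->count++;
    it[i] = value;
    #pragma acc update device(it[i:1])
    return 0;
}

/* Release both the device copy and the host array. */
void archive_free(Archive *ar)
{
    double *it = ar->items;
    size_t cap = ar->cap;
    if (cap > 0) {
        #pragma acc exit data delete(it[0:cap])
    }
    free(it);
    ar->items = NULL;
    ar->count = ar->cap = 0;
}
```

Pushing 100 entries into an empty archive, for example, reallocates only at 4, 8, 16, 32, 64 and 128 entries, so the device buffer ends with room for 128.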

By the end of Day 3, with Mat Colgrove’s help, the team had a working OpenACC C++ version of the algorithm!!! The code had several compute kernels, indicating that it is compute-intensive and could benefit from GPUs through OpenACC.

Although they are only at the beginning of the tunnel at the moment, Karla M Miletti, an undergraduate student in the CIS department at DSU, hopes to take this to the next level. She says:

“Before Wednesday we were simply hopeful we could use GPU’s to optimize our algorithm since the evolutionary library actually passes control to a simulator (such as Neuron) which usually runs sequentially. However I think we managed to find a good application of OpenACC and high performance computing to our evolutionary algorithm. Eventually we hope to figure out how to parallelize the simulator”.