Jose Refojo, Trinity Centre for High Performance Computing, Trinity College Dublin
In the past few years, the field of high performance computing has seen a huge uptake in the use of specialized hardware that is able to outperform standard CPUs. Initially this process was centred on specialized accelerator cards, such as the CSX series of cards from ClearSpeed, but in general this approach was not very successful. The market seemed wary of investing heavily in untested hardware from relatively small and unknown companies.
At the same time, the main video card manufacturers started experimenting with using their tried and tested technology as a platform for delivering computational power. The decision to build on commodity graphics processing units (GPUs) meant that, while there would be a learning curve for programmers to fully utilize the cards, there was no barrier to entry for initial testing, as programmers could just use the GPU already installed in their computer. Once satisfied that they were seeing improved performance, they could then invest in dedicated acceleration hardware with confidence. This approach has proven to be very successful, with many of the world's largest supercomputers now hybrids of standard servers with GPU accelerators attached to the nodes.
Initially, the process of programming graphics cards to carry out numerical tasks was complex, and it often took a competent programmer a year to produce useful code and performance figures. Coding involved transforming the numerical task into a graphics problem of shading vertices, then using the features of the card to carry out the shading, and finally transforming the results back into the terms of the original problem. With the introduction of new general-purpose GPU (GPGPU) libraries and toolkits such as CUDA (Nvidia), FireStream (ATI) and the new cross-platform OpenCL (both Nvidia and ATI), the process has been greatly simplified, to the point where experienced C programmers can become expert GPGPU coders in a matter of a few weeks.
Since 2007, TCHPC has been experimenting with GPGPU technology, initially using commodity graphics cards in dedicated workstations. In 2009, a pair of Tesla devices was added to nodes in the Lonsdale cluster, and a CUDA club was set up as a self-help, peer-learning group focussed on taking advantage of the features of the Teslas. A formal CUDA training course has now been developed, aimed at computational researchers who want to modify their code to run on GPGPUs. This course is open to programmers and researchers in all Irish higher education institutions and is run several times per year.