Multi-processing is becoming the “norm”, with multiple cores now available in many consumer PCs and laptops. It is much easier to start experimenting with speeding up your code at your desk and then transition to a larger resource such as Cardiff University’s supercomputer Raven. However, as expected, for some jobs the use of multiple processors is not enough. It is common within the machine learning arena to use accelerators, since they are well suited to the workload and can reduce the time taken to analyse the data needed to create new models.
The term accelerator is used to describe any device that works alongside the CPU and can accelerate certain tasks. Researchers using Raven are now asking more frequently about using Graphics Processing Units (GPUs) to process data. Nvidia is one company that produces these pluggable cards. They are traditionally called GPUs, but they do not need to display graphics; it is just that their design comes from the graphics card development community. AMD are another company and use the term Accelerated Processing Unit (APU) where both CPU and GPU capabilities are included in one package. Intel have an Intel Xeon Phi accelerator but will soon be merging that technology with their own CPUs. So how can these devices be utilized?
Using accelerator technologies
Unfortunately there are many solutions to choose from. Over a few blog posts, each of the following popular solutions will be looked at more closely.
- CUDA – the technology supported by Nvidia and used in their cards only. It is used extensively because Nvidia cards have been leading the field of accelerators in performance, but it has the drawback of limiting your code to running on Nvidia devices.
- OpenCL – the technology supported by many companies including Nvidia. OpenCL can be used on many accelerator technologies and even on CPUs to enable use of CPU vectorization technologies such as SSE and AVX.
- OpenMP – the popular solution for parallelising code by threading your application. Traditionally used for CPUs only, with OpenMP 4 it can now use accelerators.
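To give a flavour of the last option, here is a minimal sketch of what offloading a loop with OpenMP 4 might look like in C. The function name `vector_add` and its arguments are hypothetical and chosen purely for illustration. The `target` directive asks the OpenMP runtime to run the loop on an accelerator if one is available, and the `map` clauses describe which arrays are copied to and from the device.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: computes c[i] = a[i] + b[i]. */
void vector_add(const float *a, const float *b, float *c, int n)
{
    /* OpenMP 4 offload directive. The map clauses copy a and b to the
       device and copy c back to the host when the loop finishes.
       If no accelerator is available the loop runs on the host, and if
       the compiler is not asked to enable OpenMP the pragma is ignored
       and the loop simply runs serially. */
    #pragma omp target teams distribute parallel for \
        map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

With GCC, OpenMP is enabled using the `-fopenmp` flag. Whether the loop actually runs on an accelerator depends on the compiler having been built with offload support for your device, so treat this as a starting point rather than a recipe.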
Your code needs to be suitable for accelerators: not all codes will benefit easily from using accelerator technologies. However, if a software package you are using comes with the option of using accelerators, then you may get extra performance with very little work required on your part. Over the next few blog posts CUDA, OpenCL and OpenMP will be explored with some simple examples to show how to use the solutions.