Setting up musrfit / DKS: High Speed Fitting with GPUs

In 2016/2017 we explored ways to speed up existing fitting frameworks, especially musrfit. This now makes it possible to analyze histogram sets of high-field spectrometers such as HAL-9500 at PSI without the error-prone RRF fitting (see U. Locans and A. Suter, musrfit - Real Time Parameter Fitting Using GPU, and the memo by A. Suter, “Rotating Reference Frame Fits”, in the musrfit source code). At the same time it can speed up elaborate global fits tremendously and helps in dealing properly with muonium. It also allows Apple macOS users to speed up their fitting code on the CPU. Currently it is not straightforward to get musrfit multi-threaded under macOS, since Apple does not support OpenMP by default. DKS enables musrfit to use OpenCL instead, which is present on macOS by default.

Warning

Before you rush to the shop to buy a gaming graphics card or a Tesla card, make sure that you have an appropriate server with a sufficiently strong power supply!

Note

The current musrfit/DKS version does not yet support all theory functions on the GPU. If a theory function is not yet available for the GPU, musrfit falls back to the CPU implementation.

Conceptually, the setup of musrfit/DKS is as follows:

  1. install the latest hardware driver for your graphics card
  2. install the GPU SDK which enables number crunching (CUDA for NVIDIA, OpenCL for AMD)
  3. install DKS
  4. install the musrfit version which is DKS ready

The installation of musrfit/DKS is described in more detail below for the following systems:

  • NVIDIA Tesla K40c
  • AMD Graphics Card (Radeon R9 390X)
  • macOS in order to get OpenCL support

The usage of musrfit with GPU acceleration and OpenCL support is described in the User manual of the μSR data analysis software musrfit. The additional musrfit/DKS are found here.

Setting up musrfit/DKS for a Tesla K40c (NVIDIA)

It is assumed that the Tesla K40c is already physically installed in your system. For now, only the setup on a Linux-based system is discussed. To check that your operating system sees the card, enter the following command in a terminal:

$ lspci | grep NVIDIA

The response should look something like

05:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)

which means that the OS physically recognizes your card.

Driver Installation for the Tesla K40c

Next, you will need to download and install the driver for your card. Select the proper operating system, card, etc. from the NVIDIA download center. At PSI we currently run Red Hat Enterprise Linux 7.x (RHEL), for which the driver comes as an rpm (something like nvidia-diag-driver-local-repo-rhel7-375.66-1.x86_64.rpm). Install it and make sure there is no conflict with the nouveau driver of the system.
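
A conflict with nouveau usually shows up as the nouveau kernel module still being loaded. The following is a minimal sketch of such a check, not part of the official instructions: the helper name nouveau_loaded is hypothetical, and how nouveau is disabled (typically by blacklisting the module) depends on your distribution. It is written as a small script, so the lines are given without a command prompt.

```shell
#!/bin/sh
# Hypothetical helper: reads lsmod-style output on stdin and succeeds
# (exit status 0) only if a module named exactly "nouveau" is listed.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only query the running system if lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is loaded: it may conflict with the NVIDIA driver"
    else
        echo "nouveau is not loaded"
    fi
fi
```

If nouveau is reported as loaded, disable it according to your distribution's documentation before relying on the NVIDIA driver.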

Installation of CUDA

Download the CUDA SDK from NVIDIA for your system. Again, for RHEL 7.x this is an rpm. After the installation of the rpm you should reboot your machine. Afterwards you are ready for the installation of DKS.
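
After the reboot, a quick sanity check is whether the driver utility nvidia-smi and the CUDA compiler nvcc can be found. The sketch below only tests whether the two commands are on the PATH. The helper name report_tool is hypothetical, and it is an assumption that the CUDA bin directory (often /usr/local/cuda/bin) has already been added to your PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be found on PATH.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

report_tool nvidia-smi   # ships with the NVIDIA driver
report_tool nvcc         # ships with the CUDA toolkit
```

If both are found, running nvidia-smi should list the Tesla card, and nvcc --version prints the installed toolkit version.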

Installation of DKS

In the following list of commands, '$' denotes the command prompt. Do not enter it! Comments starting with '#' are added to explain what is going on; they can be omitted. DKS stands for Dynamical Kernel Scheduler and provides a thin interface that allows host applications to incorporate GPUs and other hardware accelerators.

Details can be found in the papers listed here, or on the DKS wiki page.

In brief the installation should be something like this:

# go to the directory where you would like to clone/install DKS
# On macOS, DKS will likely go to $HOME/Applications to be consistent with the musrfit documentation for macOS
$ cd $HOME/Apps
$ git clone https://gitlab.psi.ch/uldis_l/DKS.git
$ cd DKS
$ mkdir build
$ cd build
$ cmake ../ -DENABLE_MUSR=1 -DCMAKE_INSTALL_PREFIX=../exec
$ cmake --build ./ --clean-first
$ make install

Since DKS is installed in a non-standard path, a couple of additional small steps are required. These differ between Linux and macOS.

For Linux:

Add the DKS library path to /etc/ld.so.conf.d/musrfit-x86_64.conf and, as superuser, execute

$ /sbin/ldconfig
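
Adding the library path can be scripted so that it is safe to repeat. The following is a minimal sketch under stated assumptions: the helper name add_lib_path is hypothetical, and the library directory is assumed to be the lib subdirectory of the DKS install prefix. The demonstration writes to a temporary file rather than to the real configuration file.

```shell
#!/bin/sh
# Hypothetical helper: append directory $2 to linker config file $1
# unless that exact line is already present (so it is safe to run twice).
add_lib_path() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstration on a temporary file instead of the real config file.
# The library directory is an assumption: <DKS install prefix>/lib.
demo_conf=$(mktemp)
add_lib_path "$demo_conf" "/home/user/Apps/DKS/exec/lib"
add_lib_path "$demo_conf" "/home/user/Apps/DKS/exec/lib"   # no second entry
cat "$demo_conf"
rm -f "$demo_conf"
```

On a real system this would be run as root with /etc/ld.so.conf.d/musrfit-x86_64.conf as the first argument and the full path of your DKS library directory as the second, followed by /sbin/ldconfig.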

For macOS:

Add the DKS path to $HOME/.profile:

export DKS=$HOME/Applications/DKS/exec
export LD_LIBRARY_PATH=$DKS/lib:$LD_LIBRARY_PATH

launchctl setenv DKS $DKS
launchctl setenv LD_LIBRARY_PATH $LD_LIBRARY_PATH

Installation of musrfit for DKS

Most of the installation steps are the same as described for musrfit without GPU support; only the differences are explained here. First check out musrfit, then switch the working branch:

$ cd $HOME/Apps/musrfit
$ git checkout dks6

Install via cmake

There is one more configuration switch:

-Ddks=<value>
It enables or disables DKS support. The default is <value>=1, i.e. enabled. To disable it, use <value>=0.

For a typical setup on a RHEL or macOS system it could look like this:

$ cmake ../ -DCMAKE_INSTALL_PREFIX=$ROOTSYS -DASlibs=1 -DBMWlibs=1 -Dnexus=1 -Ddks=1

After

$ cmake --build ./ --clean-first -- -j8
$ make install

and updating the shared library lookup table (only needed for Linux)

$ /sbin/ldconfig # as superuser / root

you are done with the setup.
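
On Linux, one way to confirm that the installed musrfit binary resolves a DKS library is to inspect its shared-library dependencies with ldd. This is only a sketch: the helper name check_dks_link is hypothetical, the exact file name of the DKS library is not assumed (the match on "dks" is case-insensitive), and it is an assumption that musrfit is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: list the shared-library dependencies of binary $1
# whose names contain "dks" (case-insensitive).
check_dks_link() {
    if [ ! -x "$1" ] || ! command -v ldd >/dev/null 2>&1; then
        echo "cannot inspect $1"
        return 1
    fi
    ldd "$1" | grep -i dks || echo "no DKS library among the dependencies of $1"
}

# Assumption: musrfit is on PATH after 'make install'.
check_dks_link "$(command -v musrfit)" || true
```

A line marked "not found" in the output would indicate that the library lookup table has not been updated yet.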

Setting up musrfit/DKS for an AMD Graphics Card (Radeon R9 390X)

Driver Installation for an AMD Graphics Card, e.g. Radeon R9 390X

This will depend slightly on the AMD card and the operating system. Here I summarize how it was done on a RHEL (Linux) system with a Radeon R9 390X.

It is assumed that the Radeon R9 390X is already physically installed in your system. For now, only the setup on a Linux-based system is discussed. To check that your operating system sees the card, enter the following command in a terminal:

$ lspci | grep AMD

The response should look something like

84:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] (rev 80)

which means that the OS physically recognizes your card.

For RHEL 7.x the AMDGPU-PRO driver should be used. It can be downloaded from AMD. Unpack the driver:

$ tar -Jxvf amdgpu-pro-17.10-414273.tar.xz
$ cd amdgpu-pro-17.10-414273

Install the driver as root

$ ./amdgpu-pro-install --compute -y

Here I assume that the AMD graphics card is used only for computation. Run the following command so that the user blabla (replace this with the appropriate user name) can access the GPU; otherwise only root can:

$ /sbin/usermod -a -G video blabla

Reboot the machine.
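
After the reboot, the group membership can be verified with id. The sketch below is an illustration only: the helper name has_group is hypothetical, and blabla is the placeholder user name used above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if group $1 appears as a whole word
# in the space-separated group list $2.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

user=blabla   # placeholder: replace with the appropriate user name
if user_groups=$(id -nG "$user" 2>/dev/null); then
    if has_group video "$user_groups"; then
        echo "$user is a member of the video group"
    else
        echo "$user is NOT a member of the video group"
    fi
else
    echo "user $user does not exist"
fi
```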

AMD APP Software Development Kit (SDK) to enable OpenCL support

The AMD APP Software Development Kit (SDK) is a complete development platform created by AMD to allow you to quickly and easily develop applications accelerated by AMD APP technology. The SDK provides samples, documentation, and other materials to quickly get you started leveraging accelerated compute using OpenCL or C++ AMP in your C/C++ applications.

Download the AMD APP SDK 3.0 from AMD-SDK.

Extract the installer

$ tar -xvjf AMD-APP-SDKInstaller-v3.0.130.136-GA-linux64.tar.bz2

Run the installer

$ ./AMD-APP-SDK-v3.0.130.136-GA-linux64.sh

This will install the AMD APP SDK to /opt/AMDAPPSDK-3.0/ where you can find the OpenCL include and library files, as well as documentation and sample code. The install guide for AMD OpenCL SDK can be found at AMD SDK Installation Notes.

Installation of DKS and musrfit

To install DKS and musrfit follow the instructions above.

Setting up musrfit/DKS for macOS for OpenCL support

Since Apple does not provide out-of-the-box OpenMP support in its macOS compiler framework (Xcode), musrfit typically runs single-threaded there. Here DKS can help, since it uses OpenCL, which is present on macOS. Hence, if you would like to run musrfit multi-threaded, the easiest way is to use DKS.

Since there is no graphics card involved, you do not need any graphics card driver or additional SDK. The only things you need are DKS and the proper musrfit version.
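
To confirm that OpenCL is indeed present on a given Mac, one can check for the system OpenCL framework. This is a minimal sketch: the helper name opencl_framework_present is hypothetical, and the default location is the standard macOS system frameworks directory.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an OpenCL.framework directory exists
# below the given frameworks directory (default: the macOS system location).
opencl_framework_present() {
    [ -d "${1:-/System/Library/Frameworks}/OpenCL.framework" ]
}

if opencl_framework_present; then
    echo "OpenCL framework found"
else
    echo "OpenCL framework not found (is this macOS?)"
fi
```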

The installation instructions for DKS/musrfit can be found here.