Below is “Nvidia’s way” (modified to actually work)

Even though Nvidia still hasn’t provided RPMs for Fedora 22 (which was launched a couple of months ago as of this post’s date, having been in Alpha for three months prior), we can fix up their code as it installs.

This write-up is simply a condensed version of Dr Donald Kinghorn’s excellent write-up (with which it’s probably best to follow along, opened in a separate tab) plus additional instructions concerning the building of Theano.

Set up a scratch directory

As root :

cd ~   # pwd = /root/
mkdir fedora22-cuda
cd fedora22-cuda/

Nvidia Driver download (for later)

Go to the Nvidia Driver download page, and grab the 76 MB driver package, for installation later…

CUDA Driver download (installed first)

Download the 1 GB CUDA local installer for RHEL7:

CUDA7=http://developer.download.nvidia.com/compute/cuda/7_0
RPMDEB=${CUDA7}/Prod/local_installers/rpmdeb
wget ${RPMDEB}/cuda-repo-rhel7-7-0-local-7.0-28.x86_64.rpm

Install CUDA using Nvidia’s repo

cd ~/fedora22-cuda # pwd=/root/fedora22-cuda/
dnf install cuda-repo-rhel7-7-0-local-7.0-28.x86_64.rpm 
dnf install cuda

Fix the path & library directories globally

echo 'export PATH=$PATH:/usr/local/cuda/bin' >> /etc/profile.d/cuda.sh
ls -l /usr/local/cuda/lib64
echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf
ldconfig
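
Note that the file in /etc/profile.d is only read by new login shells, so nvcc will not appear on the PATH of a shell that is already open. As a quick sanity check, here is a minimal Python sketch that looks for an executable in a PATH-style string. The helper name find_on_path is made up for illustration and is not part of any Nvidia tooling:

```python
import os

def find_on_path(program, path_value):
    """Return the full path of the first executable named `program`
    found in the PATH-style string `path_value`, or None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# Prints None until a new login shell has picked up cuda.sh
print(find_on_path("nvcc", os.environ.get("PATH", "")))
```

If this prints None straight after the steps above, logging out and back in (or sourcing /etc/profile.d/cuda.sh by hand) is the first thing to try.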

Now install the graphics drivers

So that the ‘DKMS’ part of the installer will run, make sure you have the kernel module compilation parts available:

dnf install kernel-devel

Before this point, the Nvidia software has not actually checked for the presence of an Nvidia video card.

Now run the Nvidia installer (look at the notes in this section for answer-hints):

chmod 755 NVIDIA-Linux-x86_64-352.21.run 
./NVIDIA-Linux-x86_64-352.21.run 
  • Say “Yes” to the question about registering with DKMS

  • Say “Yes” to the question about 32-bit libs

It should now compile the NVIDIA kernel modules…

  • Say “No” to the question about running nvidia-xconfig!

Now reboot.

Test the installation

To see that your driver is installed and working properly, check that the kernel modules are there :

sudo lsmod | grep nv
# Output::
nvidia_uvm             77824  0
nvidia               8564736  1 nvidia_uvm
drm                   331776  4 i915,drm_kms_helper,nvidia
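
The grep above also matches the drm line, because that line lists nvidia as one of its users. If you would rather check the module names themselves, the following Python sketch filters on the first column only. The function name loaded_nvidia_modules is hypothetical, and the sample text is simply the output shown above:

```python
def loaded_nvidia_modules(lsmod_text):
    """Return the names of loaded modules whose own name starts with 'nvidia'."""
    names = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # The module name is the first whitespace-separated column
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names

sample = """\
nvidia_uvm             77824  0
nvidia               8564736  1 nvidia_uvm
drm                   331776  4 i915,drm_kms_helper,nvidia
"""
print(loaded_nvidia_modules(sample))
```

An empty list here would mean the kernel modules did not load, in which case the DKMS build step is the place to look.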

Check on the CUDA compiler:

/usr/local/cuda/bin/nvcc --version
# Output::
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Mon_Feb_16_22:59:02_CST_2015
Cuda compilation tools, release 7.0, V7.0.27
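
If you want to check the release number from a script rather than by eye, a small Python sketch along these lines should do. It assumes the "release X.Y" wording shown in the output above, and the function name cuda_release is made up for illustration:

```python
import re

def cuda_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version`
    output, or return None if that wording is not present."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

sample = "Cuda compilation tools, release 7.0, V7.0.27"
print(cuda_release(sample))
```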

And an actual check on the card itself :

sudo nvidia-smi -L
# Output::
GPU 0: GeForce GTX 760 (UUID: GPU-b8075eeb-56ff-4595-7901-eef770de8296)

Fix the CUDA headers to accept new gcc

Now, as root, fix up Nvidia’s header file that disallows gcc greater than v4.9

In file /usr/local/cuda-7.0/include/host_config.h, look to make the following replacement :

// #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)  // Old version commented out
// This is the updated line, which guards against gcc > 5.1.x instead
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1)
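
If you prefer not to edit the header by hand, the same replacement can be sketched in Python. This assumes the guard line appears in host_config.h exactly as quoted above; the function name patch_guard is hypothetical, and the sketch only transforms text, so reading and writing the file (as root, after taking a backup) is left to you:

```python
OLD_GUARD = "#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ > 9)"
NEW_GUARD = "#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1)"

def patch_guard(header_text):
    """Comment out the old gcc version guard and add the relaxed one.

    Lines that do not match the old guard exactly (ignoring surrounding
    whitespace) are left alone, so running this twice changes nothing more.
    """
    out = []
    for line in header_text.splitlines(True):
        if line.strip() == OLD_GUARD:
            out.append("// " + OLD_GUARD + "\n")  # keep the old line as a comment
            out.append(NEW_GUARD + "\n")
        else:
            out.append(line)
    return "".join(out)

example = "/* ... */\n" + OLD_GUARD + "\n#error unsupported\n#endif\n"
print(patch_guard(example))
```

Matching the whole line, rather than doing a blind substring replace, is what keeps the already-commented copy from being patched a second time.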

Test the CUDA functionality

As a regular user, compile the CUDA samples from within a clean directory :

cd ~        # for instance
mkdir cuda  # for instance
cd cuda
rsync -av /usr/local/cuda/samples .
cd samples/
make -j4
cd bin/x86_64/linux/release/
./deviceQuery

Cleaning up

If everything tests out OK above, then the /root/fedora22-cuda directory can be safely deleted.

The Theano Part

Installation of libgpuarray

To install the bleeding edge libgpuarray into your virtualenv, first compile the .so and .a libraries that the module creates, and put them in a sensible place :

. env/bin/activate
cd env
git clone https://github.com/Theano/libgpuarray.git
cd libgpuarray
mkdir Build
cd Build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install

This will likely complain about not finding clBLAS, which isn’t a problem here. However, if you know you will require clBLAS in the future (and this is for advanced/experimental users only), see my OpenCL post, since you need to install clBLAS before running cmake above.

It may also complain about :

    runtime library [libOpenCL.so.1] in /usr/lib64 may be hidden by files in:
      /usr/local/cuda/lib64

This won’t affect the CUDA functionality (its impact on OpenCL is still TBD).

Next, install the Python component (after going into the same virtualenv) :

cd env/libgpuarray/
python setup.py build
python setup.py install

And then test it from within a regular user directory (using the same virtualenv) :

python
import pygpu
pygpu.init('cuda0')

A good result is something along the lines of :

<pygpu.gpuarray.GpuContext object at 0x7f1547e79550>
## Errors seen :
#(A) 'cuda'      :: 
##  pygpu.gpuarray.GpuArrayException: API not initialized = WEIRD

#(B) 'cuda0'     :: 
##  pygpu.gpuarray.GpuArrayException: No CUDA devices available = GO BACK...

#(C) 'opencl0:0' :: 
##  RuntimeError: Unsupported kind: opencl (if OpenCL library not found)
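
Since different device strings fail in different ways, it can be handy to try several in turn. Below is a minimal Python sketch of that idea. The helper first_working_device is made up for illustration; it takes the init function as a parameter so that it can be exercised without a GPU, and in real use you would pass pygpu.init:

```python
def first_working_device(init, candidates):
    """Try each device name with `init` and return (name, context) for the
    first one that succeeds, or (None, None) if they all raise."""
    for name in candidates:
        try:
            return name, init(name)
        except Exception as exc:
            print("%s failed: %s" % (name, exc))
    return None, None

# Stand-in for pygpu.init, purely so the sketch runs anywhere
def fake_init(name):
    if name != "cuda0":
        raise RuntimeError("No such device")
    return "context-for-" + name

print(first_working_device(fake_init, ["opencl0:0", "cuda0"]))
```

With pygpu installed, the equivalent call would be first_working_device(pygpu.init, ['cuda0', 'opencl0:0']).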

Theano stuff

Store the following to a file gpu_check.py :

from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

And then run, successively :

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cpu   python gpu_check.py
""" output is ::
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.35117197037 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761 1.62323284]
Used the cpu
"""

and

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu   python gpu_check.py
""" output is ::
Using gpu device 0: GeForce GTX 760
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.339042901993 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761 1.62323296]
Used the gpu
"""

but

THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=cuda0   python gpu_check.py
""" output is ::
*FAILURE...*
"""

Check on the usage of GPU / BLAS

TP=`python -c "import os, theano; print os.path.dirname(theano.__file__)"`
THEANO_FLAGS=mode=FAST_RUN,floatX=float32,device=gpu python ${TP}/misc/check_blas.py

## GPU : 0.14s (GeForce GTX 760)
## CPU : 5.72s (i7-4770 CPU @ 3.40GHz)
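
To put the two sets of timings in perspective, here is a small Python sketch that turns the figures quoted in this post into ratios. The speedup helper is made up for illustration, and the numbers are simply copied from the outputs above:

```python
def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_seconds / float(gpu_seconds)

# Timings copied from the outputs above (GeForce GTX 760 vs i7-4770)
print("exp() loop : %.1fx" % speedup(3.35117197037, 0.339042901993))
print("check_blas : %.1fx" % speedup(5.72, 0.14))
```

That works out to roughly 10x for the elementwise exp() loop and roughly 41x for the BLAS check, on this particular card and CPU.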

OpenCL stuff (for another day)

dnf -y install clinfo ocl-icd opencl-tools 


Martin Andrews

{Finance, Software, AI} entrepreneur, living in Singapore with my family.


