Nvidia (10.0) for TensorFlow & PyTorch on Fedora 28
Now tf-nightly & PyTorch work on cuda 10 …
Ever since Nvidia totally screwed up the gcc versioning/ABI on Fedora 24, I’ve taken
the easy option and used someone else’s pre-packaged Nvidia installation. My packager
of choice has been Negativo.
However, around Fedora 26/27 the Negativo repo was quickly updated to cuda-9.1,
whereas it seems that the TensorFlow team decided to skip 9.1
and move to 9.2 directly, and then didn’t go for 10.0 when Fedora,
the repos and PyTorch all did.
The situation has now become more unified, assuming you’re willing to
take a risk on installing the TensorFlow nightly builds (1.13.xxx),
which have been cuda 10.0 ready since (apparently) mid-Dec-2018.
Here’s a quick run-down of what has worked for me (having recently had to either install Nvidia/cuda from the Nvidia website for Fedora,
or compile tensorflow from scratch, both of which are painful)…
(Happy to be back using a ‘proper’ repo, and pip install for the frameworks again.)
Clean out previous installations
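Before removing anything, it can help to see what is actually installed. The following is a minimal sketch, assuming an rpm-based system where `rpm -qa` lists every installed package; the helper names (`leftover_packages`, `installed_rpm_names`) and the keyword list are illustrative choices, not part of any official tooling. It only lists candidates and removes nothing.

```python
import shutil
import subprocess


def leftover_packages(rpm_names, keywords=("nvidia", "cuda")):
    """Return the package names that mention any keyword (case-insensitive)."""
    return sorted(
        name for name in rpm_names
        if any(keyword in name.lower() for keyword in keywords)
    )


def installed_rpm_names():
    """Ask rpm for every installed package name, one per line."""
    result = subprocess.run(
        ["rpm", "-qa"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Only attempt the query when the rpm binary is actually present.
    if shutil.which("rpm"):
        for name in leftover_packages(installed_rpm_names()):
            print(name)
    else:
        print("rpm not found; nothing to list")
```

Anything it prints is a candidate for review before switching over to the repo packages.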
Check that you’ve got a GPU
Listing the PCI devices (lspci is the usual tool) should result in a line that mentions your VGA adapter.
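The same check can be scripted. This is a rough sketch, assuming `lspci` is on the path and that the card shows up under one of the usual PCI class names ("VGA compatible controller" or "3D controller"); the function name `display_adapters` is made up for illustration.

```python
import shutil
import subprocess


def display_adapters(lspci_text):
    """Pick out the lspci lines that describe a display adapter."""
    markers = ("vga compatible controller", "3d controller")
    return [
        line.strip()
        for line in lspci_text.splitlines()
        if any(marker in line.lower() for marker in markers)
    ]


if __name__ == "__main__":
    # Only run lspci when it is installed (it ships in pciutils).
    if shutil.which("lspci"):
        result = subprocess.run(
            ["lspci"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        for line in display_adapters(result.stdout):
            print(line)
    else:
        print("lspci not found")
```

If none of the printed lines mention NVIDIA, the card isn’t visible on the PCI bus and nothing further down will work.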
And then install the nvidia driver, and the necessary libraries for cuda operations.
Note that if you want X11 to run on the graphics card, you’ll obviously need a monitor
attached. However, since I didn’t attach a monitor to the machine while doing this,
it’s not proven that the video card ends up capable of doing anything but cuda operations. But that’s fine with me,
because this is a machine that won’t ever have a monitor attached to it (much to the
disappointment of the gamers in the office).
The following will each pull in a load more dependencies (the Negativo repo is intentionally modular / fragmented) :
In my case, I also added an intel driver for the internal on-board video subsystem,
just so that X11 might be tempted to run if there’s a monitor plugged in. If you do want
to add a monitor, and also enable the Nvidia card for CUDA without it having a display
attached, check out the companion post on how to get the X11 configuration working properly :
Now after rebooting :
The key things here are the references to nvidia and nvidia_uvm.
If you’ve got references to nouveau appearing in lsmod, something didn’t work correctly.
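That check on the loaded modules is easy to automate. A minimal sketch, assuming the standard `lsmod` layout (a header row, then one module per line with the name in the first column); `driver_status` and `looks_healthy` are hypothetical helper names.

```python
import shutil
import subprocess


def driver_status(lsmod_text):
    """Report which of the interesting kernel modules appear in lsmod output."""
    loaded = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if fields and fields[0] != "Module":
            loaded.add(fields[0])
    return {name: name in loaded for name in ("nvidia", "nvidia_uvm", "nouveau")}


def looks_healthy(status):
    """nvidia and nvidia_uvm must be loaded, and nouveau must not be."""
    return status["nvidia"] and status["nvidia_uvm"] and not status["nouveau"]


if __name__ == "__main__":
    if shutil.which("lsmod"):
        result = subprocess.run(
            ["lsmod"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        status = driver_status(result.stdout)
        print(status, "OK" if looks_healthy(status) else "PROBLEM")
    else:
        print("lsmod not found")
```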
Since the regular TensorFlow release doesn’t yet match cuda 10.0, install the (now available)
TensorFlow ‘nightly’ build, which is apparently built to be ready for the latest versions
(this assumes python 3.x, which should be the obvious choice by now):
Test TensorFlow with the GPU
The following can be executed (the second line onwards will be within the Python REPL) :
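As a script rather than a REPL session, one way to ask TensorFlow what it can see is sketched below. It assumes the `device_lib.list_local_devices()` helper from `tensorflow.python.client`, which reports each device with a `device_type` field; the wrapper names `gpu_devices` and `list_tensorflow_gpus` are my own and not part of TensorFlow.

```python
def gpu_devices(devices):
    """Keep only the entries whose device_type is 'GPU'."""
    return [d for d in devices if getattr(d, "device_type", None) == "GPU"]


def list_tensorflow_gpus():
    """Return TensorFlow's local GPU devices, or None if TensorFlow can't be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return gpu_devices(device_lib.list_local_devices())


if __name__ == "__main__":
    gpus = list_tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow imported, but no GPU device was found")
    else:
        for gpu in gpus:
            print(gpu.name)
```

An empty list with TensorFlow installed usually means the cuda libraries or the driver weren’t picked up, which is the failure case described next.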
This is the kind of message that will appear if the installation DIDN’T WORK :
Fixing the /dev/nvidia0 problem
This should not happen if you’re running on the Nvidia card as a display adapter, or
have installed the nvidia-modprobe package above. If there’s still a problem,
have a look at the solution previously found.
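To see quickly whether this problem applies, the device nodes can be checked directly. This is a small sketch, assuming the usual node names created by the Nvidia driver (`/dev/nvidia0`, `/dev/nvidiactl`, `/dev/nvidia-uvm`); the `missing_nodes` helper is hypothetical and only reports, it doesn’t create anything.

```python
import os

# Device nodes that cuda programs typically expect for the first GPU.
DEVICE_NODES = ("/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm")


def missing_nodes(paths=DEVICE_NODES, exists=os.path.exists):
    """Return the device nodes that are not present on this machine."""
    return [path for path in paths if not exists(path)]


if __name__ == "__main__":
    missing = missing_nodes()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All expected Nvidia device nodes are present")
```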