Nvidia (8.0) installation for TensorFlow on Fedora 25
Just use Negativo’s Repo…
Since Nvidia totally screwed up the gcc versioning/ABI on Fedora 24, I decided to
take the easy option and use someone else’s pre-packaged Nvidia installation.
I had tried this method before (on previous Fedoras), but the choices of paths had
left me unconvinced (particularly since during the ‘teething’ phase of getting the
installation working, error messages can come from all sorts of sources/reasons).
Here’s a quick run-down of what has worked for me :
Clean out previous installations
Check that you’ve got a GPU
Listing the PCI devices with lspci and filtering for VGA should result in a line that mentions your VGA adapter.
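A minimal sketch of that check, assuming lspci (from pciutils) is installed. The helper name find_display_adapters is made up for illustration:

```shell
# Filter `lspci` output (read from stdin) down to display adapters.
# "3D controller" is included because some compute-only Nvidia cards
# are listed that way rather than as "VGA compatible controller".
find_display_adapters() {
    grep -iE 'VGA compatible controller|3D controller'
}

# On the target machine:
#   lspci | find_display_adapters
```

If nothing comes back, the card isn't visible on the PCI bus, and no amount of driver installation will help.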
And then install the nvidia driver, and the necessary libraries for cuda operations.
Note that if you want X11 to run on the graphics card, you’ll obviously need a monitor
attached. However, since I didn’t attach a monitor to the machine while doing this,
it’s not proven that the video card ends up capable of doing anything but cuda operations. But that’s fine with me,
because this is a machine that won’t ever have a monitor attached to it (much to the
disappointment of the gamers in the office).
The following will each pull in a load more dependencies (the Negativo repo is intentionally modular / fragmented) :
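As a rough sketch of what those steps look like: the repo file URL and the package names below follow Negativo17's naming as far as I know, but they are assumptions, so check them with dnf info before installing. The wrapper function install_nvidia_cuda is hypothetical, and is only there so that nothing runs until you call it.

```shell
# Hypothetical wrapper: nothing is installed until the function is called.
install_nvidia_cuda() {
    # Register the Negativo17 Nvidia repo (URL is an assumption).
    sudo dnf config-manager \
        --add-repo=https://negativo17.org/repos/fedora-nvidia.repo || return 1

    # Kernel module (built through dkms) plus the driver's CUDA support.
    # Package names are assumptions; verify with `dnf info <name>`.
    sudo dnf install kernel-devel dkms-nvidia nvidia-driver-cuda || return 1

    # CUDA and cuDNN development packages; these should pull in the
    # matching runtime libraries as dependencies.
    sudo dnf install cuda-devel cuda-cudnn-devel || return 1
}

# To run the steps:
#   install_nvidia_cuda
```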
In my case, I also added an intel driver for the internal on-board video subsystem
(just so that X11 might be tempted to run if there’s a monitor plugged in - but check out the
companion post on how to get the X11
configuration working properly if you do want to add a monitor) :
Now, after rebooting, check which kernel modules are loaded using lsmod :
The key things here are the references to nvidia and nvidia_uvm.
If you’ve got references to nouveau appearing in lsmod, something didn’t work correctly.
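That check can be scripted. The following is a sketch only (the function name check_gpu_modules and its messages are made up here). It reads lsmod output on stdin and applies the rule above: nvidia and nvidia_uvm must be present, and nouveau must not be.

```shell
# Classify `lsmod` output read from stdin.
# Exit status 0 means the expected Nvidia modules are loaded.
check_gpu_modules() {
    awk '
        $1 == "nvidia"     { drv = 1 }
        $1 == "nvidia_uvm" { uvm = 1 }
        $1 == "nouveau"    { nouveau = 1 }
        END {
            if (nouveau)    { print "FAIL: nouveau is loaded"; exit 1 }
            if (drv && uvm) { print "OK: nvidia and nvidia_uvm are loaded"; exit 0 }
            print "FAIL: nvidia and/or nvidia_uvm missing"
            exit 1
        }'
}

# On the target machine:
#   lsmod | check_gpu_modules
```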
Install TensorFlow for the GPU
Looking within the TensorFlow installation instructions
for “Download and install cuDNN” shows that TensorFlow is expecting v8.0, which is good, because
that is what the Negativo packaging supplies.
Now, find and install the right version of the TensorFlow wheel :
Test TensorFlow with the GPU
The following can be executed (the second line onwards will be within the Python REPL) :
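A sketch of such a test, assuming the pre-2.0 TensorFlow API (tf.Session and tf.ConfigProto), which is what the wheels of this era provide. It is wrapped in a hypothetical shell function, tf_gpu_check, so the Python part can be pasted or run in one go. With log_device_placement turned on, TensorFlow reports which device each operation was assigned to, so a working installation should mention a GPU device in the log.

```shell
# Hypothetical helper: runs a tiny matrix multiply and asks TensorFlow
# to log the device each op is placed on.
tf_gpu_check() {
    python - <<'PYEOF'
import tensorflow as tf

# Two small constant matrices: (2x3) times (3x2).
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

# log_device_placement=True makes the session print op-to-device mappings.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
PYEOF
}

# To run the check:
#   tf_gpu_check
```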
This is what will appear if the installation DIDN’T WORK :
Save the following script as ./make-nvidia-device-nodes.bash and chmod 744 ./make-nvidia-device-nodes.bash :
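What follows is a sketch modelled on the device-node script in NVIDIA's CUDA installation guide for Linux, rather than a definitive version. It assumes the usual character-device numbering (major 195 for the nvidia devices, minor 255 for nvidiactl) and looks up the dynamically assigned nvidia-uvm major number in /proc/devices. The helper names count_nvidia_gpus and make_node are made up for illustration.

```shell
#!/bin/bash
# Create the /dev/nvidia* device nodes on a headless machine.

# Count NVIDIA display/compute controllers in `lspci` output on stdin.
count_nvidia_gpus() {
    awk 'tolower($0) ~ /nvidia/ && /VGA compatible controller|3D controller/ { n++ }
         END { print n + 0 }'
}

# Create a character device node unless it already exists.
make_node() {   # args: path major minor
    [ -e "$1" ] || mknod -m 666 "$1" c "$2" "$3"
}

main() {
    n=$( { lspci 2>/dev/null || true; } | count_nvidia_gpus )
    if [ "$n" -eq 0 ]; then
        echo "No NVIDIA GPU reported by lspci; nothing to do." >&2
        return 0
    fi

    # The nvidia devices use character-device major number 195.
    /sbin/modprobe nvidia || return 1
    i=0
    while [ "$i" -lt "$n" ]; do
        make_node "/dev/nvidia$i" 195 "$i" || return 1
        i=$((i + 1))
    done
    make_node /dev/nvidiactl 195 255 || return 1

    # nvidia-uvm gets a dynamically assigned major number; look it up.
    /sbin/modprobe nvidia-uvm || return 1
    uvm_major=$(awk '$2 == "nvidia-uvm" { print $1 }' /proc/devices)
    [ -n "$uvm_major" ] || return 1
    make_node /dev/nvidia-uvm "$uvm_major" 0
}

main || exit 1
```

It needs to be run as root, since both modprobe and mknod require it.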
Then, after executing ./make-nvidia-device-nodes.bash, the device /dev/nvidia0 should appear.
The reason it may not have been there before is that it is normally created on-demand by X11,
but in a headless/monitorless situation, it never gets called into existence. That’s what the script takes care of.
Problems when updating kernels
If you had a working system, and then find that after an update the nvidia module is nowhere to
be found in lsmod, then you could try to regenerate the module via dkms
(after rebooting into your latest kernel) :
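One way to go about it, as a sketch: dkms status lists which modules are built for which kernels, and dkms autoinstall attempts to build and install any registered module that is missing for the running kernel. The helper nvidia_dkms_installed is hypothetical, and it assumes the status lines start with the module name and contain the kernel release followed by "installed" (the exact format varies between dkms versions).

```shell
# Succeeds if `dkms status` output (stdin) lists an nvidia module as
# installed for the kernel release given as $1.
nvidia_dkms_installed() {
    grep '^nvidia' | grep -F -- "$1" | grep -q 'installed'
}

# On the affected machine, after booting the new kernel:
#   dkms status | nvidia_dkms_installed "$(uname -r)" || sudo dkms autoinstall
#   sudo modprobe nvidia && lsmod | grep nvidia
```

If the module still doesn't show up, make sure the kernel-devel package matching the running kernel is installed, since dkms needs the kernel headers to build against.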