## Negativo’s Repo is a bit too quick…

Since Nvidia totally screwed up the gcc versioning/ABI on Fedora 24, I decided to take the easy option and use someone else’s pre-packaged Nvidia installation. My packager of choice has been Negativo.

However, the Negativo repo has been quickly updated to cuda-9.1 whereas it seems that the TensorFlow team has decided :

• that 9.1 is going to be an orphan version as far as TensorFlow goes;
• that everything will be fixed in the 1.9 release of TensorFlow; and
• that if you’re upgrading, you’re basically stuck until the 1.9RC magically appears.

So, given that my installation was already working for Fedora 27, I wanted to do an upgrade without disturbing the existing cuda-9.0 packages.

Here’s a quick run-down of what has worked for me (building on the previous installation)…

### Sanity-check the existing installation

First ensure that the following is in /etc/dnf/dnf.conf :
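As a rough sketch of how this could be checked from a script, assuming the setting in question is an `exclude=` line in the `[main]` section: the package globs in `SAMPLE` (`kernel*`, `cuda*`, `nvidia*`) and the function names are my own illustrative guesses, not a verified copy of any real configuration.

```python
import configparser
import fnmatch


def excluded_globs(conf_text):
    """Return the package globs listed on the [main] exclude/excludepkgs lines."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    globs = []
    for key in ("exclude", "excludepkgs"):
        value = parser.get("main", key, fallback="")
        # dnf accepts both spaces and commas as separators
        globs.extend(value.replace(",", " ").split())
    return globs


def is_held_back(package, globs):
    """True if the package name matches any of the exclude globs."""
    return any(fnmatch.fnmatch(package, glob) for glob in globs)


# Hypothetical dnf.conf contents, for illustration only.
SAMPLE = """\
[main]
gpgcheck=1
installonly_limit=3
exclude=kernel* cuda* nvidia*
"""

globs = excluded_globs(SAMPLE)
for pkg in ("kernel-core", "cuda-libs", "nvidia-driver", "bash"):
    print(pkg, "held back" if is_held_back(pkg, globs) else "will be upgraded")
```

To run it against the real file, pass `open("/etc/dnf/dnf.conf").read()` instead of `SAMPLE`.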

This means that the standard upgrade won’t touch the kernel or the working cuda installation (this setting gets revisited in a later step).

Also, check the versions of the nvidia driver, cuda and cudnn :
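One way to collect this in a form that is easy to copy elsewhere is to filter the output of `rpm -qa`. The sketch below assumes that matching on the substrings `nvidia`, `cuda` and `cudnn` is good enough. The helper names and the package strings in the demo are made up for illustration.

```python
import shutil
import subprocess

KEYWORDS = ("nvidia", "cuda", "cudnn")


def gpu_packages(rpm_lines):
    """Keep only the package strings that mention the driver, cuda or cudnn."""
    return sorted(
        line.strip()
        for line in rpm_lines
        if any(keyword in line.lower() for keyword in KEYWORDS)
    )


def installed_packages():
    """Return the lines of `rpm -qa`, or an empty list if rpm isn't on the PATH."""
    if shutil.which("rpm") is None:
        return []
    result = subprocess.run(
        ["rpm", "-qa"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout.splitlines()


# Hypothetical package strings (the versions are invented for the demo).
demo = ["bash-1.0-1.x86_64", "cuda-0.0-1.x86_64", "nvidia-driver-0.0-1.x86_64"]
print(gpu_packages(demo))
```

On the real machine, `gpu_packages(installed_packages())` gives the list worth writing down.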

And also, take notes about which .conf files are in:

For my installation (which has a GPU card that isn’t connected to a monitor, and motherboard-integrated Intel graphics connected to the actual monitor), there are NO special .conf files : it all works via autoconfiguration. Note that Nvidia has a habit of trying to fix this up for you by writing their own configuration files without asking : these should be moved to .conf-disabled if you get any new problems with the following steps…
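The note-taking and the `.conf-disabled` rename can both be scripted. This is only a sketch: the directory to scan is left as a parameter (I'm assuming it lives somewhere under `/etc/X11`, which is the location checked later on), and the helper names and demo file name are hypothetical.

```python
import os
import tempfile


def list_conf_files(directory):
    """Return the sorted paths of every *.conf file below `directory`."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".conf"):
                found.append(os.path.join(root, name))
    return sorted(found)


def disable_conf(path):
    """Rename foo.conf to foo.conf-disabled and return the new path."""
    new_path = path + "-disabled"
    os.rename(path, new_path)
    return new_path


# Demo against a throwaway directory rather than the real X11 configuration.
demo_dir = tempfile.mkdtemp()
demo_conf = os.path.join(demo_dir, "99-hypothetical.conf")
with open(demo_conf, "w") as handle:
    handle.write("# placeholder\n")

print(list_conf_files(demo_dir))  # one .conf file
disable_conf(demo_conf)
print(list_conf_files(demo_dir))  # empty: the renamed file no longer matches
```

Renaming (rather than deleting) means an unwanted file can be restored just by removing the `-disabled` suffix.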

Copy the information above onto a machine that isn’t the one being upgraded, since if this process fails then there’s a chance you won’t be able to see anything on the monitor, which is frustrating.

### Upgrade the Fedora version (excluding kernels)

Hopefully, everything should come back as before.

### Check that you’ve (still) got a GPU

Running :

should result in a line that mentions your VGA adapter, and the following modules should also be loaded :

The key things here are the references to nvidia and nvidia_uvm.

If you’ve got references to nouveau appearing in lsmod, something didn’t work correctly.
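The same check can be automated by parsing the text that `lsmod` prints. A minimal sketch, where the function name is my own and the module sizes in the sample are invented:

```python
def module_status(lsmod_text):
    """Report whether nvidia, nvidia_uvm and nouveau appear as loaded modules."""
    loaded = set()
    for line in lsmod_text.splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if fields:
            loaded.add(fields[0])
    return {name: name in loaded for name in ("nvidia", "nvidia_uvm", "nouveau")}


# Hypothetical lsmod output (sizes are made up).
SAMPLE_LSMOD = """\
Module                  Size  Used by
nvidia_uvm            100000  0
nvidia               1000000  1 nvidia_uvm
"""

status = module_status(SAMPLE_LSMOD)
print(status)
if status["nouveau"] or not (status["nvidia"] and status["nvidia_uvm"]):
    print("Something didn't work correctly")
```

Matching on the first column only avoids a false positive from a module that merely lists `nvidia` in its "Used by" column.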

### Now upgrade the nvidia driver

Change the /etc/dnf/dnf.conf :
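If the change amounts to editing the list of excluded packages (which is my assumption here), the rewrite can be expressed as a small text transformation. The function name and the globs in the demo are hypothetical, and the sketch also assumes `[main]` is the only section in the file, so that appending a missing line at the end is safe.

```python
def with_excludes(conf_text, globs):
    """Return conf_text with its exclude line set to `globs` (appended if absent)."""
    new_line = "exclude=" + " ".join(globs)
    lines, replaced = [], False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip().lower()
        if key in ("exclude", "excludepkgs") and not replaced:
            lines.append(new_line)  # swap the existing setting in place
            replaced = True
        else:
            lines.append(line)
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical starting point: everything GPU-related is held back.
before = "[main]\ngpgcheck=1\nexclude=kernel* cuda* nvidia*\n"

# Let the driver packages through, keep the kernel and cuda pinned.
print(with_excludes(before, ["kernel*", "cuda*"]))
```

Calling it again with the original list of globs puts the file back the way it was.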

And then run an update (and make sure you have nvidia-driver-NVML too, which confused me with bad version messages for ages) :

At this point, check that your /etc/X11 configuration hasn’t been messed up, before the :

### Disable further updates

Change the /etc/dnf/dnf.conf back :

### Check that the versions still make sense

Once again, check the versions of the nvidia driver, cuda and cudnn (note that the cuda and cudnn sets should be unchanged):
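If the package list was saved before the upgrade, comparing the two snapshots makes it obvious what moved. A small sketch, with a made-up function name and invented version strings:

```python
def changed_packages(before, after):
    """Return (removed, added) package strings between two snapshots."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)


# Hypothetical snapshots: only the driver should differ.
before = ["nvidia-driver-1.0-1", "cuda-0.0-1", "cudnn-0.0-1"]
after = ["nvidia-driver-2.0-1", "cuda-0.0-1", "cudnn-0.0-1"]

removed, added = changed_packages(before, after)
print("removed:", removed)
print("added:  ", added)
```

Anything mentioning cuda or cudnn in either list would mean the exclusions didn't hold.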

NB: These rpms are confirmed to work on my machine, at least.

### Install TensorFlow for the GPU

Install the latest TensorFlow version (currently 1.8). This assumes Python 3.x, which should be the obvious choice by now:

### Test TensorFlow with the GPU

The following can be executed (the second line onwards will be within the Python REPL) :
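For reference, a typical TensorFlow 1.x smoke test looks something like the sketch below. It assumes the Hello World is a small matrix multiplication with device placement logging switched on. The function name is made up, and it returns `None` rather than crashing when a TensorFlow 1.x style `tf.Session` isn't available.

```python
def tf_matmul_hello():
    """Multiply two small matrices with TensorFlow 1.x and return the result.

    Returns None if TensorFlow isn't importable or has no tf.Session (2.x).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    if not hasattr(tf, "Session"):
        return None

    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name="a")
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name="b")
    c = tf.matmul(a, b)

    # log_device_placement prints which device each op was assigned to
    config = tf.ConfigProto(log_device_placement=True)
    with tf.Session(config=config) as sess:
        return sess.run(c).tolist()


print(tf_matmul_hello())
```

With a working GPU setup, the device placement log should mention a GPU device for the `MatMul` op. If everything lands on the CPU, the GPU isn't being used.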

This is what will appear if the installation DIDN’T WORK :

### When it finally works…

Then the Python REPL code :

Produces the following happy messages :

### Install PyTorch for the GPU

Looking within the PyTorch installation instructions we see that there’s an option for CUDA toolkit v9.0, which is good, and Python 3.6 is supported (also good).

Then finally test it with the same Hello World calculation as we did for TensorFlow :
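A sketch of what such a test might look like, again assuming the calculation is a small matrix multiplication and that PyTorch 0.4 or later is installed. The function name is hypothetical, and it falls back to the CPU (or returns `None` if PyTorch is missing) instead of raising.

```python
def torch_matmul_hello():
    """Multiply two small matrices with PyTorch, on the GPU if one is usable.

    Returns (device_type, result) or None if PyTorch isn't importable.
    """
    try:
        import torch
    except ImportError:
        return None

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], device=device)
    b = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], device=device)

    # copy the product back to host memory before converting to plain lists
    return device.type, torch.matmul(a, b).cpu().tolist()


print(torch_matmul_hello())
```

Seeing `cuda` as the device type is the happy outcome. `cpu` means PyTorch installed fine but couldn't see the GPU.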

All done.