### GCP VM with Nvidia set-up and Convenience Scripts

#### Everything from CLI

This is a short-but-complete guide to setting up a Google Cloud Virtual Machine with the Nvidia drivers and CUDA installed.

### Outline

These are the steps involved, which leave the final machine pretty quick to start/stop:

- Create a preemptible GPU-enabled GCP machine
- Set up ssh so that it can be used easily
  - Include ‘tunnels’ that forward the ports needed for jupyter or tensorboard to be accessed securely through plain ssh
  - With scripts to enable fast start/stop and mount/unmount of the VM
- Install software on the GCP machine:
  - the Nvidia drivers with CUDA
  - the Deep Learning tools can then be set up in a virtual environment as usual

All-in-all, the below enables us to use GCP as a good test-bed for projects for Deep Learning - while keeping expenses under control!

### Create a suitable Google Cloud VM

This setup uses the recent 2022-04 (i.e. 22.04 LTS) release of Ubuntu:
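As a sketch, assuming Google’s public `ubuntu-os-cloud` image project, you can check what that image family currently offers with:

```shell
# Show the latest Ubuntu 22.04 LTS image in the public catalogue
gcloud compute images describe-from-family ubuntu-2204-lts \
  --project=ubuntu-os-cloud
```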

And then run this to actually create the instance:
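As a sketch, reconstructed from the flags discussed below - the boot disk size, and the way `./3-on-start.bash` is wired in via the startup-script metadata, are assumptions (and `USERNAME` is a placeholder):

```shell
gcloud compute instances create minerl-vm-host \
  --zone=asia-southeast1-b \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1 \
  --image-family=ubuntu-2204-lts --image-project=ubuntu-os-cloud \
  --boot-disk-size=100GB \
  --maintenance-policy=TERMINATE \
  --preemptible \
  --metadata=startup-script='/bin/bash /home/USERNAME/3-on-start.bash'
```

Note that `--maintenance-policy=TERMINATE` is required for GPU-attached instances, and `--preemptible` is what makes the hourly rate so low.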

This complains about the disk being <200GB, but finally declares:

```
NAME            ZONE               MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
minerl-vm-host  asia-southeast1-b  n1-standard-8  true         10.148.0.8   34.XX.XX.XX  RUNNING
```

NB: The ./3-on-start.bash bit adds some extensibility for auto-running processes later - and we’ll use it to check whether the instance is ready for ssh, so we’ll put something there just below.

FWIW, this is the GCP VM pricing estimate (though the numbers below only apply when it’s switched ON, apart from the persistent disk):

So: for US 22 cents an hour, we have an 8-core machine with 30GB of RAM, and a 16GB GPU. Pretty Nice!

The `--accelerator=type=nvidia-tesla-t4,count=1` choice is clearly one that depends on your requirements - at the T4 level, it essentially makes the instance like a more reliable colab, but with the option of tensorboard, persistence and local disk mounting (which are some key advantages).

The `--machine-type=n1-standard-8` choice may be a bit of an overkill for a Deep Learning instance, though (compared to scaling up the GPU side) it’s a relatively low incremental cost for the additional room / cores provided.

Now, log into the instance and create an empty 3-on-start.bash file:
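For example, via `gcloud compute ssh`:

```shell
gcloud compute ssh minerl-vm-host --zone=asia-southeast1-b \
  --command='touch ~/3-on-start.bash && chmod +x ~/3-on-start.bash'
```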

### Scripting the instance

In a previous blog post, I gave the CLI commands needed to bring up the instance manually; in this version, I’ve added scripted versions to my `~/.bashrc`, which simplifies life.

Once the `~/.bashrc` additions (the new stuff is given below) are loaded, you can then start development sessions with:
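For example, using hypothetical function names (the actual names are whatever you define in your `~/.bashrc`):

```shell
gcp_start    # start the VM, wait for ssh, tunnel the ports, and log in
# ... then, in another local terminal:
gcp_mount    # sshfs-mount the VM home directory onto ./gcp_base
```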

Note that I have a standard location from where I launch the instance, which includes a folder called ./gcp_base so that the main user directory of the GCP machine gets mounted there, and I can then edit files on the VM as if they were just local files (i.e. any local editor will work directly).

And then do the following when finished:
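Again with hypothetical function names:

```shell
gcp_umount   # release the ./gcp_base mount first
gcp_stop     # then stop the VM, so it stops accruing (most) charges
```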

Code to put in your ~/.bashrc:
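A hypothetical reconstruction, matching the description below - the function names (`gcp_start`, `gcp_mount`, `gcp_stop`, `gcp_umount`) and the implementation details are assumptions:

```shell
## GCP convenience scripts - a sketch, not the original code
GCP_SERVER_DEFAULT="minerl-vm-host"
GCP_ZONE="asia-southeast1-b"

gcp_start() {
  # INSTANCE_NAME defaults to GCP_SERVER_DEFAULT, overridable on the command line
  local instance="${1:-${GCP_SERVER_DEFAULT}}"
  gcloud compute instances start "${instance}" --zone="${GCP_ZONE}"
  # Look up the (ephemeral) external IP of the instance
  local ip
  ip=$(gcloud compute instances describe "${instance}" --zone="${GCP_ZONE}" \
        --format='get(networkInterfaces[0].accessConfigs[0].natIP)')
  export GCP_ADDR="${ip}"    # convenience variable for later ssh/scp
  # Wait for sshd to answer, printing a dot each attempt (normally ~5 dots)
  while ! nc -z -w 2 "${ip}" 22 2>/dev/null; do
    printf '.'; sleep 5
  done
  echo
  # ssh in, port-forwarding 8585, 8586 and 5005
  # (useful for jupyter, tensorboard and FastAPI respectively)
  ssh -L 8585:localhost:8585 -L 8586:localhost:8586 \
      -L 5005:localhost:5005 "${GCP_ADDR}"
}

gcp_mount() {
  # Mount the VM user directory onto ./gcp_base, so local editors work directly
  mkdir -p ./gcp_base
  sshfs "${GCP_ADDR}:." ./gcp_base
}

gcp_umount() {
  fusermount -u ./gcp_base   # 'umount ./gcp_base' on MacOS
}

gcp_stop() {
  local instance="${1:-${GCP_SERVER_DEFAULT}}"
  gcloud compute instances stop "${instance}" --zone="${GCP_ZONE}"
}
```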

Looking through the code above, hopefully you can see:

- There’s a `GCP_SERVER_DEFAULT` setting for the `INSTANCE_NAME`
  - This can be overridden on the command line
- The start script:
  - Waits for the VM to be fully ready (printing a dot each time - normally 5 dots get printed);
  - Sets up `GCP_ADDR` as a convenience variable;
  - Performs an ssh into the machine that includes port-forwarding for 8585, 8586 and 5005
    - just some useful ports (for instance, for jupyter, tensorboard or FastAPI to listen on)
- The mount script:
  - Mounts the user directory onto `./gcp_base` locally (as mentioned above)
  - so that you can edit, or view/copy files, using purely local development tools
- The stop and umount scripts are self-explanatory

### Install Nvidia drivers

Use `ssh ${GCP_ADDR}` to get into the GCP machine, and run the following (these instructions basically follow those from Nvidia).

The following installs the nvidia driver:
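As a sketch, using Nvidia’s apt repository for Ubuntu 22.04 (the `cuda-keyring` version shown was current in 2022 and may have moved on):

```shell
# Register Nvidia's apt repository via the cuda-keyring package
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update

# Install just the driver (CUDA itself comes next)
sudo apt-get -y install cuda-drivers

# Verify: this should list the Tesla T4
nvidia-smi
```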

### Install Nvidia CUDA

The following installs the cuda drivers:
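A sketch - the `cuda` meta-package pulls in the toolkit from the repository registered above, and the `PATH` lines assume the default `/usr/local/cuda` install location:

```shell
# Install the CUDA toolkit
sudo apt-get -y install cuda

# Make nvcc etc. visible (add these lines to ~/.bashrc on the VM)
export PATH=/usr/local/cuda/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}

# Verify the toolkit is installed
nvcc --version
```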

NB: Since the hard disk we’ve chosen is persistent, all of this installation only needs to be done once.

### Install Deep Learning frameworks

See my previous blog post for details.

NB: If you actually want to run the Deep Learning environment within a container, skip this step and head over to my next blog post.

### Terminate the GCP VM when done…

Using the scripted commands from above: unmount the local directory, then stop the VM.

All done!

### Footnote

The above process for ‘GCP machine as local GPU’ works so well that I sold my local GPU (Nvidia Titan X 12GB, Maxwell) at the beginning of 2022, and migrated onto GCP for ‘real-time’ development. One benefit (apart from all-in cost) has also been the ability to seamlessly upgrade to a larger GPU set-up once the code works, without having to make any infrastructure changes (i.e. the disk can be brought up on a larger machine near instantly).

Shout-out to Google for helping this process by generously providing Cloud Credits to help get this all implemented!