Published on

Google Cloud Start up with Sessions Running

CLI set-up
Authors

Initialisation script to set up VM from cold start

Just having a nicely configured VM doesn't anwer the whole question in a machine learning scenario. In particular, using preemptible machines means that the VM is hard-stopped at minimum every 24hrs, and a typical way to do training, etc, involves setting up screen to allow for clear disconnects.

So, in addition to mounting an extra drive, wouldn't it be nice to get some screen sessions going with the correct paths, virtualenv already running, and ready-to-roll?

Create a startup script locally

Create this script as a local file startup.bash (clearly, your details will be very different, but some essential elements are included here) :

#!/bin/bash

username='notauser'   # Clearly this needs to be changed

# This shows that the script is run as root on startup...
echo "root=$(whoami)"

mkdir -p /mnt/rdai
mount -o discard,defaults /dev/sdb /mnt/rdai

cd /mnt/rdai
chown ${username}:${username} .

# Become the user ...
su - ${username} <<'EOF'
echo "username=$(whoami)"

cd /useful/path/

NL="$(printf \\r)"
ACTIVATE_ENV=". ~/env36/bin/activate"

# First, ensure the screen session exists, and then 'stuff' entries into it
screen -dmS model
screen -S model       -p 0 -X stuff "${ACTIVATE_ENV}${NL}"
screen -S model       -p 0 -X stuff "python train.py${NL}"

screen -dmS tensorboard
screen -S tensorboard -p 0 -X stuff "${ACTIVATE_ENV}${NL}"
screen -S tensorboard -p 0 -X stuff "tensorboard --logdir=runs/${NL}"

EOF

Create a 'startup hook' for the GCP machine

Post that script to the VM using the gcloud add-metadata command from your local machine :

gcloud compute instances add-metadata $INSTANCE_NAME \
    --metadata-from-file startup-script=startup.bash

Test that it works

Go on! Just start an instance and see whether it works!

If you need to change the script, just make the changes locally, and redo the add-metadata step : It seems to replace what was there before. Then, you'll have to stop and start the instance again - best done during down-time, rather than when hot on the trail of the latest model...