Move an instance between zones

The Google docs for this are actually pretty good (for once), so what follows is mainly for my own future reference, and to record the steps that worked first time…

Why move zones?

For more than 12 hours, I was repeatedly refused when trying to start a 2-core VM with a P100 attached in us-central1-c (a zone I had semi-randomly chosen in the first place, assuming it would have decent availability).

Choose the new zone

Looking at Google’s list of GPU zones to see where GPUs might actually be available, I chose somewhere closer to Singapore … Taiwan. Note that the quotas page had already encouraged me to obtain quota in a whole bunch of places that, I now see, don’t even have GPUs available. Honestly, GCP should be mature enough not to try this kind of MVP/Lean Startup stuff.
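Zones offering a particular GPU can also be listed from the command line. This is a minimal sketch, assuming the Cloud SDK (`gcloud`) is installed and authenticated; Python is used here only to assemble the command, which you can print and paste into a shell or hand to `subprocess.run`. Bear in mind that a zone appearing in this list says nothing about whether there is spare capacity there right now, which was exactly the problem in us-central1-c.

```python
import shlex


def gpu_zones_cmd(accelerator: str = "nvidia-tesla-p100") -> list[str]:
    """Build a gcloud command listing every zone that offers `accelerator`."""
    return [
        "gcloud", "compute", "accelerator-types", "list",
        f"--filter=name={accelerator}",
    ]


# Prints: gcloud compute accelerator-types list --filter=name=nvidia-tesla-p100
print(shlex.join(gpu_zones_cmd()))
```

Taiwan is the asia-east1 region, so the relevant rows are the ones whose zone starts with `asia-east1-`.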

Crazily, the auto-move tool that GCP supplies can only move instances that are currently running, which is a bit of an eye-roller, since the whole reason I want to move zones is that GCP won’t let me start the instance…

Make sure the machine and disks can be shut down safely :
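A minimal sketch of the shutdown step, again just building the `gcloud` command in Python. The instance name `my-gpu-vm` is hypothetical. `gcloud compute instances stop` asks the guest OS to shut down, so the disks should be in a consistent state before they are snapshotted.

```python
import shlex

INSTANCE = "my-gpu-vm"          # hypothetical instance name
SOURCE_ZONE = "us-central1-c"   # the zone being abandoned


def stop_cmd(instance: str, zone: str) -> list[str]:
    """Build the command that cleanly stops `instance` in `zone`."""
    return ["gcloud", "compute", "instances", "stop", instance, f"--zone={zone}"]


print(shlex.join(stop_cmd(INSTANCE, SOURCE_ZONE)))
```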

Check the metadata for the instance (before we delete it) … It may well be worth saving this to a file :
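A sketch of saving the instance description before deletion, using the same hypothetical `my-gpu-vm`. The JSON from `gcloud compute instances describe` covers more than the metadata (machine type, attached GPUs, disks, scheduling), which is handy when recreating the instance later. `save_description` is a hypothetical helper that runs the command and writes the output to a file.

```python
import shlex
import subprocess

INSTANCE = "my-gpu-vm"          # hypothetical instance name
SOURCE_ZONE = "us-central1-c"


def describe_cmd(instance: str, zone: str) -> list[str]:
    """Build the command that dumps the full instance description as JSON."""
    return ["gcloud", "compute", "instances", "describe", instance,
            f"--zone={zone}", "--format=json"]


def save_description(instance: str, zone: str, path: str) -> None:
    """Run gcloud and write its JSON output to `path` (raises if gcloud fails)."""
    result = subprocess.run(describe_cmd(instance, zone),
                            check=True, capture_output=True, text=True)
    with open(path, "w") as f:
        f.write(result.stdout)


# Shell equivalent of save_description(INSTANCE, SOURCE_ZONE, "my-gpu-vm.json"):
print(shlex.join(describe_cmd(INSTANCE, SOURCE_ZONE)) + " > my-gpu-vm.json")
```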

Snapshot the disks (assuming you’ve got one BOOT disk and one DATA disk, which seems like a reasonable set-up to me) :
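A sketch of the snapshot step for that BOOT + DATA layout. The disk and snapshot names are hypothetical; a boot disk created along with an instance is named after the instance by default, hence `my-gpu-vm` for the boot disk.

```python
import shlex

SOURCE_ZONE = "us-central1-c"
# Hypothetical mapping of disk name -> snapshot name (one BOOT disk, one DATA disk).
DISK_SNAPSHOTS = {
    "my-gpu-vm": "my-gpu-vm-boot-snap",
    "my-gpu-vm-data": "my-gpu-vm-data-snap",
}


def snapshot_cmds(disk_snapshots: dict[str, str], zone: str) -> list[list[str]]:
    """Build one `gcloud compute disks snapshot` command per disk."""
    return [
        ["gcloud", "compute", "disks", "snapshot", disk,
         f"--zone={zone}", f"--snapshot-names={snapshot}"]
        for disk, snapshot in disk_snapshots.items()
    ]


for cmd in snapshot_cmds(DISK_SNAPSHOTS, SOURCE_ZONE):
    print(shlex.join(cmd))
```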

Once the snapshots have been taken (make sure they complete without errors!), the original disks apparently have to be deleted to avoid name clashes across the zones :
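A sketch of checking the snapshots and then removing the originals, with the same hypothetical names. A disk that is still attached to an instance can't be deleted on its own, so this deletes the instance and asks `gcloud` to remove all its disks in the same operation (it prompts for confirmation first). Only run the delete once every snapshot reports `READY`.

```python
import shlex

INSTANCE = "my-gpu-vm"          # hypothetical instance name
SOURCE_ZONE = "us-central1-c"
SNAPSHOTS = ["my-gpu-vm-boot-snap", "my-gpu-vm-data-snap"]  # hypothetical


def snapshot_status_cmd(snapshot: str) -> list[str]:
    """Build a command that prints only the snapshot's status (expect READY)."""
    return ["gcloud", "compute", "snapshots", "describe", snapshot,
            "--format=value(status)"]


def delete_instance_and_disks_cmd(instance: str, zone: str) -> list[str]:
    """Build the command deleting the instance together with all attached disks."""
    return ["gcloud", "compute", "instances", "delete", instance,
            f"--zone={zone}", "--delete-disks=all"]


for snapshot in SNAPSHOTS:
    print(shlex.join(snapshot_status_cmd(snapshot)))
print(shlex.join(delete_instance_and_disks_cmd(INSTANCE, SOURCE_ZONE)))
```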

Then, recreate the disks from the snapshots in the destination zone :
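A sketch of recreating the disks under their original names in the new zone. `asia-east1-a` is an assumed Taiwan zone; use whichever `asia-east1-` zone the accelerator listing shows offering P100s. The disk and snapshot names are the same hypothetical ones as before.

```python
import shlex

DEST_ZONE = "asia-east1-a"  # assumption: a Taiwan zone offering P100s
# Hypothetical mapping of new disk name -> snapshot it is restored from.
DISK_SNAPSHOTS = {
    "my-gpu-vm": "my-gpu-vm-boot-snap",
    "my-gpu-vm-data": "my-gpu-vm-data-snap",
}


def create_disk_cmds(disk_snapshots: dict[str, str], zone: str) -> list[list[str]]:
    """Build one `gcloud compute disks create --source-snapshot` command per disk."""
    return [
        ["gcloud", "compute", "disks", "create", disk,
         f"--zone={zone}", f"--source-snapshot={snapshot}"]
        for disk, snapshot in disk_snapshots.items()
    ]


for cmd in create_disk_cmds(DISK_SNAPSHOTS, DEST_ZONE):
    print(shlex.join(cmd))
```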

Next, create the instance from scratch with the disks attached (NB: after doing this, you will need to reinstall the metadata, such as the startup and shutdown scripts described elsewhere in this blog) :
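A sketch of creating the instance around the restored disks and then putting the startup and shutdown scripts back. Everything specific here is an assumption: `n1-standard-2` as a stand-in for the 2-core VM, the hypothetical disk names, and the script filenames `startup.sh` and `shutdown.sh`. `--maintenance-policy=TERMINATE` is there because instances with GPUs attached can't be live-migrated; `auto-delete=no` is just a choice to keep the disks if the VM is ever deleted again.

```python
import shlex

INSTANCE = "my-gpu-vm"      # hypothetical instance name
DEST_ZONE = "asia-east1-a"  # assumed Taiwan zone


def create_instance_cmd(instance: str, zone: str,
                        boot_disk: str, data_disk: str) -> list[str]:
    """Build the command creating a 1x P100 instance on existing disks."""
    return [
        "gcloud", "compute", "instances", "create", instance,
        f"--zone={zone}",
        "--machine-type=n1-standard-2",                 # assumed 2-vCPU type
        "--accelerator=type=nvidia-tesla-p100,count=1",
        "--maintenance-policy=TERMINATE",               # required with GPUs
        f"--disk=name={boot_disk},boot=yes,auto-delete=no",
        f"--disk=name={data_disk},auto-delete=no",
    ]


def restore_scripts_cmd(instance: str, zone: str) -> list[str]:
    """Build the command re-adding startup/shutdown scripts (hypothetical files)."""
    return ["gcloud", "compute", "instances", "add-metadata", instance,
            f"--zone={zone}",
            "--metadata-from-file=startup-script=startup.sh,shutdown-script=shutdown.sh"]


print(shlex.join(create_instance_cmd(INSTANCE, DEST_ZONE, "my-gpu-vm", "my-gpu-vm-data")))
print(shlex.join(restore_scripts_cmd(INSTANCE, DEST_ZONE)))
```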

Finally, once the instance is confirmed to be working in the new zone …
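Presumably what is left is to clean up the snapshots, which otherwise keep incurring storage charges. A sketch assuming that, with a quick `nvidia-smi` run over SSH as one way of confirming the GPU is visible (this assumes the NVIDIA driver is already installed on the restored boot disk). Names and zone are the same hypothetical ones as before; `gcloud` asks for confirmation before deleting.

```python
import shlex

INSTANCE = "my-gpu-vm"      # hypothetical instance name
DEST_ZONE = "asia-east1-a"  # assumed Taiwan zone
SNAPSHOTS = ["my-gpu-vm-boot-snap", "my-gpu-vm-data-snap"]  # hypothetical


def gpu_check_cmd(instance: str, zone: str) -> list[str]:
    """Build a command running nvidia-smi on the instance over SSH."""
    return ["gcloud", "compute", "ssh", instance, f"--zone={zone}",
            "--command=nvidia-smi"]


def delete_snapshots_cmd(snapshots: list[str]) -> list[str]:
    """Build one command deleting all the listed snapshots."""
    return ["gcloud", "compute", "snapshots", "delete", *snapshots]


print(shlex.join(gpu_check_cmd(INSTANCE, DEST_ZONE)))
print(shlex.join(delete_snapshots_cmd(SNAPSHOTS)))
```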

Not sure that anyone reads these, but :

Starting an instance was repeatedly refused in us-central1-c (no idea what the actual reason was), so I decided to move to an Asian zone with P100s. Since I couldn’t start the instance in us-central1-c, the automatic tooling for moving it could not be used. So I went through all the steps (shutting down carefully, snapshotting, recreating) only to find that quota needed to be assigned (a ‘2 day’ process). I was rather disappointed at that point that it was Google’s own systems that (a) prevented me from launching an instance, (b) made it time-consuming to transfer, and then (c) refused to let me start the instance.

Perhaps it would make sense to have some kind of ‘availability’ test ahead of time?

I also see that I’ve got ‘quota’ in lots of other zones in the Asia region, but there are no actual GPUs to be had there (so the quota mechanism was merely a way to gauge demand: very clever for Google, but rather anti-customer in the UI).

Things are working now (in Asia), but this was not a smooth experience.