
Availability of V100 and P100 on Google Compute Engine


Description

For some time now I've been trying to create or reserve a virtual machine for machine learning with my personal account, which I've been using for a few months. I want an n1 machine with around 8 GB of RAM or more and either a P100 or a V100 GPU. I have tried at least half of all zones that offer P100/V100 GPUs, and I always get a resource error like this one:

Operation type [insert] failed with message "The zone 'projects/lexical-list-285719/zones/us-central1-c' does not have enough resources available to fulfill the request. Try a different zone, or try again later."

Every zone I try returns the same "no resources available" error. I recently switched from the free trial to a paid account.

Questions:

A) Is that common?

B) Is there a fix?

C) What (if anything) can I do to get a machine with these specifications, or similar performance?

I know this happens because the zone doesn't have these resources available, and that I'm supposed to try switching zones. I'm also aware of managed instance groups. But it can't be that difficult, can it?

Is Google really that booked out?

Possible Solutions

My current ideas for fixing it:

  • A multi-zone managed instance group (I still have to check whether my project is compatible with that)
  • A Cloud Shell script that iterates through all available zones (I would need to research how shell scripts work)
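The second idea, a script that walks through the zones until one accepts the request, could look roughly like the Python below, which shells out to the gcloud CLI. This is a minimal sketch under several assumptions: `gcloud` is installed and authenticated; the machine type `n1-standard-4`, the instance name `ml-vm`, and all helper names are hypothetical; each entry in the JSON from `gcloud compute accelerator-types list --format=json` is assumed to carry `name` and `zone` fields; and boot-image and GPU-driver flags are left out.

```python
import json
import subprocess
from typing import Callable, Iterable, List, Optional

# Hypothetical settings for illustration; adjust to your own project.
ACCELERATOR = "nvidia-tesla-v100"
MACHINE_TYPE = "n1-standard-4"  # 4 vCPUs, 15 GB RAM


def zones_with_accelerator(listing_json: str, accelerator: str) -> List[str]:
    """Return the sorted zone names that offer `accelerator`.

    `listing_json` is assumed to be the output of
    `gcloud compute accelerator-types list --format=json`."""
    zones = set()
    for entry in json.loads(listing_json):
        if entry.get("name") == accelerator:
            # Keep only the last path segment in case the zone is a full URL.
            zones.add(entry["zone"].rsplit("/", 1)[-1])
    return sorted(zones)


def create_command(name: str, zone: str) -> List[str]:
    """Build the `gcloud compute instances create` call for one zone."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={MACHINE_TYPE}",
        f"--accelerator=type={ACCELERATOR},count=1",
        # GPU instances have to stop (not live-migrate) on host maintenance.
        "--maintenance-policy=TERMINATE",
    ]


def first_zone_that_works(zones: Iterable[str],
                          try_create: Callable[[str], bool]) -> Optional[str]:
    """Try each zone in order and return the first one where creation succeeds."""
    for zone in zones:
        if try_create(zone):
            return zone
    return None


def gcloud_attempt(name: str) -> Callable[[str], bool]:
    """Return a callable that runs gcloud for one zone and reports success."""
    def attempt(zone: str) -> bool:
        result = subprocess.run(create_command(name, zone),
                                capture_output=True, text=True)
        return result.returncode == 0
    return attempt


def main() -> None:
    """List zones offering the GPU, then try them one by one."""
    listing = subprocess.run(
        ["gcloud", "compute", "accelerator-types", "list", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    zones = zones_with_accelerator(listing, ACCELERATOR)
    print(first_zone_that_works(zones, gcloud_attempt("ml-vm")))
```

Calling `main()` prints the first zone where creation succeeded, or `None` if every zone refused. Passing `try_create` in as a parameter keeps the zone loop testable without creating real VMs. Note that this loop treats any failure (including a missing GPU quota in that region) as "try the next zone".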

I'd appreciate hearing from anyone who has experience with these solutions or knows of better ones.

A good answer for me would not include any of the following:

  • Zone switching (tried that)

  • A smaller machine (tried that, and my project doesn't work on a machine that is too small)

  • Reserving (tried that)

  • Waiting (I already know about that, and it doesn't help if I want a machine right now)

That said, I'd recommend exactly those options to anyone whose issue is less persistent or urgent.


Solution

  • This isn't a bug; events like this happen from time to time.

    This error message means that there are no available resources, such as CPU, RAM, or GPU, on Google's side in that particular zone. You can find more details in the Troubleshooting VM creation documentation, in the Resource availability section:

    Resource errors occur when you try to request new resources in a zone that cannot accommodate your request due to the current unavailability of a Compute Engine resource, such as GPUs or CPUs.

    Resource errors only apply to new resource requests in the zone and do not affect existing resources. Resource errors are not related to your Compute Engine quota and only apply to the resource you specified in your request at the time you sent the request, not to all resources in the zone.
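Because resource errors are zone-specific and separate from quota, a retry script can tell them apart from other failures by their message text and react only to those. The Python below is a minimal sketch that assumes the message keeps the wording quoted in the question; the regular expression and the function name `exhausted_zone` are hypothetical, not part of any Google library.

```python
import re
from typing import Optional

# Matches the zonal resource-exhaustion message quoted in the question, e.g.
# "The zone 'projects/<project>/zones/us-central1-c' does not have enough
# resources available to fulfill the request."
_RESOURCE_ERROR = re.compile(
    r"zone '(?:projects/[^/']+/)?zones/([a-z0-9-]+)' "
    r"does not have enough resources"
)


def exhausted_zone(message: str) -> Optional[str]:
    """Return the zone named in a resource-exhaustion error, else None.

    None means the failure has some other cause (for example quota),
    which should be handled differently from "try another zone"."""
    match = _RESOURCE_ERROR.search(message)
    return match.group(1) if match else None
```

For the error in the question, `exhausted_zone` returns `"us-central1-c"`, so a script could skip that zone and move on, while any other error is surfaced instead of silently retried.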

    Resource availability depends on user demand and is therefore dynamic.

    There are a few ways to solve this issue:

    1. Try to create your instance at another zone where GPU is available (request an increase in quota if needed).
    2. Wait for a while and try again.
    3. Request a smaller VM (if possible); later you can try requesting a bigger one (the same principle as for quota requests).
    4. Reserve resources for your VM by following the documentation to avoid this issue in the future (extra payment required).
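Options 1 and 2 can be combined: sweep every candidate zone, and if all of them refuse, wait and sweep again with a growing pause. The Python below is a minimal sketch of that exponential-backoff loop; `retry_zones`, its default of 5 rounds and 60 seconds, and the injected `try_create` callable (for example one that runs `gcloud compute instances create` for a zone) are hypothetical choices, not anything Compute Engine prescribes.

```python
import time
from typing import Callable, List, Optional


def retry_zones(zones: List[str],
                try_create: Callable[[str], bool],
                rounds: int = 5,
                base_delay: float = 60.0,
                sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Sweep all zones up to `rounds` times, doubling the pause between sweeps.

    Returns the zone where `try_create` first succeeded, or None."""
    delay = base_delay
    for attempt in range(rounds):
        for zone in zones:
            if try_create(zone):
                return zone
        if attempt < rounds - 1:  # no pause after the final sweep
            sleep(delay)
            delay *= 2  # exponential backoff between sweeps
    return None
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting; in practice the default `time.sleep` applies, and the total wait for 5 rounds at 60 seconds is 60 + 120 + 240 + 480 = 900 seconds.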