google-cloud-platform google-cloud-dataproc

Create a Dataproc cluster without exceeding quotas


When I try to create a Dataproc cluster, I get a quota error:

ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 2048.0.

I have changed the machine types and reduced the number of workers to 2. However, if I also specify the master and worker boot disk sizes, I get an "unrecognized arguments" error.

I am using the GCP free tier and following the steps from this Google codelab: https://codelabs.developers.google.com/codelabs/pyspark-bigquery/index.html?index=..%2F..index#5

I have enabled three APIs for this GCP project: Compute Engine, Dataproc, and BigQuery.

I have already set both --worker-machine-type and --master-machine-type to n1-standard-2.

First Attempt

gcloud beta dataproc clusters create ${CLUSTER_NAME} \
 --zone=${ZONE} \
 --worker-machine-type n1-standard-8 \
 --num-workers 4 \
 --image-version 1.4-debian9 \
 --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
 --metadata 'PIP_PACKAGES=google-cloud-storage' \
 --optional-components=ANACONDA \
 --enable-component-gateway 

Second Attempt: helped remove some quota errors

 gcloud beta dataproc clusters create ${CLUSTER_NAME} \
     --zone=${ZONE} \
     --worker-machine-type n1-standard-2 \
     --master-machine-type n1-standard-2 \
     --num-workers 2 \
     --image-version 1.4-debian9 \
     --initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
     --metadata 'PIP_PACKAGES=google-cloud-storage' \
     --optional-components=ANACONDA \
     --enable-component-gateway

Third Attempt

gcloud beta dataproc clusters create ${CLUSTER_NAME} \
     --zone=${ZONE} \
     --master-machine-type=n1-standard-2 \
     --master-boot-disk-size=500GB \ 
     --worker-machine-type=n1-standard-2 \
     --worker-boot-disk-size=500GB \ 
     --num-workers=2 \
     --image-version=1.4-debian9 \
     --initialization-actions=gs://dataproc-initialization-actions/python/pip-install.sh \
     --metadata='PIP_PACKAGES=google-cloud-storage' \
     --optional-components=ANACONDA \
     --enable-component-gateway

I expected the above command to create the cluster. Instead, I get an "unrecognized arguments" message right after the --master-boot-disk-size=500GB parameter (see the error message below).

gcloud beta dataproc clusters create ${CLUSTER_NAME} \

 --zone=${ZONE} \
 --master-machine-type=n1-standard-2 \
 --master-boot-disk-size=500GB \

ERROR: (gcloud.beta.dataproc.clusters.create) unrecognized arguments:



Solution

  • You have an extra space after the backslash on the two lines where you specify the disk sizes (note the `5c 20 0a` byte sequences in the dump below):

    $ hexdump -C foop.txt
    00000000  67 63 6c 6f 75 64 20 62  65 74 61 20 64 61 74 61  |gcloud beta data|
    00000010  70 72 6f 63 20 63 6c 75  73 74 65 72 73 20 63 72  |proc clusters cr|
    00000020  65 61 74 65 20 24 7b 43  4c 55 53 54 45 52 5f 4e  |eate ${CLUSTER_N|
    00000030  41 4d 45 7d 20 5c 0a 20  20 20 20 20 2d 2d 7a 6f  |AME} \.     --zo|
    00000040  6e 65 3d 24 7b 5a 4f 4e  45 7d 20 5c 0a 20 20 20  |ne=${ZONE} \.   |
    00000050  20 20 2d 2d 6d 61 73 74  65 72 2d 6d 61 63 68 69  |  --master-machi|
    00000060  6e 65 2d 74 79 70 65 3d  6e 31 2d 73 74 61 6e 64  |ne-type=n1-stand|
    00000070  61 72 64 2d 32 20 5c 0a  20 20 20 20 20 2d 2d 6d  |ard-2 \.     --m|
    00000080  61 73 74 65 72 2d 62 6f  6f 74 2d 64 69 73 6b 2d  |aster-boot-disk-|
    00000090  73 69 7a 65 3d 35 30 30  47 42 20 5c 20 0a 20 20  |size=500GB \ .  |
    000000a0  20 20 20 2d 2d 77 6f 72  6b 65 72 2d 6d 61 63 68  |   --worker-mach|
    000000b0  69 6e 65 2d 74 79 70 65  3d 6e 31 2d 73 74 61 6e  |ine-type=n1-stan|
    000000c0  64 61 72 64 2d 32 20 5c  0a 20 20 20 20 20 2d 2d  |dard-2 \.     --|
    000000d0  77 6f 72 6b 65 72 2d 62  6f 6f 74 2d 64 69 73 6b  |worker-boot-disk|
    000000e0  2d 73 69 7a 65 3d 35 30  30 47 42 20 5c 20 0a 20  |-size=500GB \ . |
    000000f0  20 20 20 20 2d 2d 6e 75  6d 2d 77 6f 72 6b 65 72  |    --num-worker|
    00000100  73 3d 32 20 5c 0a 20 20  20 20 20 2d 2d 69 6d 61  |s=2 \.     --ima|
    00000110  67 65 2d 76 65 72 73 69  6f 6e 3d 31 2e 34 2d 64  |ge-version=1.4-d|
    00000120  65 62 69 61 6e 39 20 5c  0a 20 20 20 20 20 2d 2d  |ebian9 \.     --|
    00000130  69 6e 69 74 69 61 6c 69  7a 61 74 69 6f 6e 2d 61  |initialization-a|
    00000140  63 74 69 6f 6e 73 3d 67  73 3a 2f 2f 64 61 74 61  |ctions=gs://data|
    00000150  70 72 6f 63 2d 69 6e 69  74 69 61 6c 69 7a 61 74  |proc-initializat|
    00000160  69 6f 6e 2d 61 63 74 69  6f 6e 73 2f 70 79 74 68  |ion-actions/pyth|
    00000170  6f 6e 2f 70 69 70 2d 69  6e 73 74 61 6c 6c 2e 73  |on/pip-install.s|
    00000180  68 20 5c 0a 20 20 20 20  20 2d 2d 6d 65 74 61 64  |h \.     --metad|
    00000190  61 74 61 3d 27 50 49 50  5f 50 41 43 4b 41 47 45  |ata='PIP_PACKAGE|
    000001a0  53 3d 67 6f 6f 67 6c 65  2d 63 6c 6f 75 64 2d 73  |S=google-cloud-s|
    000001b0  74 6f 72 61 67 65 27 20  5c 0a 20 20 20 20 20 2d  |torage' \.     -|
    000001c0  2d 6f 70 74 69 6f 6e 61  6c 2d 63 6f 6d 70 6f 6e  |-optional-compon|
    000001d0  65 6e 74 73 3d 41 4e 41  43 4f 4e 44 41 20 5c 0a  |ents=ANACONDA \.|
    000001e0  20 20 20 20 20 2d 2d 65  6e 61 62 6c 65 2d 63 6f  |     --enable-co|
    000001f0  6d 70 6f 6e 65 6e 74 2d  67 61 74 65 77 61 79 0a  |mponent-gateway.|
    

    Whenever you continue a gcloud command (or any other shell command) onto the next line with a backslash but type a space after the backslash, that space ends up being passed to the command as an argument. Normally the backslash comes immediately before the \n newline character and escapes it, so the command continues instead of ending. If the backslash precedes other whitespace, it only escapes that space. The subsequent newline then marks the end of the command, and the escaped space is passed to gcloud as a real "argument" instead of being trimmed by the shell as usual. That is why the error message lists an apparently empty unrecognized argument.
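To see the effect without touching gcloud, here is a minimal sketch in plain POSIX shell. The `show_args` function is a hypothetical stand-in for gcloud that just prints every argument it receives, and the "broken" command is built with `printf` so that the space after the backslash is explicit. The last part uses `grep` to flag lines in a script where whitespace follows a trailing backslash, as a lighter alternative to reading a hexdump; the temporary file and its contents are illustrative only.

```shell
#!/bin/sh
# Hypothetical stand-in for gcloud: prints each argument it receives in brackets.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Correct continuation: the backslash is the last character on the line,
# so the shell joins both lines into one command with two arguments.
good=$(show_args --master-boot-disk-size=500GB \
  --worker-machine-type=n1-standard-2)

# Broken continuation: the command text ends in backslash + space.
# It is built with printf so an editor cannot silently strip the space.
bad_cmd=$(printf 'show_args --master-boot-disk-size=500GB \\ ')
bad=$(eval "$bad_cmd")

printf 'backslash + newline:\n%s\n' "$good"
printf 'backslash + space:\n%s\n' "$bad"

# Detection: write a small sample script (line 2 has the stray space),
# then list every line where whitespace follows a trailing backslash.
script=$(mktemp)
printf '%s\n' \
  'gcloud beta dataproc clusters create my-cluster \' \
  '    --master-boot-disk-size=500GB \ ' \
  '    --num-workers=2' > "$script"
flagged=$(grep -nE '\\[[:space:]]+$' "$script")
printf 'lines with whitespace after a backslash:\n%s\n' "$flagged"
rm -f "$script"
```

In the "backslash + space" case the second argument is a single space, which matches the seemingly empty entry after "unrecognized arguments:" in the question. Deleting the trailing spaces on the two disk-size lines of the third attempt should let the shell pass the whole command to gcloud as intended.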