When trying to create a Dataproc cluster, I get a quota-exceeded error:
ERROR: (gcloud.beta.dataproc.clusters.create) INVALID_ARGUMENT: Insufficient 'DISKS_TOTAL_GB' quota. Requested 3000.0, available 2048.0.
I have changed the machine types and reduced the number of workers to 2. Furthermore, if I specify the master and worker boot disk sizes, I get an "unrecognized arguments" error.
I am using the GCP free tier. I am trying to follow the steps from the Google codelab - https://codelabs.developers.google.com/codelabs/pyspark-bigquery/index.html?index=..%2F..index#5
I have enabled three APIs for this GCP project: Compute Engine, Dataproc, and BigQuery.
I have already set both machine types: worker-machine-type as n1-standard-2, and master-machine-type as n1-standard-2.
First Attempt
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--zone=${ZONE} \
--worker-machine-type n1-standard-8 \
--num-workers 4 \
--image-version 1.4-debian9 \
--initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
--metadata 'PIP_PACKAGES=google-cloud-storage' \
--optional-components=ANACONDA \
--enable-component-gateway
Second Attempt: this removed some of the quota errors
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--zone=${ZONE} \
--worker-machine-type n1-standard-2 \
--master-machine-type n1-standard-2 \
--num-workers 2 \
--image-version 1.4-debian9 \
--initialization-actions gs://dataproc-initialization-actions/python/pip-install.sh \
--metadata 'PIP_PACKAGES=google-cloud-storage' \
--optional-components=ANACONDA \
--enable-component-gateway
Third Attempt
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--zone=${ZONE} \
--master-machine-type=n1-standard-2 \
--master-boot-disk-size=500GB \
--worker-machine-type=n1-standard-2 \
--worker-boot-disk-size=500GB \
--num-workers=2 \
--image-version=1.4-debian9 \
--initialization-actions=gs://dataproc-initialization-actions/python/pip-install.sh \
--metadata='PIP_PACKAGES=google-cloud-storage' \
--optional-components=ANACONDA \
--enable-component-gateway
I expected the above command to create the cluster; instead, I get an "unrecognized arguments" message right after the --master-boot-disk-size=500GB parameter (see the error message below).
gcloud beta dataproc clusters create ${CLUSTER_NAME} \
--zone=${ZONE} \ --master-machine-type=n1-standard-2 \ --master-boot-disk-size=500GB \
ERROR: (gcloud.beta.dataproc.clusters.create) unrecognized arguments:
You have an extra space after the backslash on the lines where you specify the disk sizes:
$ hexdump -C foop.txt
00000000 67 63 6c 6f 75 64 20 62 65 74 61 20 64 61 74 61 |gcloud beta data|
00000010 70 72 6f 63 20 63 6c 75 73 74 65 72 73 20 63 72 |proc clusters cr|
00000020 65 61 74 65 20 24 7b 43 4c 55 53 54 45 52 5f 4e |eate ${CLUSTER_N|
00000030 41 4d 45 7d 20 5c 0a 20 20 20 20 20 2d 2d 7a 6f |AME} \. --zo|
00000040 6e 65 3d 24 7b 5a 4f 4e 45 7d 20 5c 0a 20 20 20 |ne=${ZONE} \. |
00000050 20 20 2d 2d 6d 61 73 74 65 72 2d 6d 61 63 68 69 | --master-machi|
00000060 6e 65 2d 74 79 70 65 3d 6e 31 2d 73 74 61 6e 64 |ne-type=n1-stand|
00000070 61 72 64 2d 32 20 5c 0a 20 20 20 20 20 2d 2d 6d |ard-2 \. --m|
00000080 61 73 74 65 72 2d 62 6f 6f 74 2d 64 69 73 6b 2d |aster-boot-disk-|
00000090 73 69 7a 65 3d 35 30 30 47 42 20 5c 20 0a 20 20 |size=500GB \ . |
000000a0 20 20 20 2d 2d 77 6f 72 6b 65 72 2d 6d 61 63 68 | --worker-mach|
000000b0 69 6e 65 2d 74 79 70 65 3d 6e 31 2d 73 74 61 6e |ine-type=n1-stan|
000000c0 64 61 72 64 2d 32 20 5c 0a 20 20 20 20 20 2d 2d |dard-2 \. --|
000000d0 77 6f 72 6b 65 72 2d 62 6f 6f 74 2d 64 69 73 6b |worker-boot-disk|
000000e0 2d 73 69 7a 65 3d 35 30 30 47 42 20 5c 20 0a 20 |-size=500GB \ . |
000000f0 20 20 20 20 2d 2d 6e 75 6d 2d 77 6f 72 6b 65 72 | --num-worker|
00000100 73 3d 32 20 5c 0a 20 20 20 20 20 2d 2d 69 6d 61 |s=2 \. --ima|
00000110 67 65 2d 76 65 72 73 69 6f 6e 3d 31 2e 34 2d 64 |ge-version=1.4-d|
00000120 65 62 69 61 6e 39 20 5c 0a 20 20 20 20 20 2d 2d |ebian9 \. --|
00000130 69 6e 69 74 69 61 6c 69 7a 61 74 69 6f 6e 2d 61 |initialization-a|
00000140 63 74 69 6f 6e 73 3d 67 73 3a 2f 2f 64 61 74 61 |ctions=gs://data|
00000150 70 72 6f 63 2d 69 6e 69 74 69 61 6c 69 7a 61 74 |proc-initializat|
00000160 69 6f 6e 2d 61 63 74 69 6f 6e 73 2f 70 79 74 68 |ion-actions/pyth|
00000170 6f 6e 2f 70 69 70 2d 69 6e 73 74 61 6c 6c 2e 73 |on/pip-install.s|
00000180 68 20 5c 0a 20 20 20 20 20 2d 2d 6d 65 74 61 64 |h \. --metad|
00000190 61 74 61 3d 27 50 49 50 5f 50 41 43 4b 41 47 45 |ata='PIP_PACKAGE|
000001a0 53 3d 67 6f 6f 67 6c 65 2d 63 6c 6f 75 64 2d 73 |S=google-cloud-s|
000001b0 74 6f 72 61 67 65 27 20 5c 0a 20 20 20 20 20 2d |torage' \. -|
000001c0 2d 6f 70 74 69 6f 6e 61 6c 2d 63 6f 6d 70 6f 6e |-optional-compon|
000001d0 65 6e 74 73 3d 41 4e 41 43 4f 4e 44 41 20 5c 0a |ents=ANACONDA \.|
000001e0 20 20 20 20 20 2d 2d 65 6e 61 62 6c 65 2d 63 6f | --enable-co|
000001f0 6d 70 6f 6e 65 6e 74 2d 67 61 74 65 77 61 79 0a |mponent-gateway.|
In any gcloud command (or any shell command) where you continue onto the next line with a backslash, the backslash must be the very last character on the line: the shell removes the backslash-plus-newline pair and joins the lines into a single command. If a space follows the backslash, the backslash escapes that space instead. The newline is then no longer escaped, so it terminates the command right there, and the escaped space is passed to gcloud as a literal " " argument rather than being trimmed by the shell as usual. Everything on the following lines is then parsed as separate commands.
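A minimal reproduction of that behavior, using printf instead of gcloud so the stray argument is visible (the command strings here are illustrative, not from the question):

```shell
# Line continuation works only when "\" is the last character on the line:
# the shell deletes the "\<newline>" pair. With "\ <newline>", the backslash
# escapes the SPACE instead, the newline then ends the command, and a lone
# " " argument is passed through.

good=$'printf "[%s]" one \\\ntwo\n'   # backslash immediately before newline
bad=$'printf "[%s]" one \\ \ntrue\n'  # backslash, then a space, then newline

bash -c "$good"; echo   # → [one][two]
bash -c "$bad"; echo    # → [one][ ]  ("true" on line 2 runs as its own command)
```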