python, linux, apache-spark, io, redhat

Python argparse, whitespace in an argument, and Spark


I have this code in foo.py:

from argparse import ArgumentParser
parser = ArgumentParser()
parser.add_argument('--label', dest='label', type=str, default=None, required=True, help='label')
args = parser.parse_args()

and when I execute:

spark-submit --master yarn --deploy-mode cluster foo.py --label 106466153-Gateway Arch

I get this error on stdout:

usage: foo.py [-h] --label LABEL
foo.py: error: unrecognized arguments: Arch

Any ideas, please?


Attempts:

  1. --label "106466153-Gateway Arch"
  2. --label 106466153-Gateway\ Arch
  3. --label "106466153-Gateway\ Arch"
  4. --label="106466153-Gateway Arch"
  5. --label 106466153-Gateway\\\ Arch
  6. --label 106466153-Gateway\\\\\\\ Arch

All attempts produce the same error.
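The error is consistent with argparse receiving the value as two separate argv tokens. Here is a minimal local reproduction sketch, assuming the quoting is lost somewhere between spark-submit and the script (the argv list below is a hand-built stand-in for what foo.py would actually see):

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--label', dest='label', type=str, required=True, help='label')

# Simulate the argv foo.py ends up seeing: the quotes are gone and
# the single value has been split into two tokens.
parser.parse_args(['--label', '106466153-Gateway', 'Arch'])
# argparse consumes '106466153-Gateway' as the value of --label and
# then fails on the leftover token:
#   foo.py: error: unrecognized arguments: Arch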


I am using Red Hat Enterprise Linux Server release 6.4 (Santiago).


Solution

  • Here is a nasty workaround:

    from argparse import ArgumentParser

    parser = ArgumentParser()
    # nargs="+" collects every whitespace-separated token after --label
    # into a list instead of expecting a single value.
    parser.add_argument('--label', dest='label', type=str, required=True, help='label', nargs="+")
    args = parser.parse_args()
    # Rejoin the tokens into the original single string.
    label = ' '.join(args.label)
    print(label)
    

    where nargs="+" collects the whitespace-split tokens into a list, and the join reassembles them into the original string.

    I execute like this:

    spark-submit --master yarn --deploy-mode cluster foo.py --label "106466153-Gateway Arch"

    Note that this approach also works when the value contains no space at all:

    spark-submit --master yarn --deploy-mode cluster foo.py --label "106466153-GatewayNoSpaceArch"
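    To illustrate what the join receives in each case, here is a small standalone sketch (the argv lists are hand-built stand-ins for what the script would see):

    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument('--label', type=str, required=True, help='label', nargs="+")

    # Value split into two tokens (as when the quoting is lost):
    args = parser.parse_args(['--label', '106466153-Gateway', 'Arch'])
    print(' '.join(args.label))  # -> 106466153-Gateway Arch

    # Single token with no space: nargs="+" yields a one-element list,
    # so the join is effectively a no-op.
    args = parser.parse_args(['--label', '106466153-GatewayNoSpaceArch'])
    print(' '.join(args.label))  # -> 106466153-GatewayNoSpaceArch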