apache-pig hcatalog

WebHCat & Pig - how to pass a parameter file to the job?


I am using HCatalog's WebHCat API to run Pig jobs, as documented here:

https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig

I have no problem running a simple job, but I would like to attach a parameter file to the job, as one can do with the Pig command line's -param_file option.

I assume this is possible through the request's arg parameter, so I tried several things, such as passing:

'arg': '-param_file /path/to/param.file'

or:

'arg': {'param_file': '/path/to/param.file'}

Neither works, and the error stack traces don't say much. I would love to know whether this is possible and, if so, how to do it correctly.

Many thanks


Solution

  • Correct usage:

    'arg': ['-param_file', '/path/to/param.file']
    

    Explanation: When the value of arg is passed as a dict,

    'arg': {'-param_file': '/path/to/param.file'}
    

    only the key is sent, so WebHCat generates just "-param_file" on the command line, with no path after it. Pig then throws the following error:

    ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Can not create a Path from a null string
    

    Using a list (a comma instead of the colon) passes the file path as a second argument, so WebHCat generates "-param_file" "/path/to/param.file".

    P.S.: I am using the Requests library in Python to make the REST calls.
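
To make the difference between the list and the dict visible, here is a minimal sketch. It uses the standard library's urlencode with doseq=True to approximate the form body; as far as I can tell, Requests expands iterable values the same way, but treat that as an assumption. The helper names (build_pig_form, encode_form, submit_pig_job), the script path and the user name are hypothetical and only for illustration.

```python
from urllib.parse import urlencode, parse_qsl


def build_pig_form(script_path, param_file, user):
    """Form fields for the WebHCat Pig endpoint (all values here are placeholders)."""
    return {
        "user.name": user,
        "file": script_path,                 # location of the Pig script
        "arg": ["-param_file", param_file],  # list: each item becomes its own 'arg' field
    }


def encode_form(fields):
    """Approximate the form encoding: iterable values are expanded item by item."""
    return urlencode(fields, doseq=True)


def submit_pig_job(base_url, fields):
    """Send the job with Requests (third-party, imported lazily; needs a live WebHCat server)."""
    import requests
    response = requests.post(base_url + "/templeton/v1/pig", data=fields)
    response.raise_for_status()
    return response.json()


good = encode_form(build_pig_form("/user/me/script.pig", "/path/to/param.file", "me"))
# Iterating a dict yields only its keys, so the path is silently dropped.
bad = encode_form({"arg": {"-param_file": "/path/to/param.file"}})

print(parse_qsl(good))  # 'arg' appears twice: the flag, then the path
print(parse_qsl(bad))   # 'arg' appears once: the flag only
```

With a running server, the call would look something like submit_pig_job("http://localhost:50111", build_pig_form(...)), assuming WebHCat listens on its default port.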