
Pig UDF receives or uses wrong parameter


I am running a Pig script through Oozie. The script uses a UDF.

The UDF gets its parameters like this:

public Float exec(Tuple input) throws IOException {

    // Return a default score when no input is supplied
    if (input == null || input.size() == 0)
        return Float.valueOf(0f);

    FileSystem fs = FileSystem.get(UDFContext.getUDFContext().getJobConf());

    // Field 1 is the first model path (field 0 is the document)
    String firstModelPath = input.get(1).toString();

    InputStream firstModel = fs.open(new Path(firstModelPath));
    ...

In the Oozie debug output, the incoming parameter looks correct:

  -param
  firstModel_firstscript=./en-sent.bin

In the script itself, it is used like this:

%DEFAULT firstModel_firstscript 'somedefaultstuffthatisntused/firstmodel.bin';
...
myUDF(document, '$firstModel_firstscript', '$secondmodel_firstscript', '$lastmodel_firstscript') AS score;

The result is the same with:

myUDF(document, '${firstModel_firstscript}', '${secondmodel_firstscript}', '${lastmodel_firstscript}') AS score;

STDERR reads:

ERROR 2078: Caught error from UDF: my.domain.udf.myUDF [File does not exist: /user/cloudera/firstmodel_firstscript

Note that this is not the path I passed in.

I'm at a loss here. I hope I have explained my situation clearly enough.

Regards


Solution

  • I found that I was passing Hadoop settings in my script the wrong way.

    I was using:

    set xyz firstmodel_firstscript;
    

    instead of

    set xyz $firstmodel_firstscript;
    

    Even though the values were already set via %default, the parameter still has to be referenced with a leading $ to be substituted. Without it, the literal name is passed as the value.
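
To illustrate why the literal name ended up in the path, here is a minimal sketch of `$name` / `${name}` substitution. It is a toy model written for this answer, not Pig's actual preprocessor: the class `ParamSubstitutionSketch` and the method `substitute` are hypothetical, and the parameter value is the one from the Oozie debug output above. The point is only that a bare name is left untouched, while a `$`-prefixed name is replaced.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of parameter substitution (hypothetical, NOT Pig's real implementation).
public class ParamSubstitutionSketch {

    // Matches $name or ${name}; a bare name without '$' never matches.
    private static final Pattern PARAM =
            Pattern.compile("\\$\\{?([A-Za-z_][A-Za-z0-9_]*)\\}?");

    // Replaces every $name / ${name} in the line with its value from params.
    static String substitute(String line, Map<String, String> params) {
        Matcher m = PARAM.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' or '\' in the value from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("firstmodel_firstscript", "./en-sent.bin");

        // No '$': the line is unchanged, so the setting holds the literal name.
        System.out.println(substitute("set xyz firstmodel_firstscript;", params));

        // With '$': the parameter value is substituted.
        System.out.println(substitute("set xyz $firstmodel_firstscript;", params));
    }
}
```

In the first case the setting receives the string `firstmodel_firstscript`, which HDFS then resolves relative to the user's home directory. That matches the `/user/cloudera/firstmodel_firstscript` path in the error message.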