Tags: python, hadoop, apache-pig, udf

Simple Python UDF issue for Hadoop pig


I wrote a very simple Python UDF. Here are my UDF code, my Pig code, and the error message. Any ideas what is wrong? Thanks.

UDF (test.py):

@outputSchema("cookie:chararray")
def getSimple():
    return 'Hello'

Pig code:

register test.py using jython as TestSimple;
a = TestSimple.getSimple() as word;

Error message:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 0>  Syntax error, unexpected symbol at or near 'a'

Thanks in advance, Lin


Solution

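  • You need to LOAD some data first and then process it with your UDF. A UDF call cannot stand on its own as a Pig statement, which is why the parser reports a syntax error near 'a'. For example, load the data: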

    A = LOAD 'input' USING PigStorage('\t','-schema');
    

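    Then process your data with the UDF inside a FOREACH. Let's say you have an id field in your input (note that getSimple as defined in the question takes no arguments, so it has to accept one parameter for this call to work):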

    B = FOREACH A GENERATE TestSimple.getSimple(id) as word;
    

    And of course you still need to register your UDF, which you already did correctly.
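
To match the FOREACH call above, test.py could look roughly like the sketch below, where getSimple accepts the one value passed in from Pig. This is only an illustration under a few assumptions: the parameter name value is made up, the argument is simply ignored, and the try/except fallback is not part of the original UDF. It assumes Pig supplies the outputSchema decorator when the file is registered with "using jython", and defines a do-nothing stand-in only so the file can also be run with plain Python for a quick local check.

```python
# test.py: sketch of a UDF that accepts one argument.
# Assumption: Pig provides outputSchema when this file is registered
# with "using jython". The fallback below is a hypothetical stand-in
# used only when the file is run outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        # No-op decorator: returns the function unchanged.
        def decorator(func):
            return func
        return decorator


@outputSchema("cookie:chararray")
def getSimple(value):
    # 'value' (hypothetical name) receives the field passed from Pig,
    # e.g. id. It is ignored here; the UDF always returns 'Hello'.
    return 'Hello'
```

With this version, B = FOREACH A GENERATE TestSimple.getSimple(id) as word; passes id as value. If you would rather keep the original zero-argument definition, call it as TestSimple.getSimple() inside the FOREACH instead.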