Is there a way to access the .pig_schema or .pig_header values from within a Pig Java UDF, so that I know which field name is being parsed?
I work with PigStorage output generated by a different process, and its format keeps changing rapidly. I want to make as few changes as possible on my side when that happens.
For example, the previous format looked like {name:chararray, age:INT, salary:DOUBLE}
and the current format looks like {sex:chararray, name:chararray, age:INT, salary:DOUBLE}.
In my UDF I am only interested in name and salary, but their position in the input can vary, as shown above.
From what I've seen in the Pig code, Pig has had schema tuples since 0.11. With the schematuple.udf option set (default), the schema is passed to UDFs and can be obtained inside the UDF's exec() method by calling getInputSchema(). Once you have the schema, you have the names of its elements, and you can then select the fields you want by name rather than by position.
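To illustrate the select-by-name step, here is a minimal sketch in plain Java with no Pig dependency. FieldLocator, indexOf and select are hypothetical names I made up for this example. The list of aliases stands in for the field names you would read from getInputSchema() (as far as I know, each entry of Schema.getFields() carries an alias), and the row list stands in for the input Tuple.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: resolves field positions by name.
class FieldLocator {

    // Returns the position of `name` in `aliases`, or -1 if it is absent.
    static int indexOf(List<String> aliases, String name) {
        for (int i = 0; i < aliases.size(); i++) {
            if (name.equals(aliases.get(i))) {
                return i;
            }
        }
        return -1;
    }

    // Picks the wanted fields out of one row, whatever order the row uses.
    static Object[] select(List<String> aliases, List<Object> row, String... wanted) {
        Object[] out = new Object[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            int pos = indexOf(aliases, wanted[i]);
            if (pos < 0) {
                throw new IllegalArgumentException("missing field: " + wanted[i]);
            }
            out[i] = row.get(pos);
        }
        return out;
    }

    public static void main(String[] args) {
        // In a real UDF these names would come from the input schema (assumption).
        List<String> oldFormat = Arrays.asList("name", "age", "salary");
        List<String> newFormat = Arrays.asList("sex", "name", "age", "salary");

        List<Object> oldRow = Arrays.<Object>asList("alice", 30, 1000.0);
        List<Object> newRow = Arrays.<Object>asList("F", "alice", 30, 1000.0);

        // Both calls print [alice, 1000.0] despite the different layouts.
        System.out.println(Arrays.toString(select(oldFormat, oldRow, "name", "salary")));
        System.out.println(Arrays.toString(select(newFormat, newRow, "name", "salary")));
    }
}
```

In an actual EvalFunc you would probably resolve the two positions once and cache them, since the schema does not change between calls to exec() within a job.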