
How to Set Custom Delimiter in PIG


What is the correct syntax for setting a custom TextInputFormat record delimiter in Pig? I've tried several variations on the following, but Pig treats the value as a literal string instead of a carriage return/line feed (CRLF):

set textinputformat.record.delimiter '\r\n';

Pig version is 0.12.0-cdh5.9.0 and Hadoop version is 2.6.0-cdh5.9.0.
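To see why a literal string breaks record splitting, here is a small Python sketch. It is not Pig or Hadoop code. It assumes, as the question describes, that the delimiter reaching the record reader is the four characters backslash, r, backslash, n, not the two control characters CR (0x0D) and LF (0x0A). The helper `split_records` is hypothetical and only roughly approximates how a custom record delimiter is applied.

```python
def split_records(data: str, delimiter: str) -> list[str]:
    """Split data on an exact delimiter string and drop a trailing empty record.

    Hypothetical helper: a rough approximation of reading records with a custom
    delimiter, not Hadoop's actual LineRecordReader logic.
    """
    records = data.split(delimiter)
    if records and records[-1] == "":
        records.pop()
    return records


# A small Windows-style input: two records, each terminated by CR LF.
data = "id,name\r\n1,alice\r\n"

# The intended delimiter: the two control characters CR and LF.
real_crlf = "\r\n"
# A literal '\r\n' with no escape processing: backslash, 'r', backslash, 'n'.
literal_crlf = "\\r\\n"

print(len(real_crlf), len(literal_crlf))  # 2 4
print(split_records(data, real_crlf))     # ['id,name', '1,alice']
print(split_records(data, literal_crlf))  # ['id,name\r\n1,alice\r\n']
```

With the literal four-character delimiter, the sequence never appears in the data. The whole input therefore comes back as a single record.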


Solution

  • Not ideal, but here is a workaround:

    Create a properties file, such as myprops.properties, containing the following line: textinputformat.record.delimiter=\r\n

    Then run your script with: pig -P ~/myprops.properties -f path/to/pigscript.pig

    This appears to be a known issue. See the fourth paragraph of the fourth comment on PIG_4572.
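A plausible reason the properties-file route works is escape processing. Suppose Pig reads the `-P` file with Java's `java.util.Properties` (an assumption, not confirmed above). In that case the `\r` and `\n` sequences in the value become real CR and LF characters. The `set` statement, as the question observes, passes the text through as a literal string. The Python sketch below emulates that escape handling. `unescape_properties_value` is a hypothetical, simplified stand-in for the Java parser and is not part of Pig or Hadoop.

```python
def unescape_properties_value(raw: str) -> str:
    """Approximate java.util.Properties escape handling for a single value.

    Handles \\t, \\n, \\f, \\r and \\uXXXX; for any other escaped character the
    backslash is dropped. Line continuations are not handled (simplification).
    """
    escapes = {"t": "\t", "n": "\n", "f": "\f", "r": "\r"}
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\" or i + 1 >= len(raw):
            # Ordinary character (or a lone trailing backslash): keep as-is.
            out.append(ch)
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt == "u":
            # \uXXXX: four hex digits give a single Unicode character.
            out.append(chr(int(raw[i + 2:i + 6], 16)))
            i += 6
        else:
            out.append(escapes.get(nxt, nxt))
            i += 2
    return "".join(out)


# The exact line from myprops.properties (raw string keeps the backslashes).
line = r"textinputformat.record.delimiter=\r\n"
key, _, raw_value = line.partition("=")  # simplified: only '=' as separator
value = unescape_properties_value(raw_value)

print(key)                            # textinputformat.record.delimiter
print(repr(value))                    # '\r\n'
print([hex(ord(c)) for c in value])   # ['0xd', '0xa']
```

After unescaping, the value is the two bytes 0x0D 0x0A. That is the CRLF delimiter the question was trying to set with `set`.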