apache-pig

Does Pig support LOAD with no delimiter?


I'd like to load a lot of small files from HDFS with Pig and process them as tuples (filename, filecontent).

a=LOAD 'mydir' USING PigStorage('','-tagPath') AS (filepath:chararray, filecontents:chararray);

However, it seems that I cannot omit the delimiter. Is there some sort of "NULL" in Pig, or is there any other way to make sure the content of the file will not be split?


Solution

  • You will have to write your own custom loader by extending LoadFunc.

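    The short answer to your question is no. To keep PigStorage from splitting the content into fields, use a delimiter that does not occur in it; each line is then loaded whole into the field filecontents:chararray. Note that PigStorage still reads its input line by line, so a multi-line file yields one tuple per line; getting an entire file into a single tuple is what the custom loader is for. So, assuming your input files do not contain the special character '~':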

    a=LOAD 'mydir' USING PigStorage('~','-tagPath') AS (filepath:chararray, filecontents:chararray);
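
    The workaround above only holds if '~' really never appears in the input. A minimal sketch of a check, assuming a sample of the files has been copied to a local directory (the class name DelimiterCheck and its method are hypothetical helpers, not part of Pig):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: checks whether a candidate PigStorage delimiter
// occurs in any file of a local sample directory.
public class DelimiterCheck {

    /**
     * Returns the first regular file (in sorted order) directly under dir
     * whose text contains the delimiter, or Optional.empty() if none does.
     * Assumes the files are small enough to read into memory and are UTF-8.
     */
    public static Optional<Path> firstFileContaining(Path dir, char delimiter) throws IOException {
        List<Path> candidates;
        try (Stream<Path> files = Files.list(dir)) {
            candidates = files.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : candidates) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            if (text.indexOf(delimiter) >= 0) {
                return Optional.of(file);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DelimiterCheck <local-sample-dir> [delimiter]
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        char delimiter = args.length > 1 && !args[1].isEmpty() ? args[1].charAt(0) : '~';
        Optional<Path> hit = firstFileContaining(dir, delimiter);
        if (hit.isPresent()) {
            System.out.println("Delimiter '" + delimiter + "' occurs in " + hit.get());
        } else {
            System.out.println("Delimiter '" + delimiter + "' not found in " + dir);
        }
    }
}
```

    If the check reports a hit, pick a different delimiter character and rerun it before using that character in PigStorage.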