
Which file formats are available in Apache Pig?


I'm new to Apache Pig.

I'm not sure which input file formats Pig supports.

For example, Impala supports Parquet, text, Avro, RCFile, and SequenceFile. (See: How Impala Works with Hadoop File Formats)

I guess plain text files are fine, because the data-loading example uses a .log file. (See: Getting Started) I also found the AvroStorage page, so Avro must be supported too.

What about Parquet, RCFile, SequenceFile, and others? Or am I misunderstanding something?

Any advice would be appreciated. Thanks.


Solution

  • Pig 0.14 ships with the following built-in load/store functions:

    1. BinStorage
    2. JsonLoader, JsonStorage
    3. PigDump
    4. PigStorage
    5. TextLoader
    6. HBaseStorage
    7. AvroStorage
    8. TrevniStorage
    9. AccumuloStorage
    10. OrcStorage

    PigStorage and TextLoader also support gzip and bzip2 compression, for both loading and storing.

    You can use HCatalog (HCatLoader) to read tables defined in the Hive metastore, which lets Pig read data in any format Hive itself can read.

    Many more loaders are available in the Piggybank library.

    Otherwise, you can write your own loader by extending Pig's LoadFunc class.
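On the gzip/bzip2 support for PigStorage and TextLoader: as far as I know, the underlying Hadoop text input format chooses the decompression codec from the file suffix (.gz, .bz2), so compressed input needs no extra options in the LOAD statement. The Python sketch below only illustrates that suffix-based dispatch with the standard gzip and bz2 modules; `open_by_extension` and `read_lines` are hypothetical names, not Pig or Hadoop APIs.

```python
import bz2
import gzip
import os

# Illustration only (not Pig/Hadoop code): map a file suffix to the
# function that opens it, mimicking codec selection by file extension.
_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_by_extension(path):
    """Open `path` for text reading, decompressing if the suffix is known."""
    _, ext = os.path.splitext(path)
    # Unknown suffixes (e.g. ".log") fall back to the plain built-in open().
    opener = _OPENERS.get(ext.lower(), open)
    return opener(path, "rt", encoding="utf-8")


def read_lines(path):
    """Return the lines of a plain, gzip, or bzip2 text file, without newlines."""
    with open_by_extension(path) as f:
        return [line.rstrip("\n") for line in f]
```

The same file name with or without `.gz`/`.bz2` appended yields the same lines, which is the behaviour you get when pointing a Pig LOAD at compressed text.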
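A real custom loader is written in Java: it extends `org.apache.pig.LoadFunc` and implements `setLocation`, `getInputFormat`, `prepareToRead`, and `getNext`, where `getNext` turns each raw record into a Pig tuple. The Python sketch below shows only that record-to-tuple step for a hypothetical `key=value;key=value` line format that PigStorage cannot split by itself; the format, the `FIELDS` schema, and the function names are all assumptions for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical input format: one record per line, fields as key=value
# pairs separated by ';', e.g. "user=alice;action=login;ts=1400000000".
FIELDS = ("user", "action", "ts")  # hypothetical schema, in output order

Record = Tuple[Optional[str], ...]


def parse_record(line: str) -> Optional[Record]:
    """Turn one line into a tuple ordered by FIELDS; None for blank lines."""
    line = line.strip()
    if not line:
        return None
    values = {}
    for part in line.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            values[key.strip()] = value.strip()
    # Missing keys become None, similar to a null field in a Pig tuple.
    return tuple(values.get(name) for name in FIELDS)


def load(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one tuple per non-blank line, like repeated getNext() calls."""
    for line in lines:
        record = parse_record(line)
        if record is not None:
            yield record
```

Keys may appear in any order on a line because the output order comes from the schema, not from the input; in a Java LoadFunc you would do the same mapping before building the tuple.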