apache-pig distributed-cache pig-udf

Pig Distributed cache


What is the difference between getShipFiles() and getCacheFiles() in the EvalFunc class?

I am assuming that any file specified in either method is made available to the exec method through the distributed cache.


Solution

  • getCacheFiles() lets a UDF specify a list of HDFS files that it would like placed in the distributed cache.

    getShipFiles() lets a UDF specify a list of local files that it would like placed in the distributed cache.

    So getShipFiles() takes its files from the local file system, while getCacheFiles() takes them from HDFS. In both cases the files end up in the distributed cache, where exec can read them.
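A minimal sketch of how a UDF might override both methods. To keep it compilable without Pig on the classpath, it declares a hypothetical stand-in class, EvalFuncStandIn, in place of org.apache.pig.EvalFunc. The class name LookupUdf and both file paths are also made up for illustration. The "hdfs_path#symlink" form used for getCacheFiles() follows the convention shown in the Pig UDF documentation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for org.apache.pig.EvalFunc, declared only so this
// sketch compiles on its own; a real UDF would extend EvalFunc<T> instead.
abstract class EvalFuncStandIn {
    // Assumed default: no files requested.
    public List<String> getCacheFiles() { return null; }
    public List<String> getShipFiles() { return null; }
}

class LookupUdf extends EvalFuncStandIn {
    @Override
    public List<String> getCacheFiles() {
        // File already stored in HDFS. The part after '#' is the symlink
        // name under which the task sees the file (path is illustrative).
        return Arrays.asList("/user/pig/data/lookup.txt#lookup");
    }

    @Override
    public List<String> getShipFiles() {
        // File on the local file system of the machine that launches the
        // Pig script (path is illustrative).
        return Arrays.asList("/home/me/udfdata/stopwords.txt");
    }
}
```

In a real UDF, exec would then typically open these files relative to the task's working directory, for example "./lookup" for the cached HDFS file and "./stopwords.txt" for the shipped local file. Treat those names as an assumption to verify against your Pig version, since getShipFiles() is not available in older releases.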