
Apache Pig: can a bag be passed as input to a load function?


If I write a custom load function with the constructor

MyLoadFunction(String someOptions, DataBag myBag)

how can I call this function from Pig Latin?

X = load 'foo.txt' using MyLoadFunction('myString', myBagAlias);

This does not work. Is it even possible?

Thanks.


Solution

  • I'm not sure Pig suits what you need. Pig is designed for loading large volumes of data and pushing them through a pipeline of transformations. It sounds like you want something more procedural: load a small amount of data, process it, make a decision based on the result, and follow that algorithm to completion.

    So I'm not sure this is the best way for you to go, but you can try writing a UDF that accesses HBase and fetches the data you need. LOAD is inappropriate here because it does not return a bag; it returns a relation that Pig expects you to put through further transformations. In addition, the arguments given to a load function's constructor in the USING clause can only be string constants, so a relation alias such as myBagAlias cannot be passed there. You can, however, pass a bag as input to a UDF, and then do the HBase lookup and processing you want inside that UDF.

    A more Pig-ish way of doing things would be to load all of the relevant HBase data into one or more relations and then JOIN them as appropriate to combine the pieces of data you need.
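A minimal sketch of the UDF route, in Pig Latin. It assumes a hypothetical Java UDF class com.example.LookupAndProcess (an EvalFunc you would write yourself) packaged in a hypothetical myudfs.jar, and a hypothetical one-column input file keys.txt; all of these names are placeholders. In Pig a bag is typically produced by GROUP (GROUP ... ALL puts every record into one bag), and that bag is then passed to the UDF as a call argument inside FOREACH ... GENERATE, while only string constants go to the constructor through DEFINE.

```pig
-- Placeholder jar and class names (hypothetical); replace with your own UDF.
REGISTER myudfs.jar;
-- Constructor arguments in DEFINE must be string constants.
DEFINE Lookup com.example.LookupAndProcess('myString');

-- Hypothetical input: one key per line.
keys = LOAD 'keys.txt' USING PigStorage('\t') AS (key:chararray);

-- GROUP ... ALL collects every record into a single bag field named "keys".
all_keys = GROUP keys ALL;

-- The bag is passed to the UDF's exec() call, not to its constructor.
result = FOREACH all_keys GENERATE Lookup(keys);

DUMP result;
```

Inside the UDF, exec(Tuple input) would receive the bag as input.get(0), cast it to DataBag, and iterate over its tuples; that is where the HBase lookup would go.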
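The JOIN approach might look like the following sketch. It assumes a hypothetical HBase table mytable with column family cf and columns name and score, read with Pig's built-in HBaseStorage loader, plus a hypothetical file keys.txt listing the row keys of interest; the table, column, and file names are placeholders.

```pig
-- Hypothetical HBase table "mytable" with column family "cf".
-- '-loadKey true' makes the HBase row key the first field (id).
hbase_rows = LOAD 'hbase://mytable'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:name cf:score', '-loadKey true')
    AS (id:chararray, name:chararray, score:int);

-- Hypothetical file containing the row keys you care about.
keys = LOAD 'keys.txt' USING PigStorage('\t') AS (key:chararray);

-- Inner join: keep only HBase rows whose row key appears in keys.txt.
joined = JOIN keys BY key, hbase_rows BY id;

-- "::" disambiguates field names that come from different inputs after a join.
combined = FOREACH joined GENERATE
    keys::key AS key, hbase_rows::name AS name, hbase_rows::score AS score;

STORE combined INTO 'combined_output' USING PigStorage('\t');
```

If the key list is small, a replicated (map-side) join such as JOIN hbase_rows BY id, keys BY key USING 'replicated'; avoids the reduce phase; in a replicated join the small relation must be listed last.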