hadoop hive cascading

Lateral View functionality in Cascading


I have a table like this:

TableName: myTab

+----+---------------------+
| ID |        Codes        |
+----+---------------------+
| 1  | ABC,DEF,GHI,JLK,MNO |
+----+---------------------+

I am developing a Cascading application that should convert the above table into the following:

+----+---------------------+------+
| ID |        Codes        | code |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | ABC  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | DEF  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | GHI  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | JLK  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | MNO  |
+----+---------------------+------+

In Hive, this can be done very easily with LATERAL VIEW:

-- Assuming Codes is a STRING column: explode() needs an array, hence split().
SELECT
    ID, Codes, code
FROM
    myTab LATERAL VIEW explode(split(Codes, ',')) codesTab AS code;

But I want to do the same thing in Cascading. Is there a way to do it?


Solution

  • This can be done with a custom Function (there may be other ways). The function just needs to add a new Tuple to the output collector for each token.

    For example:

    import static com.google.common.base.Preconditions.checkArgument;
    import cascading.flow.FlowProcess;
    import cascading.operation.BaseOperation;
    import cascading.operation.Function;
    import cascading.operation.FunctionCall;
    import cascading.tuple.Fields;
    import cascading.tuple.Tuple;

    // Emits one output tuple per comma-separated token found in the argument field(s).
    public class TestLateralView extends BaseOperation<Void> implements Function<Void> {
        private static final long serialVersionUID = 1L;

        // fields declares the single output field that receives each token.
        public TestLateralView(Fields fields) {
            super(fields);
            checkArgument(fields.size() == 1);
        }

        @Override
        public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Void> functionCall) {
            Tuple tuple = functionCall.getArguments().getTuple();
            // Join all argument values with commas so several argument fields are handled as well.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tuple.size(); i++) {
                sb.append(tuple.getString(i));
                sb.append(",");
            }

            // split() discards the trailing empty string left by the final comma.
            String[] tokens = sb.toString().split(",");

            // One new Tuple per token: this is what turns one input row into many.
            for (String token : tokens) {
                functionCall.getOutputCollector().add(new Tuple(token));
            }
        }
    }
    

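    With the above Function I get the expected output.

    In the assembly, the function can be called as follows (CODES and CODE are Fields instances naming the Codes input column and the new code output column):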

    pipe = new Each(pipe, CODES, new TestLateralView(CODE), Fields.ALL);
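
    Passing Fields.ALL as the output selector makes Cascading keep the incoming fields and append the function's result to them, which is why every output row carries ID, Codes and code. The row expansion itself can be sketched in plain Java without any Cascading classes. This is only an illustration of the semantics, not the Cascading implementation; the class name LateralViewSketch and the helper explode are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; no Cascading classes are used here.
class LateralViewSketch {

    // Hypothetical helper: expands one (id, codes) row into one row per token.
    // Each output row keeps the original columns and appends the token,
    // mirroring input fields + function result as selected by Fields.ALL.
    static List<List<String>> explode(String id, String codes) {
        List<List<String>> rows = new ArrayList<>();
        for (String token : codes.split(",")) {
            rows.add(Arrays.asList(id, codes, token));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Uses the sample row from the question.
        for (List<String> row : explode("1", "ABC,DEF,GHI,JLK,MNO")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

    For the sample row this yields five rows, one per code, each repeating the ID and Codes values.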