I have a table that looks like this:
TableName: myTab
+----+---------------------+
| ID | Codes               |
+----+---------------------+
| 1  | ABC,DEF,GHI,JLK,MNO |
+----+---------------------+
I am developing a Cascading application that should convert the table above into the following:
+----+---------------------+------+
| ID | Codes               | code |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | ABC  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | DEF  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | GHI  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | JLK  |
+----+---------------------+------+
| 1  | ABC,DEF,GHI,JLK,MNO | MNO  |
+----+---------------------+------+
In Hive, this can be done very easily with LATERAL VIEW:
SELECT
    ID, Codes, code
FROM
    myTab LATERAL VIEW explode(split(Codes, ',')) codesTab AS code
But I want to do the same thing in Cascading. Is there a way to do it?
It can be done with a custom Function (there may be other ways). The function just needs to add a new Tuple to the collector returned by functionCall.getOutputCollector() for each token.
For example:
import static com.google.common.base.Preconditions.checkArgument;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Function;
import cascading.operation.FunctionCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;

public class TestLateralView extends BaseOperation<Void> implements Function<Void> {
    private static final long serialVersionUID = 1L;

    // fields declares the single output field that receives each token.
    public TestLateralView(Fields fields) {
        super(fields);
        checkArgument(fields.size() == 1);
    }

    @Override
    public void operate(@SuppressWarnings("rawtypes") FlowProcess flowProcess, FunctionCall<Void> functionCall) {
        Tuple tuple = functionCall.getArguments().getTuple();
        // Join all argument values with commas so they can be split in one pass.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tuple.size(); i++) {
            sb.append(tuple.getString(i));
            sb.append(",");
        }
        // split() discards the trailing empty string left by the final comma.
        String[] tokens = sb.toString().split(",");
        // Emit one output tuple per token, so each token becomes its own row.
        for (String token : tokens) {
            functionCall.getOutputCollector().add(new Tuple(token));
        }
    }
}
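With the above Function I get the expected output.
In the assembly, the function can be applied as follows (CODES and CODE are Fields constants, presumably naming the "Codes" input column and the new "code" output column):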
pipe = new Each(pipe, CODES, new TestLateralView(CODE), Fields.ALL);
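If you want to check the expected rows without setting up a Cascading flow, the one-row-in, many-rows-out logic can be sketched in plain Java. This is only an illustration of the splitting step, not Cascading code: the class name LateralViewSketch and the explode helper are hypothetical names chosen for this example, and the input row is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class LateralViewSketch {

    // Hypothetical helper (not part of Cascading): returns one output row
    // {id, codes, code} for every comma-separated token in codes, which is
    // what the Function above combined with Fields.ALL is meant to produce.
    public static List<String[]> explode(String id, String codes) {
        List<String[]> rows = new ArrayList<>();
        for (String token : codes.split(",")) {
            rows.add(new String[] { id, codes, token });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints five lines, the first being: 1 | ABC,DEF,GHI,JLK,MNO | ABC
        for (String[] row : explode("1", "ABC,DEF,GHI,JLK,MNO")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Like the Function above, this splits on a bare comma, so tokens are not trimmed; if the real data may contain spaces around the commas, you would probably want to trim each token before emitting it.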