How to flatten recursive hierarchy using Hive/Pig/MapReduce

I have unbalanced tree data stored in tabular format, one parent/child pair per row.

The depth of the tree is unknown.

How can I flatten this hierarchy so that each row contains the entire path from a leaf node to the root node, as:

leaf node, root node, intermediate nodes

Any suggestions for solving this problem using Hive, Pig, or MapReduce? Thanks in advance.


  • I tried to solve it using Pig; here is the sample code:

    Join function:

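    -- Join parent and child
    define join_hierarchy (leftA, source, result) returns output {
        -- left outer join: look up the parent of each row's current top-most ancestor
        joined = join $leftA by parent left, $source by child;
        -- no match means that ancestor is the root, so the path is complete
        tmp_filtered = filter joined by $source::parent is null;
        part = foreach tmp_filtered generate $leftA::child as child, $leftA::path as path;
        $result = union part, $result;
        -- otherwise climb one level and prepend the newly found ancestor to the path
        part_remaining = filter joined by $source::parent is not null;
        $output = foreach part_remaining generate $leftA::child as child, $source::parent as parent, CONCAT(CONCAT($source::parent, ':'), $leftA::path) as path;
    };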

    Load dataset:

    --My dataset field delimiter is ','.
    source = load '*****' using PigStorage(',') as (parent:chararray, child:chararray);
    --create an additional column for the path
    leftA = foreach source generate child, parent, CONCAT(parent, ':') as path;
    --seed the result relation with a single placeholder row of empty strings.
    result = limit leftA 1;
    result = foreach result generate '' as child, '' as path;
    --Flatten the hierarchy to 4 levels. Repeat the line below once per level of hierarchy depth.
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
    leftA= join_hierarchy(leftA, source, result);
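
Since the depth is unknown, the repeated join can also be run in a loop until no rows are left to climb. The following is a minimal plain-Python sketch of that idea, not Pig and not a drop-in replacement for the script above. The function name `flatten_hierarchy` and the sample edges are hypothetical, and it assumes every child has exactly one non-null parent. Like the Pig script, it emits a path for every child row, not only for leaves.

```python
def flatten_hierarchy(edges):
    """Map each child to its 'root:...:parent:' path.

    edges is an iterable of (parent, child) pairs (hypothetical input shape,
    mirroring the two columns loaded in the Pig script).
    """
    edges = list(edges)
    parent_of = {child: parent for parent, child in edges}

    # Same starting point as the Pig script: (child, current ancestor, path).
    frontier = [(child, parent, parent + ":") for parent, child in edges]
    result = {}
    passes = 0

    while frontier:
        passes += 1
        # A tree with n edges is at most n levels deep, so more passes
        # than that means the data contains a cycle.
        if passes > len(parent_of):
            raise ValueError("cycle detected in hierarchy")

        remaining = []
        for child, ancestor, path in frontier:
            # Equivalent of the left outer join: look up the ancestor's parent.
            grandparent = parent_of.get(ancestor)
            if grandparent is None:
                # The ancestor is the root, so the path is complete.
                result[child] = path
            else:
                # Climb one level and prepend the new ancestor.
                remaining.append((child, grandparent, grandparent + ":" + path))
        frontier = remaining

    return result


if __name__ == "__main__":
    sample = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]
    for child, path in sorted(flatten_hierarchy(sample).items()):
        print(child, path)
```

Each pass of the `while` loop corresponds to one `join_hierarchy` call, so the number of passes equals the depth of the deepest node. To keep only leaf rows, filter the result to children that never appear in the parent column.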