python, perl, bash, awk, nawk

parse file content and display tree view


Given a file with content:

insert_job: J1
insert_job: J2
box_name: J1
insert_job: J3
box_name: J2
insert_job: J4
box_name: J1
insert_job: J5
box_name: J4
insert_job: J6
box_name: J4

I'd like to display it as follows (using tabs to indicate the child-parent relationship):

J1
    J2
        J3
    J4
        J5
        J6

test_data2 for Borodin:
------------------------------
insert_job: JS11-LR_BaselIII
insert_job: JS11-Check_Batch_Run_Numbers
box_name: JS11-LR_BaselIII
insert_job: 11000000-start
box_name: JS11-Check_Batch_Run_Numbers
insert_job: 11000000-runbox
box_name: JS11-Check_Batch_Run_Numbers
insert_job: JS11-Load_Session_Date
box_name: JS11-LR_BaselIII
insert_job: JS110000-start
box_name: JS11-Load_Session_Date
insert_job: JS110000-runbox
box_name: JS11-Load_Session_Date
insert_job: JS11-Start_RiskWatch
box_name: JS11-LR_BaselIII
insert_job: JS110004-start
box_name: JS11-Start_RiskWatch
insert_job: JS110004-runbox
box_name: JS11-Start_RiskWatch
insert_job: JS11-Start_UDS
box_name: JS11-LR_BaselIII
insert_job: JS110001-start
box_name: JS11-Start_UDS
insert_job: JS110001-runbox
box_name: JS11-Start_UDS
insert_job: JS11-Pool_Processing
box_name: JS11-LR_BaselIII
insert_job: JS110002-start
box_name: JS11-Pool_Processing

Syntax error in Ed's solution:

sdpvvrsp810{alelai}: gawk -f tst.awk testjobs3
gawk: tst.awk:2: /^box_name/   { box = $2; jobs[box][job] }
gawk: tst.awk:2:                                    ^ syntax error
gawk: tst.awk:9:         for (job in jobs[box])
gawk: tst.awk:9:                         ^ syntax error

Solution

  • Here is a somewhat shorter Perl version that works with your sample data.

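    use strict;
    use warnings;

    sub parse {
      local $/ = undef;    # slurp mode: read the whole input at once
      my $text = <>;
      # The first job in the file is taken to be the root of the tree.
      my ($root) = $text =~ /insert_job:\s*(\S+)/;
      # Flat list of (child, parent) pairs: each job directly followed by its box.
      my @links = $text =~ /insert_job:\s*(\S+)\s*box_name:\s*(\S+)/g;
      my $children = {};   # parent name => array ref of child names
      while (@links) {
        my $child = shift @links;
        my $parent = shift @links;
        push @{$children->{$parent}}, $child;
      }
      # The closure receives itself as its first argument so it can recurse.
      my $print = sub {
        my ($print, $parent, $indent) = @_;
        print "\t" x $indent, $parent, "\n";
        $print->($print, $_, $indent + 1) foreach (@{$children->{$parent} || []});
      };
      $print->($print, $root, 0);
    }

    parse();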
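
  • Since the question is also tagged python, here is a rough Python sketch of the same idea, not a tested drop-in replacement. The function names parse_jobs and render are made up for illustration. It assumes that a box_name line always refers to the most recent insert_job line, that every box is itself declared with insert_job, and that there are no cycles. Unlike the Perl version, it treats every job without a box_name as a root, so a file with several top-level jobs should print several trees.

```python
from collections import defaultdict


def parse_jobs(lines):
    """Return (roots, children) built from insert_job / box_name lines.

    Assumption: a box_name line belongs to the closest preceding insert_job.
    """
    children = defaultdict(list)  # box (parent) -> child jobs, in file order
    parent_of = {}                # job -> its box, or None if it has none
    current = None
    for line in lines:
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not value:
            continue  # skip blank lines and anything that is not "key: value"
        if key == "insert_job":
            current = value
            parent_of.setdefault(current, None)
        elif key == "box_name" and current is not None:
            parent_of[current] = value
            children[value].append(current)
    # Jobs that never got a box are the top-level entries (dicts keep file order).
    roots = [job for job, parent in parent_of.items() if parent is None]
    return roots, children


def render(roots, children, indent="\t"):
    """Return the tree as a list of lines, one indent unit per nesting level."""
    out = []

    def walk(job, depth):
        out.append(indent * depth + job)
        for child in children.get(job, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return out


# Hypothetical usage, with "jobs.txt" standing in for your input file:
#   with open("jobs.txt") as fh:
#       print("\n".join(render(*parse_jobs(fh))))
```

    Keeping the parsing and the printing separate makes it easy to change the indent string or to check the result without reading a file.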