I'm trying to parse a file that contains lines in a hierarchical structure. For example the file:
a b c
a b d
a B C
A B C
indicates that a
contains b
and B
, that b
contains c
and d
, that B
contains C
. A
contains a different B
which contains its own C
.
This is much like a list of files.
I want to format this in a hierarchical bracketed way like:
a {
b {
c
d
}
B {
C
}
}
A {
B {
C
}
}
I couldn't come up with a decent way to do this. I thought that AWK would be my best bet, but came up short with how to actually implement it.
My input is actually a list of files. I can of course separate the fields by spaces if needed, or keep them with /
. The files are unordered and generated from a code-base during compile-time via inspection. My desired output is going to be a graphviz DOT file containing each file in its own subgraph.
Thus for the input:
a/b/c
a/b/d
a/B/C
A/B/C
the output would be
digraph {
subgraph cluster_a {
label = a
subgraph cluster_b {
label = b
node_1 [label=c]
node_2 [label=d]
}
subgraph cluster_B {
label = B
node_3 [label=C]
}
}
subgraph cluster_A {
label = A
subgraph cluster_B {
label = B
node_4 [label=C]
}
}
}
Does anybody know how I could get this processing done? I'm open to other tools as well, not just AWK.
NOTE: Depth is not fixed, though I could pre-compute the maximum depth if necessary. Not all leaves will be at the same depth either.
I'm open to other tools as well, not just AWK.
I offer this Python solution:
import sys
INDENT = ' '
NODE_COUNT = 1
def build(node, l):
x = l[0]
if x not in node:
node[x] = {}
if len(l) > 1:
build(node[x], l[1:])
def indent(s, depth):
print('%s%s' % (INDENT * depth, s))
def print_node(label, value, depth):
if len(value.keys()) > 0:
indent('subgraph cluster_%s {' % label, depth)
indent(' label = %s' % label, depth)
for child in value:
print_node(child, value[child], depth+1)
indent('}', depth)
else:
global NODE_COUNT
indent('node_%d [label=%s]' % (NODE_COUNT, label), depth)
NODE_COUNT += 1
def main():
d = {}
for line in sys.stdin:
build(d, [x.strip() for x in line.split()])
print('digraph {')
for k in d.keys():
print_node(k, d[k], 1)
print('}')
if __name__ == '__main__':
main()
Result:
$ cat rels.txt
a b c
a b d
a B C
A B C
$ cat rels.txt | python3 make_rels.py
digraph {
subgraph cluster_a {
label = a
subgraph cluster_b {
label = b
node_1 [label=c]
node_2 [label=d]
}
subgraph cluster_B {
label = B
node_3 [label=C]
}
}
subgraph cluster_A {
label = A
subgraph cluster_B {
label = B
node_4 [label=C]
}
}
}