
Unix: repeat a number for each group in a file, based on a pattern (probably with awk)


I need help numbering specific records in a text file. It is an LDIF file of about 2 GB. The processing machine runs Unix, which is why I tagged this awk. I have already tried nl and cat, but they make it look more complicated, and awk seems a good fit for this task. I'm familiar with SQL, but this is nothing like it :-)

The goal is to guarantee numerical uniqueness for each group and for the elements within a group:

  1. Add a column with a repeating number to every row of a data group, where each group starts with the attribute 'dn:'. It can be a repeated row number or a counter; the important thing is that it is unique across groups.
  2. Add a column with an incrementing number that counts repeated occurrences of an attribute within a group (1 for the first occurrence).

Input:

dn: uc=an
version: 12

dn: uid=fcb
uid: ljfhsfff
missdata: at12
missdata: at3
fladata: part2
fladata: part3
fladata: part1

dn: uid=fccb
uid: kjhfa8
missdata: at1
missdata: at8
missdata: at10
missdata: at14
fladata:: a06b6a==
fladata: part3
att: dsc

Output (one possible result):

1 1 dn: uc=an
1 1 version: 12

2 1 dn: uid=fcb
2 1 uid: ljfhsfff
2 1 missdata: at12
2 2 missdata: at3
2 1 fladata: part2
2 2 fladata: part3
2 3 fladata: part1

3 1 dn: uid=fccb
3 1 uid: kjhfa8
3 1 missdata: at1
3 2 missdata: at8
3 3 missdata: at10
3 4 missdata: at14
3 1 fladata:: a06b6a==
3 2 fladata: part3
3 1 att: dsc

Solution

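    $ awk -F':' '
        {
          if (NF) {
            # non-blank line: prefix the group number and a per-attribute counter
            $0 = (grpNr+1) OFS (++eltCnt[$1]) OFS $0
          } else {
            # blank line ends a record: start the next group, reset the counters
            ++grpNr; delete eltCnt
          }
        }
        1  # always-true pattern: print every line, modified or not
      ' file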
    1 1 dn: uc=an
    1 1 version: 12
    
    2 1 dn: uid=fcb
    2 1 uid: ljfhsfff
    2 1 missdata: at12
    2 2 missdata: at3
    2 1 fladata: part2
    2 2 fladata: part3
    2 3 fladata: part1
    
    3 1 dn: uid=fccb
    3 1 uid: kjhfa8
    3 1 missdata: at1
    3 2 missdata: at8
    3 3 missdata: at10
    3 4 missdata: at14
    3 1 fladata:: a06b6a==
    3 2 fladata: part3
    3 1 att: dsc
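The command above starts a new group at every blank line, so two consecutive blank lines would skip a group number (the numbers stay unique, just not consecutive). Since the question defines a group as starting at the 'dn:' attribute, a variant can key off that line instead. This is a minimal sketch, assuming every record begins with a 'dn:' line as in the sample input; the shell function name `number_ldif` is hypothetical. Like the original, it streams the file line by line, so only the attribute counters of the current record are held in memory.

```shell
#!/bin/sh
# Hypothetical helper: number LDIF-style records by their "dn:" line.
# Assumes each record starts with "dn:"; lines before the first dn get group 0.
# Reads the files given as arguments, or stdin if none are given.
number_ldif() {
  awk -F':' '
    $1 == "dn" { ++grp; delete cnt }                    # new record: next group, reset counters
    NF         { $0 = (grp+0) OFS (++cnt[$1]) OFS $0 }  # data line: group no. and attribute count
    1                                                   # print every line, blank ones unchanged
  ' "$@"
}

# Usage: number_ldif file > numbered.ldif
```

Because the question also allows the repeated value to be a row number, replacing `++grp` with `grp = NR` would instead use the line number of each 'dn:' line as the group id, which is likewise unique per group.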