I have a huge text file in the following format. I want to process this file to count the number of occurrences of each value of the department field. Each section has a field called department.
As the result of my program, I need a CSV file as shown in the Expected output section.
I would appreciate a solution that uses sed, head/tail, or awk. The file is really huge: it has 50,000+ lines, so an efficient method is much appreciated.
Input format:
# Person1 Perosn2, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: 234ABC
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: [email protected]
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef
# Person1 Perosn2, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: 234ABC
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: [email protected]
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef
# Person3 Perosn4, AADDC Users, dummydata.somecompany.com
dn: CN=Person1 Perosn2,OU=AADDC Users,DC=dummydata,DC=somecompany,DC=com
objectClass: top
department: XYZ012
name: Person1 Perosn2
objectGUID:: MbCDVZpKbEWRxDUA5iN5IA==
userPrincipalName: [email protected]
objectCategory: CN=Person,CN=Schema,CN=Configuration,DC=dummydata,DC=somecompany
,DC=com
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 132173602593105876
preferredLanguage: en-US
msDS-AzureADMailNickname: abcdef
Expected output:
234ABC,2
XYZ012,1
What I did:
I used this command to grep the department lines out of the file.
grep '^department: *' file.txt
But I am not sure whether there is a way to get the expected output using a single command such as sed, grep, etc.
Could you please try the following.
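awk '
BEGIN {
  OFS = ","                            # print output fields separated by commas
}
{
  gsub(/\r/, "")                       # strip carriage returns in case of CRLF line endings
}
/^department: / {                      # anchored, so only real department lines match
  string = $0
  sub(/^department: */, "", string)    # keep the whole value, even if it contains spaces
  sub(/ +$/, "", string)               # drop trailing spaces
  if (!val[string]++) {                # count it; true only the first time a value is seen
    b[++count] = string                # remember the order of first appearance
  }
}
END {
  for (i = 1; i <= count; i++) {
    print b[i], val[b[i]]
  }
}
' Input_file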
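If you would rather build on the grep idea from the question, a pipeline of standard tools can produce the same counts. This is only a sketch: the function name count_departments is made up for illustration, and it assumes each department value sits on a single line of the form "department: VALUE". Unlike the awk script, which prints departments in order of first appearance, this prints them sorted by department name.

```shell
# count_departments FILE  (hypothetical helper name, not a standard command)
# Prints one "department,count" line per distinct department, sorted by name.
count_departments() {
  tr -d '\r' < "$1" |                      # drop carriage returns (CRLF input)
    sed -n 's/^department:  *//p' |        # keep only the department values
    sed 's/ *$//' |                        # trim trailing spaces
    sort |                                 # group identical values together
    uniq -c |                              # prefix each value with its count
    awk '{ n = $1; sub(/^ *[0-9]+ +/, ""); print $0 "," n }'   # "count value" -> "value,count"
}
```

Call it as count_departments file.txt > departments.csv. For 50,000 lines either approach should finish almost instantly; the awk version makes a single pass and needs no sort, so it is the better choice if the order of first appearance matters.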