Using bash to parse the output of ldapsearch

I recently wrote a bash script that had to parse the output of ldapsearch. The script works, but I imagine there is a more efficient way to accomplish this.

The script executes an ldapsearch command, which outputs multiple records that are in a multiline format. Each record is separated by a blank line. What I ended up doing was the following:

Add a delimiter character (#) to the end of each line
Replace each blank line (which is now just #) with the string 'DELIM'
Remove all newlines
Replace 'DELIM' with a newline

What this effectively did was turn the multiline output of ldapsearch into multiple lines of delimiter-separated values. I then use cut twice to parse each line (once to split on the # delimiter, and again to split each attribute name from its value).

Here is the code:

while IFS= read -r line ; do
 dn=$(echo "$line" | cut -d '#' -f 1 | cut -d " " -f 2)
 uid=$(echo "$line" | cut -d '#' -f 2 | cut -d " " -f 2)
 uidNumber=$(echo "$line" | cut -d '#' -f 3 | cut -d " " -f 2)
 gidNumber=$(echo "$line" | cut -d '#' -f 4 | cut -d " " -f 2)

 # Code omitted since it's not relevant

done < <(ldapsearch -x -H "$ldap_server" -D 'cn=Directory Manager' -w "$ds_password" -b "$searchbase" -LLL uid uidNumber gidNumber | sed 's/$/#/g' | sed 's/^#$/DELIM/g' | tr -d '\n' | sed 's/DELIM/\n/g')

The output of the ldapsearch command is the following:

dn: uid=userone,ou=People,dc=team,dc=company,dc=local
uid: userone
uidNumber: 5000
gidNumber: 5000

dn: uid=usertwo,ou=People,dc=team,dc=company,dc=local
uid: usertwo
uidNumber: 5001
gidNumber: 5001

Is there a more efficient way to accomplish this? Specifically one that doesn't use piping so extensively?

Solution

Assumptions:

the ldapsearch data (in particular the attribute values) does not contain white space
reformatting the data into single lines (via OP's current code or via jotne's answer) includes replacing the # delimiter with a space

Using a space (instead of a #) as the delimiter we have the following reformatted ldapsearch data (8x space-delimited fields):

dn: uid=userone,ou=People,dc=team,dc=company,dc=local uid: userone uidNumber: 5000 gidNumber: 5000
dn: uid=usertwo,ou=People,dc=team,dc=company,dc=local uid: usertwo uidNumber: 5001 gidNumber: 5001
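
As one possible way to get this space-delimited form (a sketch only, and not necessarily the code from jotne's answer), awk's paragraph mode can join each record onto a single line. The function names sample_ldif and flatten_records are made up for this illustration, and the here-document stands in for the real ldapsearch call:

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the real ldapsearch call:
# prints the sample output copied from the question.
sample_ldif() {
    cat <<'EOF'
dn: uid=userone,ou=People,dc=team,dc=company,dc=local
uid: userone
uidNumber: 5000
gidNumber: 5000

dn: uid=usertwo,ou=People,dc=team,dc=company,dc=local
uid: usertwo
uidNumber: 5001
gidNumber: 5001
EOF
}

# An empty RS puts awk in paragraph mode: each blank-line-separated block
# is one record, and newlines count as field separators.
# Assigning $1 = $1 makes awk rebuild the record using the default
# output field separator (a single space).
flatten_records() {
    awk -v RS= '{ $1 = $1; print }'
}

flattened=$(sample_ldif | flatten_records)
printf '%s\n' "$flattened"
```

This replaces the sed | sed | tr | sed chain with a single awk process, at the cost of relying on the no-white-space assumption above.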

The while read loop can be modified to eliminate the (currently) 12x subprocess calls (4x $(echo|cut|cut)) on each pass through the loop, e.g.:

while read -r _ dn _ uid _ uidNumber _ gidNumber
do
    echo "############"
    echo ".$dn."
    echo ".$uid."
    echo ".$uidNumber."
    echo ".$gidNumber."
done < <(ldapsearch ... | other_code_to_reformat_ldapsearch_data_as_single_lines_but_with_space_delimiter)

NOTES:

the _ are dummy placeholders for fields we don't care about (the attribute names dn:, uid:, etc.)
periods (.) are added to the echo statements as visual delimiters

This generates:

############
.uid=userone,ou=People,dc=team,dc=company,dc=local.
.userone.
.5000.
.5000.
############
.uid=usertwo,ou=People,dc=team,dc=company,dc=local.
.usertwo.
.5001.
.5001.
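
A side note on why the loop is fed with < <(...) rather than with ldapsearch ... | while read: in bash, each part of a pipeline normally runs in a subshell, so variables assigned inside a piped loop are gone once the loop ends. With process substitution the loop runs in the current shell. A minimal sketch, where the hard-coded flattened variable is an assumption standing in for the reformatted ldapsearch output:

```bash
#!/usr/bin/env bash

# Two already-flattened records (8 space-delimited fields each).
flattened='dn: uid=userone,ou=People,dc=team,dc=company,dc=local uid: userone uidNumber: 5000 gidNumber: 5000
dn: uid=usertwo,ou=People,dc=team,dc=company,dc=local uid: usertwo uidNumber: 5001 gidNumber: 5001'

# Piped loop: the while loop runs in a subshell (bash default, lastpipe off),
# so the increment is not visible after the loop.
piped_count=0
printf '%s\n' "$flattened" | while read -r _ dn _ uid _ uidNumber _ gidNumber
do
    piped_count=$((piped_count + 1))
done

# Process substitution: the loop runs in the current shell,
# so count and uids keep their values after the loop.
count=0
uids=()
while read -r _ dn _ uid _ uidNumber _ gidNumber
do
    count=$((count + 1))
    uids+=("$uid")
done < <(printf '%s\n' "$flattened")

echo "piped loop counted: $piped_count"
echo "process substitution counted: $count (${uids[*]})"
```

This matters as soon as the omitted loop body needs to collect results for use later in the script.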

Another awk idea for reformatting the ldapsearch results that outputs just the fields we're interested in:

awk '{for (i=2;i<=NF;i=i+2) {printf "%s%s", (i==2 ? "" : " "), $i}; print ""}' RS= ORS='\n'

Where:

we re-use jotne's RS/ORS settings (an empty RS makes awk treat each blank-line-separated block as one record)
(i=2;i<=NF;i=i+2) - only print even numbered fields

This generates:

uid=userone,ou=People,dc=team,dc=company,dc=local userone 5000 5000
uid=usertwo,ou=People,dc=team,dc=company,dc=local usertwo 5001 5001

With this change (4x space-delimited fields instead of 8x space-delimited fields) the proposed while read becomes:

while read -r dn uid uidNumber gidNumber
do
    ....
done < <(ldapsearch ... | awk '{for (i=2;i<=NF;i=i+2) {printf "%s%s", (i==2 ? "" : " "), $i}; print ""}' RS= ORS='\n')
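
If the white-space assumption does not hold, or if some entries might lack an attribute (which would shift the positional fields), another option is to skip the reformatting step and let bash read the multi-line records directly, matching on attribute names. The following is only a sketch under a few assumptions: values arrive as plain "attr: value" lines (no base64 "attr:: ..." values and no wrapped continuation lines), and sample_ldif and handle_record are hypothetical stand-ins for the real ldapsearch call and for the per-record code:

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for: ldapsearch ... -LLL uid uidNumber gidNumber
sample_ldif() {
    cat <<'EOF'
dn: uid=userone,ou=People,dc=team,dc=company,dc=local
uid: userone
uidNumber: 5000
gidNumber: 5000

dn: uid=usertwo,ou=People,dc=team,dc=company,dc=local
uid: usertwo
uidNumber: 5001
gidNumber: 5001
EOF
}

# Hypothetical per-record handler; the real script would do its work here.
records=()
handle_record() {
    records+=("$dn|$uid|$uidNumber|$gidNumber")
}

dn='' uid='' uidNumber='' gidNumber=''
while IFS= read -r line || [[ -n $line ]]
do
    if [[ -z $line ]]; then
        # A blank line ends the current record.
        [[ -n $dn ]] && handle_record
        dn='' uid='' uidNumber='' gidNumber=''
        continue
    fi
    key=${line%%: *}     # text before the first ": "
    value=${line#*: }    # text after the first ": "
    case $key in
        dn)        dn=$value ;;
        uid)       uid=$value ;;
        uidNumber) uidNumber=$value ;;
        gidNumber) gidNumber=$value ;;
    esac
done < <(sample_ldif)

# Flush the last record if the input did not end with a blank line.
[[ -n $dn ]] && handle_record

printf '%s\n' "${records[@]}"
```

Inside the loop only shell builtins and parameter expansion are used, so no extra processes are started per record, and values that contain spaces are kept intact.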