I'm working my way around regex and have been tasked with writing a script to extract specific text between search patterns in an LDIF file, and I'm running into some issues. The LDIF is in OpenLDAP format, so the file looks like this:
dn: cn=user1,ou=department,o=company,c=root
changetype: add
givenName: Givenname1
sn: SN1
Country: Cn1
userCertificate;binary:: lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3
City: City1
dn: cn=user3,ou=department3,o=company,c=root
changetype: add
givenName: Givenname3
sn: SN3
customdn: cn=user3,ou=department3,o=company,c=root
userCertificate;binary:: lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3
Country: Cn3
City: City3
dn: cn=user2,ou=department,o=company,c=root
changetype: add
givenName: Givenname2
sn: SN2
customdn: cn=user2,ou=department,o=company,c=root
userCertificate;binary:: lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3lowhjsefnasdvonidfb8943th54ebghyLHFUn9894y9bKalkbjsf
89ehgvpnoLNGPOVNnl;aiorgpnsg;n\vbubGB*gpbeoabgpiobrgaragop08hgnaoergn9r0agnh
U0hBMjU2MB4XDTE5MDYwNTA3
Country: Cn1
City: City1
The lines in the file are separated by line breaks (CRLF). I'm trying to extract the text for User3 only, using the following pattern, which seems to give me a blank file.
$RegexPattern = "`r`ndn: cn=User3(.*?)`r`n`r`n"
$result = [regex]::match($inputfile,$RegexPattern).Groups[1].Value
If I change the capture pattern from (.*?) to (.*), I get all the text after the first match. I'm pretty sure I'm missing something, but I just can't see what it is. Can someone kindly help?
Edit: Adding some additional info about the custom DN too. The reason for including a CRLF in the search string before dn: is that there is also a custom DN on the user object, which duplicates the dn attribute for backward compatibility. I've updated the example LDIF entry above with this attribute.
Edit 2: Wiktor's regex nearly works until the code stumbles upon the userCertificate attribute, whose lines are split with only an LF rather than the CR+LF seen everywhere else.
First of all, make sure you read the whole file into a variable:
$inputfile = Get-Content .\input.ldif -Raw
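As a minimal sketch of why -Raw matters here: without it, Get-Content returns an array of lines with the line breaks stripped, so a pattern that relies on CR/LF has nothing to match against. The temporary file name and its contents below are made up for illustration only.

```powershell
# Hypothetical sample file written to the temp folder (not the real LDIF)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-raw-demo.ldif'
[System.IO.File]::WriteAllText($path, "dn: cn=user1`r`nsn: SN1`r`n`r`ndn: cn=user3`r`nsn: SN3`r`n")

$asLines = Get-Content $path        # array of strings, one per line, no line breaks
$asText  = Get-Content $path -Raw   # a single string, line breaks preserved

# Converting the array to a string joins the lines with spaces, so CR/LF is gone
$joined = [string]$asLines

"Array of lines:   $($asLines -is [array]) ($($asLines.Count) elements)"
"Joined has LF:    $($joined.Contains("`n"))"
"Raw text has LF:  $($asText.Contains("`n"))"

Remove-Item $path
```

Passing the array form to [regex]::Match would hand the regex engine the space-joined string, which is one plausible reason a line-break-based pattern finds nothing.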
Then, you need a regex like
$RegexPattern = '(?mi)^dn: cn=User3[^\r\n]*(?:\r?\n[^\r\n]+)*'
$result = [regex]::match($inputfile,$RegexPattern).Value
Details
(?mi) - case insensitive matching ON and the multiline behavior is ON, too
^ - start of a line
dn: cn=User3 - a literal text
[^\r\n]* - 0+ chars other than CR and LF
(?:\r?\n[^\r\n]+)* - 0+ occurrences of CRLF/LF and then 1+ chars other than CR and LF (so, any non-empty lines below the string above).
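To see the pattern at work, here is a small self-contained sketch. The sample string is a shortened, made-up stand-in for the LDIF in the question. It assumes entries are separated by a blank line and that the certificate value wraps with a bare LF while every other line ends in CRLF.

```powershell
# Hypothetical, shortened LDIF: CRLF everywhere except inside the certificate value
$sample = "dn: cn=user1,ou=department,o=company,c=root`r`n" +
          "sn: SN1`r`n" +
          "`r`n" +
          "dn: cn=user3,ou=department3,o=company,c=root`r`n" +
          "customdn: cn=user3,ou=department3,o=company,c=root`r`n" +
          "userCertificate;binary:: AAAA`n" +
          " BBBB`r`n" +
          "City: City3`r`n" +
          "`r`n" +
          "dn: cn=user2,ou=department,o=company,c=root`r`n" +
          "sn: SN2`r`n"

$RegexPattern = '(?mi)^dn: cn=User3[^\r\n]*(?:\r?\n[^\r\n]+)*'
$result = [regex]::Match($sample, $RegexPattern).Value

# Split on either line ending to inspect what was captured
$lines = $result -split '\r?\n'
$lines
```

Because ^ anchors the match at the start of a line, the customdn line cannot start a match on its own, and the optional \r lets the match continue across the LF-only break inside the certificate. The match stops at the first empty line, so user2's entry is not included.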