Split Text File based on Strings


I want to split a text file into several files, where the beginning and end of each new file are marked by specific strings. The first section begins at the line "<ca>" and ends at the line "</ca>". I want to cut the content between those two lines and paste it into a new text file. So far I've written this code:

$content = Get-Content .\*.txt
foreach ($f in $content)
{
    if ($f -eq "</ca>") { $c > .\file.txt; }
    if ($f -ne "<ca>" -and $f -ne "</ca>") { $c += $f }
}

The second "if" is supposed to keep the marker strings ("<ca>" and "</ca>") out of the created file.

I ran into two issues:

  • I can only detect the end of the section, not its beginning
  • all line breaks from the source file are lost; the new file consists of a single line with everything in it

The file is a VPN configuration and looks like this:

client
dev tun
proto udp
remote 448
verify-x509-name
<ca>
Certificate:
Data:
    Version: 3 (0x2)
    Signature Algorithm: md5WithRSAEncryption
    Issuer: C=de
    -----BEGIN CERTIFICATE-----
MIICzDCCAjWgAwIBAgIJANfh65DfDF45GFSD
    -----END CERTIFICATE-----
</ca>
<cert>  
Certificate:
    Data:
        Version: 3 (0x2)
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=de
</cert>
<key>
-----BEGIN RSA PRIVATE KEY-----
AoGBAN/jBWwRnjNtxJ+bj3U5oKhYjfu33N2dGlM9x5un9YLm9k6pBzhvG
</key>

The output looks like this:

clientdev tunproto udpremote 448verify-x509-name<ca>Certificate:...

(and so on)


Solution

  • You're better off doing this with a multiline regex.

    Get-Content .\vpnconfig.txt -Raw | Select-String '(?sm)<ca>(.+)</ca>' | Select -Expand Matches | Select -First 1 -Expand Value
    

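    Make sure you use -Raw (available since PowerShell 3.0) when using a regex like this. Without it, Get-Content returns an array of lines and Select-String tests each line separately, so the pattern can never match across line breaks. Note that the Value returned here still includes the <ca> and </ca> lines; the text between them is capture group 1 of the match (Groups[1].Value).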
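
    If you want every tagged section (<ca>, <cert>, <key>) in its own file rather than just the first one, the same idea can be extended. The following is only a minimal sketch, not a tested drop-in solution. The function name Split-TaggedBlock, its -Path and -OutDir parameters, and the output naming (one file per tag, e.g. ca.txt) are assumptions made for illustration and do not come from the question. It also assumes each opening and closing tag sits on its own line, as in the sample file.

```powershell
# Hypothetical helper: writes the body of every <tag>...</tag> block in a file
# to "<tag>.txt" inside $OutDir and returns the paths it wrote.
function Split-TaggedBlock {
    param(
        [string]$Path,          # input file, e.g. .\vpnconfig.txt
        [string]$OutDir = '.'   # assumed output location
    )

    # -Raw reads the whole file as one string so the regex can span lines.
    $raw = Get-Content -Path $Path -Raw

    # (?s)   lets "." match newlines
    # (\w+)  captures the tag name; \1 requires the matching closing tag
    # (.*?)  is lazy, so each block stops at its own closing tag
    $pattern = '(?s)<(\w+)>[ \t]*\r?\n(.*?)\r?\n</\1>'

    foreach ($m in [regex]::Matches($raw, $pattern)) {
        $target = Join-Path $OutDir ($m.Groups[1].Value + '.txt')
        # Group 2 holds only the lines between the tags, line breaks intact.
        Set-Content -Path $target -Value $m.Groups[2].Value
        $target
    }
}
```

    Called as Split-TaggedBlock -Path .\vpnconfig.txt on a file like the one in the question, this should produce ca.txt, cert.txt and key.txt in the current directory. Because the captured text is written as one string, the original line breaks and indentation are preserved, which avoids the "everything on one line" problem caused by concatenating lines with +=.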