Search code examples
powershellpowershell-5.0

Extract multiple line text file to single line csv using Powershell


I've a task and don't have an easy way to parse some data into correct format. The text file i have is in following format

#N Last Name: Joe
#D First Name: Doe
#P Middle Name: A
Some Data:
#C ID Number: (1) 12345
#S Status: (1) Active 

#N Last Name: Jane
#D First Name: Doee
#P Middle Name: 
Some Data:
#C ID Number: (1) 11111
#S Status: (1) Active 
ID Number: (2) 1231
Status: (2) Active

Here is the code i was trying to using.

$A = Select-String -Pattern "#N" MYFILE.txt;
$B = Select-String -Pattern "#D" MYFILE.txt;
$C = Select-String -Pattern "#P" MYFILE.txt;
$D = Select-String -Pattern "#C" MYFILE.txt;
$E = Select-String -Pattern "#S" MYFILE.txt;

$wrapper = New-Object PSObject -Property @{ FirstColumn = $A; SecondColumn = $B; ThirdColumn = $C; FourthColumn = $D; FifthColumn = $E }
Export-Csv -InputObject $wrapper -Path .\output.csv -NoTypeInformation

This is the result i'm getting

"SecondColumn","ThirdColumn","FifthColumn","FourthColumn","FirstColumn"
"System.Object[]","System.Object[]","System.Object[]","System.Object[]","System.Object[]"

The output i'm looking for is; #N,#D,#P,#C,#S

 Joe, Doe, A, 12345, Active
 Jane, Doee, , 11111, Active

Any and all help is really appreciated.


Solution

  • here's another way to parse that block of data. i changed the user info to make it more obvious what was going on. [grin]

    what it does ...

    • creates a single multiline string to work with
      when ready to do this for real, replace the entire #region/#endregion block with a call to Get-Content -Raw.
    • defines the delimiter between blocks of user data
      in this case it is 2 newlines - one at the end of the last data line and one for the blank line.
    • splits the single multiline string into multiple such strings
    • iterates thru the resulting blocks of text
    • initializes the $Vars used to build the PSCO
    • splits the text blocks into lines of text
    • filters out any line that does NOT start with a #, then a letter, and a final space
    • iterates thru the remaining lines
    • runs a switch on the 2nd char in each line
    • when it matches one of the code letters, parse the line & set the value for the equivalent $Var
    • finishes iterating thru the current set of strings
    • builds a [PSCustomObject] to hold the values
    • sends that out to the $Result collection
    • finishes iterating thru the text blocks [the outer foreach]
    • displays the collection on screen
    • saves the collection to a CSV

    if you want to remove the quotes from the CSV, please don't. [grin] if you MUST risk wrecking the CSV, then you can use Get-Content to load the lines from the file and replace the quotes with nothing.

    the code ...

    #region >>> fake reading in a text file as a single multiline string
    #    in real life, use "Get-Content -Raw"
    $InStuff = @'
    #N Last Name: ALast
    #D First Name: AFirst
    #P Middle Name: AMid
    Some Data:
    #C ID Number: (1) 11111
    #S Status: (1) Active 
    
    #N Last Name: BLast
    #D First Name: BFirst
    #P Middle Name: 
    Some Data:
    #C ID Number: (1) 22222
    #S Status: (1) Active 
    ID Number: (2) 1231
    Status: (2) Active
    '@
    #endregion >>> fake reading in a text file as a single multiline string
    
    $BlockDelim = ([System.Environment]::NewLine) * 2
    
    $Result = foreach ($Block in ($InStuff -split $BlockDelim))
        {
        # initialize stuff to $Null
        #    this handles non-matches [such as a missing middle name] 
        $FirstName = $MidName = $LastName = $IdNumber = $Status = $Null
    
        # the "-match" filters for lines that start with a "#", a single letter, and a space
        foreach ($Line in ($Block -split [System.Environment]::NewLine -match '^#\w '))
            {
            switch ($Line[1])
                {
                'N' {
                    $LastName = $Line.Split(':')[-1].Trim()
                    break
                    }
                'D' {
                    $FirstName = $Line.Split(':')[-1].Trim()
                    break
                    }
                'P' {
                    $MidName = $Line.Split(':')[-1].Trim()
                    break
                    }
                'C' {
                    $IdNumber = $Line.Split(':')[-1].Trim().Split(' ')[-1].Trim()
                    break
                    }
                'S' {
                    $Status = $Line.Split(':')[-1].Trim().Split(' ')[-1].Trim()
                    break
                    }
                } # end >>> switch ($Line[1])
            } # end >>> foreach ($Line in ($Block -split [System.Environment]::NewLine))
    
            # create a custom object and send it out to the collection
            [PSCustomObject]@{
                FirstName = $FirstName
                LastName = $LastName
                MidName = $MidName
                IdNumber = $IdNumber
                Status = $Status
                }
        } # end >>> foreach ($Block in ($InStuff -split $BlockDelim))
    
    # display on screen
    $Result
    
    # send to a CSV file
    $Result |
        Export-Csv -LiteralPath "$env:TEMP\Veebster_ParsedResult.csv" -NoTypeInformation
    

    the on-screen output ...

    FirstName : AFirst
    LastName  : ALast
    MidName   : AMid
    IdNumber  : 11111
    Status    : Active
    
    FirstName : BFirst
    LastName  : BLast
    MidName   : 
    IdNumber  : 22222
    Status    : Active
    

    the content of the CSV file ...

    "FirstName","LastName","MidName","IdNumber","Status"
    "AFirst","ALast","AMid","11111","Active"
    "BFirst","BLast","","22222","Active"
    

    please note that there is no error detection OR error handling. [grin]