Search code examples
powershellpowershell-4.0

Empty CSV file produced when reading 1 row of tab delimited data and exporting to '+' delimited


The objective for a (PowerShell) script is to take a tab delimited text file, remove the header row, change the delimiter to '+', add a custom header record and add a summary (footer) row at the bottom of the file with a count of the number of data records. Lastly, the file extension needs to be replaced with a sequential number.

When the raw file contains more than 1 row the result is as required, but when only one row (Header plus 1 data row), the output file is empty.

$dir = "C:\Temp\Data"
$file = "rand1"

$sequencefile = "C:\temp\Sequential\DoNotDeleteSequence.txt"

$sequencenumber = (Get-Content $sequencefile)
$newsequencenumber = ($sequencenumber/1) + 1

Clear-Content $sequencefile
Add-Content $sequencefile $newsequencenumber

$backslash = "\"
$ext = ".txt"
$filename = $dir + $backslash + $file + $ext
$text = "TRAILER = "
$dateText = Get-Date -Format d
$Header1 = "HEADER="
$Header2 = "+PSTG"
$HeaderText = $Header1 + $dateText + $Header2

$tempfile1 = "step1" 
$tempfile2 = "step2" 
$tempfile3 = "step3" 
$tempfile4 = "step4" 

$temppstg = "PSTG_NCDLPSTG."

$stepfile1 = $dir + $backslash + $tempfile1 + $ext
$stepfile2 = $dir + $backslash + $tempfile2 + $ext
$stepfile3 = $dir + $backslash + $tempfile3 + $ext
$stepfile4 = $dir + $backslash + $tempfile4 + $ext

$pstgfile = $dir + $backslash + $temppstg + $newsequencenumber

(Get-Content $filename).Replace("+", '') | Set-Content $stepfile1
(Get-Content $stepfile1) | select -Skip 1 | Set-Content $stepfile2
Import-Csv $stepfile2 -Delimiter "`t" | Export-Csv $stepfile3 -Delimiter "+" -NoTypeInformation
Set-Content $stepfile4 $HeaderText
(Get-Content $stepfile3).Replace("""", '') | Add-Content $stepfile4
$records = Import-Csv $stepfile4 | Measure-Object | Select-Object -Expand Count

$textToWrite = $text + $records

Add-Content $stepfile4 $textToWrite

Rename-Item $stepfile4 $pstgfile

I've created 2 test raw files (tab delimited).

C:\Temp\Data\rand1.txt containing

Header  Row
Record1 Data1

C:\Temp\Data\rand2.txt containing

Header  Row
Record1 Data1
Record2 Data2

When rand2.txt is passed through the code the result is

HEADER=25/03/2019+PSTG
Record1+Data1
Record2+Data2
TRAILER = 2

When rand1.txt is processed, the result is

HEADER=25/03/2019+PSTG
TRAILER = 0

Solution

  • The following two processing steps are causing the behavior you observed:

    (Get-Content $stepfile1) | select -Skip 1 | Set-Content $stepfile2
    Import-Csv $stepfile2 -Delimiter "`t" | Export-Csv $stepfile3 -Delimiter "+" -NoTypeInformation
    

    To understand what's going on you simply need to take a look a the individual processing steps. I'm showing input and output for files rand1.txt (left) and rand2.txt (right) side-by-side to illustrate what's happening.

    1. (Get-Content $stepfile1) | select -Skip 1 | Set-Content $stepfile2

      Input:

      Header   Row                  Header   Row
      Record1  Data1                Record1  Data1
                                    Record2  Data2
      

      Output:

      Record1  Data1                Record1  Data1
                                    Record2  Data2
      

      This step removes the header line, leaving you with one row for rand1.txt, but two rows for rand2.txt. This in itself is not the root cause of the problem you observed, but it prepares the ground for the actual culprit.

    2. Import-Csv $stepfile2 -Delimiter "`t" | Export-Csv $stepfile3 -Delimiter "+" -NoTypeInformation

      Input:

      Record1  Data1                Record1  Data1
                                    Record2  Data2
      

      Output:

                                    Record1+Data1
                                    Record2+Data2
      

      This step leaves you with no output for rand1.txt, but both records for rand2.txt, because of the way Import-Csv and Export-Csv work.

      Import-Csv converts the rows of a delimited file into custom objects, and the fields of each row into the properties of those objects, taking the names for these properties from the first line of the input file.

      Export-Csv does the polar opposite of its sibling cmdlet: it converts a list of custom objects to rows of data, and the values of the object properties to the fields of the data rows. The names of the properties of the first object in the list determine which properties get exported, and also form the header row of the output file.

      Since rand1.txt has only a single row (i.e. just the header row) no objects are created on import, due to no data, hence there is no data to export back to a file. rand2.txt on the other hand has two rows, so on import you're getting one object with the following structure (in JSON notation):

      {
        "Record1": "Record2",
        "Data1": "Data2"
      }
      

      which is then converted back to a header row Record1+Data1 and a data row Record2+Data2.

    The subsequent steps only add a new header line to the intermediate text files you've created. Had you either not removed the header before using Import-Csv, or not used the *-Csv cmdlets in the first place, this issue would not have occurred.

    With that said, your entire processing is way too convoluted. You can achieve the desired result with just the following lines (some variable definitions omitted for brevity):

    $filename = Join-Path $dir "${file}${ext}"
    
    $records = (Get-Content $filename).Replace('+', '').Replace("`t", '+') |
               Select-Object -Skip 1
    
    $newsequencenumber = [int](Get-Content $sequencefile) + 1
    $pstgfile = Join-Path $dir "PSTG_NCDLPSTG.${newsequencenumber}"
    
    "HEADER=$(Get-Date -Format d)+PSTG" | Set-Content $ptsgfile
    $records | Add-Content $ptsgfile
    "TRAILER = $($records.Count)" | Add-Content $ptsgfile