Search code examples
powershellcsvcut

Get 3rd column and put it in another file for all CSV files with Powershell


I have a folder, with subfolders full of csv files. For each file I would like to get only the third column and save it in another file.

Example:

filedata1.csv

aa,bb,cc
cat,horse,"dog, bird"
4th,33,first and second

filedata1column3.csv

cc
dog, bird
first and second

what I tried but it turns out wrong:

param ( [String] $Filename  = '*.csv',
            [String] $SourceDir = 'C:\Data\',
    )
    
ForEach ($csvFile in Get-ChildItem -Path $SourceDir -Filter $Filename){
            (Import-Csv $csvFile)
            cut -d, -f 3 *.csv > *column3.csv

Solution

    • Use the intrinsic psobject property to determine each file's column names from the objects that Import-Csv parses the CSV data into, and extract the third name.

    • Then use member-access enumeration to extract all property values from the rows (objects), relying on PowerShell's ability to use expressions (variable references) as property names.

    • Use Join-Path in conjunction with -f, the format operator to determine a suitable output file path and name, based on the input file name.

    # ... parameter declaration and foreach loop omitted.
    # The following is what needs to go *inside the foreach loop*.
    
    # Import all rows.
    $rows = Import-Csv $csvFile
    # Determine the name of the third column
    $thirdColName = ($rows[0].psobject.Properties.Name)[2]
    
    # Determine a suitable output file path and name.
    # Here, the output file is placed in the same directory as the input file, 
    # with '_colummn3' appended to the latter's base name.
    $outFile =
      Join-Path $csvFile.DirectoryName ('{0}_column3.csv' -f $csvFile.BaseName)
    
    # Output the header column
    $thirdColName > $outFile
    # Append the values of the third column
    $rows.$thirdColName >> $outFile
    

    Note:

    • In Windows PowerShell, > and >> create "Unicode" (UTF-16LE) files, whereas in PowerShell (Core) 7+ they create BOM-less UTF-8 files.

    • If you need a different encoding, use Set-Content and Add-Content with an -Encoding argument, but note that in Windows PowerShell you cannot create BOM-less UTF-8 files that way.

      • If you do need BOM-less UTF-8 files in Windows PowerShell, use the workaround from this answer (use this instead of the > and >> statements, and use "`r`n" instead of "`n" if you need Windows-format CRLF newlines):

        # Write the entire file as BOM-less UTF-8.
        $null =
          New-Item -Force $outFile -Value (
            $thirdColName + "`n" + ($rows.$thirdColName -join "`n") + "`n"
          )
        

    To put it all together in a solution that:

    • parameterizes the inputs further while providing default values
    • creates BOM-less UTF-8 files.

    Place the code in a .ps1 file and invoke the latter (e.g. ./Script.ps1), optionally with arguments that override the defaults.

    param(
      $FileName = '*.csv', # File-name filter
      $SourceDir = $PWD,   # Input-files dir., default to current
      $OutDir = $PWD,      # Output-files dir., default to current
      $ColumnIndex = 3,    # 1-based index of the col. of interest
      # Suffix to append to the input file name's base name 
      # to form the output file name.
      $OutFileBaseNameSuffix = "_column$ColumnIndex"
    )
    
    foreach ($csvFile in Get-ChildItem -LiteralPath $SourceDir -Filter $Filename) {
    
      # If the output files are written to the same dir. as the input files,
      # exclude any output files from a previous run.
      if ($SourceDir -eq $OutDir -and $csvFile.BaseName -like "*$OutFileBaseNameSuffix") { continue }
    
      # Import all rows for the file at hand.
      $rows = Import-Csv $csvFile
      # Determine the name of the column with the index of interest.
      # Subtract -1 from $ColumnIndex, because array indices are *0*-based.
      $colName = ($rows[0].psobject.Properties.Name)[$ColumnIndex-1]
    
      # Determine the output file path and name.
      $outFile =
        Join-Path $OutDir ('{0}{1}.csv' -f $csvFile.BaseName, $OutFileBaseNameSuffix)
    
      # Write the entire file, as BOM-less UTF-8.
      $null =
        New-Item -Force $outFile -Value (
          $colName + "`n" + ($rows.$colName -join "`n") + "`n"
        )
    
    }