
PowerShell: Import-Csv, rename all headers


In our company there are many users and many applications with restricted access, plus a database that records who has access to what. I don't have access to that database, but I do have a CSV file, generated automatically once a day, that lists all access rights of all my users. I want them to be able to check their own access situation, so I am writing a simple PowerShell script for this purpose.

CSV:

user;database1_dat;database2_dat;database3_dat
john;0;0;1
peter;1;0;1

I can do:

import-csv foo.csv | where {$_.user -eq $user}

But this shows the original ugly headers (with the "_dat" suffix). Can I remove the last four characters from every header that ends with "_dat" when I can't predict how many headers there will be tomorrow?

I am aware of calculated properties like:

Select-Object @{ expression={$_.database1_dat}; label='database1' }

but I have to know all column names for that, as far as I know.

Am I condemned to over-engineer this with a separate function that builds the whole calculated-property expression dynamically from scratch, or is there a simple way I am missing?

Thanks :-)


Solution

  • Assuming that file foo.csv fits into memory as a whole, the following solution performs well:

    $headerRow, $dataRows = (Get-Content -Raw foo.csv) -split '\r?\n', 2
    
    # You can pipe the result to `where {$_.user -eq $user}`
    ConvertFrom-Csv ($headerRow -replace '_dat(?=;|$)'), $dataRows -Delimiter ';' 
    
    • Get-Content -Raw reads the entire file into memory as a single, multi-line string, which is much faster than reading it line by line (the default).

    • -split '\r?\n', 2 splits the resulting multi-line string into two: the header line and all remaining lines.

      • Regex \r?\n matches a newline (both a CRLF (\r\n) and a LF-only newline (\n))
      • , 2 limits the number of tokens to return to 2, meaning that splitting stops once the 1st token (the header row) has been found, and the remainder of the input string (comprising all data rows) is returned as-is as the last token.
      • The multi-assignment $headerRow, $dataRows = ... stores the first token (the header line) in $headerRow and the second token (all data rows, still a single multi-line string) in $dataRows.
    • $headerRow -replace '_dat(?=;|$)'

      • -replace '_dat(?=;|$)' uses a regex to remove any _dat column-name suffixes (followed by a ; or the end of the string); if substring _dat only ever occurs as a name suffix (not also inside names), you can simplify to -replace '_dat'
    • ConvertFrom-Csv directly accepts arrays of strings, so the cleaned-up header row and the string with all data rows can be passed as-is.
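The steps above can be tried without a file on disk. The following is a minimal sketch that assumes the sample data from the question is held in a string (the variable names $csvText, $cleanHeader and $rows are illustrative only); with a real file, Get-Content -Raw foo.csv would supply the string instead.

```powershell
# Sample data from the question, kept in a string instead of foo.csv (assumption for this demo).
$csvText = "user;database1_dat;database2_dat;database3_dat`r`njohn;0;0;1`r`npeter;1;0;1"

# Split into at most 2 tokens: the header line and everything after it.
$headerRow, $dataRows = $csvText -split '\r?\n', 2

# Remove _dat only where it ends a column name (followed by ';' or the end of the line).
$cleanHeader = $headerRow -replace '_dat(?=;|$)'

# Pass the cleaned header and the untouched data rows as a 2-element string array.
$rows = ConvertFrom-Csv -InputObject $cleanHeader, $dataRows -Delimiter ';'

# Filter for one user, as in the question.
$rows | Where-Object { $_.user -eq 'peter' }
```

Only the header line is ever touched by the regex, so values in the data rows that happen to contain _dat are left alone.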


    Alternative solution: algorithmic renaming of an object's properties:

    Note: This solution is slow, but may be an option if you only extract a few objects from the CSV file.

    As you note in the question, use of Select-Object with calculated properties is not an option in your case, because you neither know the column names nor their number in advance.

    However, you can use a ForEach-Object command in which you use .psobject.Properties, an intrinsic member, for reflection on the input objects:

    Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | ForEach-Object {
      # Initialize an aux. ordered hashtable to store the renamed
      # property name-value pairs.
      $renamedProperties = [ordered] @{}
      # Process all properties of the input object and
      # add them with cleaned-up names to the hashtable.
      foreach ($prop in $_.psobject.Properties) {
        $renamedProperties[($prop.Name -replace '_dat$')] = $prop.Value
      }
      # Convert the aux. hashtable to a custom object and output it.
      [pscustomobject] $renamedProperties
    }
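
If the renaming is needed in more than one place, the same loop could be wrapped in a small pipeline function. This is only a sketch: the function name Remove-DatSuffix and the sample object are hypothetical and not part of the original answer, and the regex '_dat$' assumes the suffix should be removed only at the very end of a property name.

```powershell
# Hypothetical helper: strips a trailing '_dat' from every property name
# of each object received from the pipeline.
function Remove-DatSuffix {
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [psobject] $InputObject
  )
  process {
    # Ordered hashtable so that the original column order is preserved.
    $renamed = [ordered] @{}
    foreach ($prop in $InputObject.psobject.Properties) {
      $renamed[($prop.Name -replace '_dat$')] = $prop.Value
    }
    [pscustomobject] $renamed
  }
}

# Made-up sample object; note that '_dat' inside 'my_data_dat' is only removed at the end.
$sample = [pscustomobject] @{ user = 'john'; database1_dat = '0'; my_data_dat = '1' }
$result = $sample | Remove-DatSuffix
$result.psobject.Properties.Name   # user, database1, my_data
```

With the CSV from the question, a call might look like Import-Csv -Delimiter ';' foo.csv | where { $_.user -eq $user } | Remove-DatSuffix.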