
PowerShell script to combine CSVs and add root directory as an additional column


I'm writing my first PowerShell script and I'm struggling to get my head around its object-oriented, pipeline-based approach. I have a directory structure like this (Google Takeout Nest directories):

Path\DeviceID-A\
   - Year\
      - Month\
         - year-month-sensors-a.csv
Path\DeviceID-B\
   - Year\
      - Month\
         - year-month-sensors-b.csv
...

I want to aggregate all the CSVs together and I've written a simple script that does this:

Get-ChildItem -Recurse -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv merged.csv -NoTypeInformation -Append

However, I lose the context of the DeviceID when everything is aggregated together. So I'd like to add an additional column that contains those initial root directories, with something similar to:

Import-Csv merged.csv | Select-Object *,@{Name='nest_id';Expression={$nestDeviceID}} | Export-Csv merged-v2.csv -NoTypeInformation

So each row in year-month-sensors-a.csv would have an additional column called nest_id with the value DeviceID-A for any CSV under that root directory.

...but I can't work out how to combine the two scripts so that they iterate through the files programmatically, or whether I need to use ForEach-Object or piping.

Any advice would be appreciated!


Solution

  • Before posting the code, it's worth mentioning that this will only work as long as all CSVs have the same headers / columns. If that is not the case, something like this can happen:

    @'
    Col1,Col2
    Val1,Val2
    '@ |
    ConvertFrom-Csv |
    Export-Csv ./test.csv -NoTypeInformation
    
    @'
    Col1,NewCol
    Val1,NewVal
    '@ |
    ConvertFrom-Csv |
    Export-Csv ./test.csv -NoTypeInformation -Append
    

    Doing this throws the following exception:

    Cannot append CSV content to the following file: ./test.csv. The appended object does not have a property that corresponds to the following column: Col2. To continue with mismatched properties, add the -Force parameter, and then retry the command.

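If you're not sure whether your files all share the same columns, a quick check before merging can save you from that error. This is only a rough sketch: Get-CsvHeaderGroup is a hypothetical helper name (not a built-in cmdlet), and it assumes the first line of every file is the header row and that headers are written identically (same quoting, same order) across files.

```powershell
# Hypothetical helper (not a built-in cmdlet): groups the CSV files under
# $Path by their first line, which is assumed to be the header row.
function Get-CsvHeaderGroup {
    param(
        [string] $Path = '.'
    )

    Get-ChildItem -LiteralPath $Path -Recurse -Filter *.csv -File |
        Group-Object -Property { Get-Content -LiteralPath $_.FullName -TotalCount 1 }
}

# More than one group means at least two files disagree on their header line.
$headerGroups = @(Get-CsvHeaderGroup -Path .)
if ($headerGroups.Count -gt 1) {
    Write-Warning 'Not all CSV files share the same header line:'
    $headerGroups | ForEach-Object { '{0} file(s): {1}' -f $_.Count, $_.Name }
}
```

If more than one group comes back, you'd need to decide how to reconcile the columns before merging.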

    To answer your question: you're quite close. However, I don't think this is possible as a one-liner, and even if it were, I would certainly not recommend it (others reading your code will thank you for that).

    By one-liner I mean piping Get-ChildItem to Import-Csv without using a loop (ForEach-Object or foreach). That is the point where you lose the reference to the DeviceID folder name.
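
To illustrate why the reference is lost, here is a small sketch. The temporary file name and the Col1/Col2 columns are made up for the example. The rows that come out of Import-Csv only carry the CSV columns as properties, so nothing downstream in the pipeline can tell which file or folder a row came from.

```powershell
# Write a tiny CSV to a temporary file so the example is self-contained.
$tempCsv = Join-Path ([System.IO.Path]::GetTempPath()) ("nest-demo-" + [guid]::NewGuid() + ".csv")
Set-Content -LiteralPath $tempCsv -Value 'Col1,Col2', 'Val1,Val2'

# Each imported row is a custom object whose only properties are the CSV columns;
# nothing on the row records the file or folder it was read from.
$row = Import-Csv -LiteralPath $tempCsv | Select-Object -First 1
$propertyNames = @($row.PSObject.Properties.Name)
$propertyNames

# Clean up the temporary file.
Remove-Item -LiteralPath $tempCsv
```

That's why the folder name has to be captured while you still hold the file object, before its rows are emitted.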

    Points to clarify:

    • The code first collects all rows in memory as $mergedCsv and exports them once at the end. Appending (-Append) to a CSV on every loop iteration when it isn't needed causes a lot of unnecessary disk I/O, which is inefficient. It may not matter in this case, but it's good to know for future scripts.
    • $file.Directory.Parent.Parent.Name should give you the DeviceID folder's name; please correct me if I'm wrong.
    • Each loop iteration imports one CSV and adds the new Nest_ID column, which contains the DeviceID folder name.

    $mergedCsv = foreach($file in Get-ChildItem -Recurse -Filter *.csv)
    {
        # Month folder -> Year folder -> DeviceID folder
        $nestDeviceID = $file.Directory.Parent.Parent.Name
        $csv = Import-Csv $file.FullName
        # Keep every existing column and add Nest_ID to each row
        $csv | Select-Object *, @{
            Name = 'Nest_ID'
            Expression = { $nestDeviceID }
        }
    }
    
    # Inspect the Merged CSV
    $mergedCsv | Out-GridView
    
    # Export it
    $mergedCsv | Export-Csv merged.csv -NoTypeInformation
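
Since you asked about ForEach-Object: the same idea can also be written as a streaming pipeline with a single Export-Csv at the end, so no -Append is needed. The sketch below is self-contained and only an illustration. It builds a throwaway folder tree under the temp directory (the device, year and month names are made up) to check that Directory.Parent.Parent.Name resolves to the DeviceID folder. The merged file is written outside the tree on purpose, so Get-ChildItem doesn't pick it up while it is being written.

```powershell
# Build a throwaway tree that mimics Path\DeviceID\Year\Month\file.csv.
# All folder and file names here are made up for illustration.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("nest-demo-" + [guid]::NewGuid())
foreach ($device in 'DeviceID-A', 'DeviceID-B') {
    $monthDir = [System.IO.Path]::Combine($root, $device, '2021', '01')
    New-Item -ItemType Directory -Path $monthDir -Force | Out-Null
    Set-Content -LiteralPath (Join-Path $monthDir '2021-01-sensors.csv') -Value 'Col1,Col2', 'Val1,Val2'
}

# Write the merged file outside the tree being scanned.
$merged = Join-Path ([System.IO.Path]::GetTempPath()) ("merged-" + [guid]::NewGuid() + ".csv")

Get-ChildItem -LiteralPath $root -Recurse -Filter *.csv -File |
    ForEach-Object {
        # Capture the folder name now; inside Select-Object, $_ refers to each CSV row.
        $nestDeviceID = $_.Directory.Parent.Parent.Name
        Import-Csv -LiteralPath $_.FullName |
            Select-Object *, @{ Name = 'Nest_ID'; Expression = { $nestDeviceID } }
    } |
    Export-Csv -LiteralPath $merged -NoTypeInformation

# Read the result back to inspect it.
$result = @(Import-Csv -LiteralPath $merged)
$result | Format-Table

# Clean up the demo files.
Remove-Item -LiteralPath $root -Recurse -Force
Remove-Item -LiteralPath $merged
```

Whether you prefer this or the foreach version is mostly a matter of taste. The pipeline version streams rows to disk instead of holding them all in $mergedCsv, which may matter if the CSVs are large.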