
Remove commas from numbers in a CSV


I have folder info for all user folders. It is dumped out to a CSV file as follows:

    Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29

We are unable to work with the data as-is because of the thousands separator in the third column. I could run the report scripts again, but we have many file servers, and one of them has a very large number of users, so rerunning the reports is very time-consuming. The commas are there because the data was written as a string rather than a number.

I can import and convert the file. The only problem is that any number of 1,000 or more is split at its comma, which shifts all of the following data one column to the right. I would like to remove any comma that sits between two digits. It doesn't seem like this should be hard to do in PowerShell, but I haven't had any luck finding a solution.


Solution

  • If you can assume that columns are separated by a comma followed by a space, and that your numbers contain no spaces, you can use the -replace operator to remove only the commas that sit between two digits.

    $line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
    $line -replace '(?<=\d),(?=\d)'
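The snippet below is a minimal check, using the sample row from the question, of why the replacement fixes the column shift. A CSV parser splits on every comma, so the row splits into one field too many before the fix and into the expected number of fields after it. The variable names are illustrative only.

```powershell
# Sample row from the question
$line = 'Servername, F:\Users\user, 9,355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'

# Remove only commas that have a digit on both sides
$fixed = $line -replace '(?<=\d),(?=\d)'

# Count fields the way a naive comma split (or CSV parser) would see them
$before = ($line -split ',').Count   # the thousands separator adds an extra field
$after  = ($fixed -split ',').Count  # one field per column

"Before: $before fields, after: $after fields"
$fixed
```

Delimiter commas are not touched here, because each one is followed by a space rather than a digit.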
    

    If you are reading the data from a file, you can read it with Get-Content, apply the replacement, and write the result back with Set-Content. The parentheses make Get-Content finish reading the file before Set-Content overwrites it.

    (Get-Content file.csv) -replace '(?<=\d),(?=\d)' | Set-Content file.csv
    

    If the file is large, you can use a switch statement with the -file parameter, which is faster than Get-Content.

    $data = switch -regex -file file.csv {
              '(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
              default {$_}
            }
    $data | Set-Content file.csv
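The version above still collects every line in $data before writing anything. For very large files, one possible variation is to stream the switch output straight into Set-Content and write to a different file. The function name Remove-DigitGroupComma is hypothetical and exists only for this sketch. Wrapping the switch in a script block is needed because a switch statement cannot be piped directly.

```powershell
# Hypothetical helper: streams $Path through the same regex switch and writes
# the cleaned lines to $Destination, which should be a different file.
function Remove-DigitGroupComma {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination
    )

    # The script block's output flows through the pipeline to Set-Content
    # line by line instead of being stored in a variable first.
    & {
        switch -regex -file $Path {
            '(?<=\d),(?=\d)' { $_ -replace '(?<=\d),(?=\d)' }
            default          { $_ }
        }
    } | Set-Content -Path $Destination
}

# Example usage (file names are placeholders):
# Remove-DigitGroupComma -Path file.csv -Destination file_fixed.csv
```

Once the output has been checked, you can replace the original file with it.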
    

    Explanation:

    • (?<=\d) is a positive lookbehind assertion (?<=) that requires the comma to be preceded by a digit \d. The digit is not part of the match, so it is kept.
    • (?=\d) is a positive lookahead assertion (?=) that requires the comma to be followed by a digit. You could replace it with (?=\d{3}) to require at least three consecutive digits after the comma.
    • Since you want to replace the target comma with an empty string, you do not need a replacement string: -replace with only a pattern removes each match.
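The difference between the two lookaheads only shows up when a comma between digits is not followed by a full group of three. The sketch below uses a made-up input string, not data from the question, to compare them.

```powershell
# Made-up input: one real thousands separator and one comma between digits
# that is not followed by three digits.
$text = '1,234 and 5,6'

# (?=\d): any comma with a digit on both sides is removed
$loose = $text -replace '(?<=\d),(?=\d)'

# (?=\d{3}): only commas followed by at least three digits are removed
$strict = $text -replace '(?<=\d),(?=\d{3})'

$loose    # 1234 and 56
$strict   # 1234 and 5,6
```

For the sample row in the question, both patterns give the same result.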

    Typically, it is best to stick with commands that work with CSV data or files. However, if your data contains commas and your text fields are not qualified (enclosed in quotes), it can be difficult to distinguish data from delimiters. If you have a clear way of making that distinction, you are better off using ConvertFrom-Csv for data you have already read or Import-Csv for files. Because these files have no header row, you will need to define headers, either in the file itself or with the -Header parameter.
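After the commas are removed, the rows can be parsed as CSV. The sketch below assumes the cleaned sample row from the question. The header names are placeholders, because the question does not say what each column holds. Since the columns are separated by a comma plus a space, each value is trimmed explicitly, and the size is converted to a number after stripping its " MB" suffix.

```powershell
# Cleaned rows, e.g. the output of the -replace step (sample row from the question)
$lines = @(
    'Servername, F:\Users\user, 9355.7602 MB, 264, 3054, 03/15/2000 13:28:48, 12/10/2018 11:58:29'
)

# Placeholder header names: the real meaning of each column is not stated
$headers = 'Server', 'Path', 'Size', 'Count1', 'Count2', 'Date1', 'Date2'

$rows = $lines | ConvertFrom-Csv -Header $headers | ForEach-Object {
    [pscustomobject]@{
        Server = $_.Server.Trim()
        Path   = $_.Path.Trim()
        # Strip the trailing " MB" unit and convert the rest to a number
        SizeMB = [double]($_.Size.Trim() -replace '\s*MB$')
        Count1 = [int]$_.Count1.Trim()
        Count2 = [int]$_.Count2.Trim()
        Date1  = $_.Date1.Trim()
        Date2  = $_.Date2.Trim()
    }
}

$rows | Format-Table
```

For a cleaned file, Import-Csv -Path file.csv -Header $headers can take the place of the ConvertFrom-Csv step.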