Search code examples
regexpowershellcsvstr-replacedouble-quotes

Having multiple double quotes inside quoted string csv file


I am having a csv file, quotes around each field.

there are some fields, which can have multiple double quotes inside them. I want to escape each of them with additional double quote.

","ABC "XYZ" PQRS","
","ABC "XYZ"","
","ABC "A" "B" TEST","
","ABC 2.5" "C" Test","

I took help from the link and able to cover for scenarios with single double quote inside content, using regular expression [regex]$r='(","[^"]+"[^"]+?",")'. But, stuck up in the cases, where there are multiple double quotes inside content.

[regex]$r='(","[^"]+"[^"]+"",")' # Not working
get-content C:\Projects\MyProject\testRegexFordoublequotes.csv | foreach {

  #save each line to a variable to make it easier to track

  $line=$_

  #look for a regex match

  $find=$r.matches($line)
  
  if ($find[0].Success) { 

      foreach ($match in $find) {

        #the original string we matched on

        $found=$match.value

        #replace the substring

        $replace= '","'+  $found.Trim('","').Replace('""','"').Replace('"','""')+ '","'

        #replace the full string and write to the pipeline

        $line -replace $found,$replace

      } #foreach
       

  } #if

  else {

        #no match so write the line to pipeline

        $line

    }

 } | Set-Content C:\Projects\MyProject\modified.csv -Force

Can you please help me in defining regex which will be helpful for multiple double quotes inside field.


Solution

  • It is probably easier to search for the valid delimiters (e.g. "\s*,\s*") and split your lines into fields, and than simply correct each (invalid) single double quote with 2 quotes in each field.
    Than rebuild the fields to a record by surrounding the fields with double quotes and join them with the csv (comma) delimiter

    Input

    $Csv = @'
    "Field","ABC "XYZ" PQRS","Field"
    "Field","ABC "XYZ"","Field"
    "Field","ABC "A" "B" TEST","Field"
    "Field","ABC 2.5" "C" Test","Field"
    '@ -Split '[\r\n]+'
    

    Script

    $Csv | # replace with: get-content .\testRegexFordoublequotes.csv |
    Foreach-Object {
        $Line = $_ -Replace '^\s*"' -Replace '"\s*$' # Strip outer double quotes
        $Fields = $Line -Split '"\s*,\s*"'           # Split line into fields
        $Fields = $Fields -Replace '"', '""'         # Escape each " in each field
        '"' + ($Fields -Join '","') + '"'            # Rejoin the fields to line
    } # append: | Set-Content .\modified.csv -Force
    

    Output

    "Field","ABC ""XYZ"" PQRS","Field"
    "Field","ABC ""XYZ""","Field"
    "Field","ABC ""A"" ""B"" TEST","Field"
    "Field","ABC 2.5"" ""C"" Test","Field"