I am having a csv file, quotes around each field.
there are some fields, which can have multiple double quotes inside them. I want to escape each of them with additional double quote.
","ABC "XYZ" PQRS","
","ABC "XYZ"","
","ABC "A" "B" TEST","
","ABC 2.5" "C" Test","
I took help from the link and able to cover for scenarios with single double quote inside content, using regular expression [regex]$r='(","[^"]+"[^"]+?",")'
. But, stuck up in the cases, where there are multiple double quotes inside content.
[regex]$r='(","[^"]+"[^"]+"",")' # Not working
get-content C:\Projects\MyProject\testRegexFordoublequotes.csv | foreach {
#save each line to a variable to make it easier to track
$line=$_
#look for a regex match
$find=$r.matches($line)
if ($find[0].Success) {
foreach ($match in $find) {
#the original string we matched on
$found=$match.value
#replace the substring
$replace= '","'+ $found.Trim('","').Replace('""','"').Replace('"','""')+ '","'
#replace the full string and write to the pipeline
$line -replace $found,$replace
} #foreach
} #if
else {
#no match so write the line to pipeline
$line
}
} | Set-Content C:\Projects\MyProject\modified.csv -Force
Can you please help me in defining regex which will be helpful for multiple double quotes inside field.
It is probably easier to search for the valid delimiters (e.g. "\s*,\s*"
) and split your lines into fields, and than simply correct each (invalid) single double quote with 2 quotes in each field.
Than rebuild the fields to a record by surrounding the fields with double quotes and join them with the csv
(comma) delimiter
Input
$Csv = @'
"Field","ABC "XYZ" PQRS","Field"
"Field","ABC "XYZ"","Field"
"Field","ABC "A" "B" TEST","Field"
"Field","ABC 2.5" "C" Test","Field"
'@ -Split '[\r\n]+'
Script
$Csv | # replace with: get-content .\testRegexFordoublequotes.csv |
Foreach-Object {
$Line = $_ -Replace '^\s*"' -Replace '"\s*$' # Strip outer double quotes
$Fields = $Line -Split '"\s*,\s*"' # Split line into fields
$Fields = $Fields -Replace '"', '""' # Escape each " in each field
'"' + ($Fields -Join '","') + '"' # Rejoin the fields to line
} # append: | Set-Content .\modified.csv -Force
Output
"Field","ABC ""XYZ"" PQRS","Field"
"Field","ABC ""XYZ""","Field"
"Field","ABC ""A"" ""B"" TEST","Field"
"Field","ABC 2.5"" ""C"" Test","Field"