
Perl Regex: How to remove quotes inside quotes from CSV line


I have a line from a CSV file, stored as a string, that uses " as the field encloser and , as the field separator. Sometimes the data itself contains " characters that break the field enclosers. I'm looking for a regex to remove these stray quotes.

My string looks like this:

my $csv = qq~"123456","024003","Stuff","","28" stuff with more stuff","2"," 1.99 ","",""~;

I've looked at this, but I don't understand how to tell it to remove only the quotes that are:

  1. not at the beginning of the string
  2. not at the end of the string
  3. not preceded by a ,
  4. not followed by a ,

I managed to handle conditions 3 and 4 at the same time with this line of code:

$csv =~ s/(?<!,)"(?!,)//g;
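To see why conditions 1 and 2 matter, here is a minimal sketch that applies the regex above to the sample line from the question (the variable name `$attempt` is chosen only for illustration). Besides the stray quote after 28, it also strips the opening quote of the first field and the closing quote of the last field, because neither of those quotes has a comma on its outer side.

```perl
use strict;
use warnings;

# The sample line from the question.
my $csv = qq~"123456","024003","Stuff","","28" stuff with more stuff","2"," 1.99 ","",""~;

# Work on a copy so the original line stays available for comparison.
(my $attempt = $csv) =~ s/(?<!,)"(?!,)//g;

# The stray quote after 28 is gone, but so are the very first and very
# last quotes of the line, since neither one touches a comma.
print "$attempt\n";
# prints: 123456","024003","Stuff","","28 stuff with more stuff","2"," 1.99 ","","
```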

However, I can't fit ^ and $ into that pattern: my Perl rejects the lookbehind written as (?<!(^|,)), because ^ matches zero characters while , matches one, and lookbehinds have traditionally had to be fixed-width.

Is there a way to do this with a regex alone, rather than splitting the string up and removing the quotes from each element?


Solution

  • This should work:

    $csv =~ s/(?<=[^,])"(?=[^,])//g;

    Conditions 1 and 2 imply that there must be at least one character before and after the quote, hence the positive lookarounds. Conditions 3 and 4 imply that those characters can be anything but a comma.
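A minimal sketch that runs the answer's regex on the sample line from the question. For comparison it also shows a variant closer to the question's own attempt, which avoids the alternation by stacking one condition per lookaround so that each lookbehind is fixed-width. The variable names `$answer` and `$stacked` are illustrative only, and the claim that the two agree is checked only on this sample line, not in general.

```perl
use strict;
use warnings;

my $csv = qq~"123456","024003","Stuff","","28" stuff with more stuff","2"," 1.99 ","",""~;

# Answer's regex: (?<=[^,]) needs some non-comma character before the quote
# and (?=[^,]) needs one after it. A quote at the very start or end of the
# string has no character on one side, so it is kept.
(my $answer = $csv) =~ s/(?<=[^,])"(?=[^,])//g;

# Variant in the style of the question: separate lookarounds must all
# succeed, so together they express "not ^, not after a comma, not before
# a comma, not $" without a variable-width lookbehind.
(my $stacked = $csv) =~ s/(?<!^)(?<!,)"(?!,)(?!$)//g;

print "$answer\n";
# prints: "123456","024003","Stuff","","28 stuff with more stuff","2"," 1.99 ","",""
print $answer eq $stacked ? "same result\n" : "different results\n";
# prints: same result
```

Both versions assume the line has already been chomped. If a trailing newline is left on, (?=[^,]) also matches the newline, so the answer's regex would strip the closing quote of the last field, whereas (?!$) in the stacked variant still treats the position before the final newline as the end of the string.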