remove all commas between quotes with a vim regex


I've got a CSV file with lines like:

57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M

I need them to look like

57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200

I'm using vim regexes. I've broken it down into 4 steps:

  1. Remove ^M and insert newlines:

    :%s:<ctrl-V><ctrl-M>:\r:g
    
  2. Replace all spaces with -:

    :%s: :\-:g
    
  3. Remove commas between quotes: Need help here.

  4. Remove quotes:

    :%s:\"\([^"]*\)\":\1:g
    

How do I remove commas between quotes, without removing all commas in the file?

Something like this?

:%s:\("\w\+\),\(\w\+"\):\1 \2:g

Solution

  • My preferred solution to this problem (removing commas inside quoted regions) is to use a replacement expression rather than trying to get everything done in one regex.

    To do this, prefix the replacement with \= so that Vim evaluates it as an expression. Inside that expression you can take just the text between the quotes and manipulate the matched part separately. The result is two short regexes instead of one complicated one.

    :%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
    

    Here ".\{-}" matches a quoted region (\{-} is the non-greedy counterpart of *), and substitute(submatch(0), ',', '' , 'g') takes the matched text and removes every comma from it. Its return value is used as the actual replacement.

    The relevant help pages are :help sub-replace-special and :help sub-replace-expression.
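
    As a minimal sketch of the same idea outside of :s, the substitute() function also accepts a replacement that starts with \=, so the technique can be tried on the sample line as a plain string. The variable names sample and result are only illustrative.

    ```vim
    " Sample line from the question (trailing ^M left out).
    let sample = '57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200'

    " The outer pattern finds each quoted region; the \= expression
    " strips the commas from that region only.
    let result = substitute(sample, '".\{-}"', '\=substitute(submatch(0), ",", "", "g")', 'g')

    echo result
    " 57,13,"Bob Bill and Susan",Student,Club,Funded,64,3200
    ```

    The commas that separate the fields are untouched because they never fall inside a match of the outer pattern.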


    As for the other parts of your question: Step 1 is essentially removing carriage returns, because the file is actually in the DOS file format. You can remove them with the dos2unix program.
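
    If you would rather stay inside Vim, something like the following should do the same job. This is a sketch that assumes each ^M sits at the end of a line, as it would in a DOS-format file:

    ```vim
    " In a search pattern, \r matches a carriage return (displayed as ^M).
    " This deletes one trailing CR per line; the e flag suppresses the
    " error if no line contains one.
    :%s/\r$//e
    ```

    If instead the whole file shows up as one long line with ^M between records, the carriage returns are the only separators, and replacing them with \r as in your Step 1 is the right move, since \r in the replacement inserts a line break.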

    In Step 2, escaping the - in the replacement is unnecessary, so the command is just

    :%s/ /-/g
    

    In Step 4, the regex is more complicated than it needs to be if all you want is to remove the quotes. You only need to match each quote and delete it:

    :%s/"//g
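
    As a possible variation, the comma removal and the quote removal can be folded into one command by capturing the text between the quotes and returning only that capture. This is a sketch, not something the question requires:

    ```vim
    " Group 1 is the text between the quotes. Returning only that group,
    " minus its commas, drops the quotes in the same pass.
    :%s/"\(.\{-}\)"/\=substitute(submatch(1), ',', '', 'g')/g
    ```

    Run after the space-to-dash command, this turns "Bob,-Bill-and-Susan" into Bob-Bill-and-Susan, which matches the output you asked for.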