Search code examples
excelregexcsvnotepad++

Regex: Replace every Comma with Tab Not within quotes


I have a huge data set of entries like these:

(21, 2, '23.5R25 ETADT', 'description, with a comma'),
(22, 1, '26.5R25 ETADT', 'Description without a comma'),
(23, 5, '20.5R20.5', 'Another description with ; semicolumn'),

I'm trying to replace every comma in the list with a tab. Excluding the commas within the single quotes. Also excluding the ending commas.

So the examples entries should become:

(21[TAB]2[TAB]'23.5R25 ETADT'[TAB]'description, with a comma'),
(22[TAB]1[TAB]'26.5R25 ETADT'[TAB]'Description without a comma'),
(23[TAB]5[TAB]'20.5R20.5'[TAB]'Another description with ; semicolumn'),

I've got like 6000 rows of data like this. The tabs allow me to tell Excel to import the elements of these entries into different columns.

The Regex I've tried was this: [ ]*,[ ]* But this Regex selects all the commas, even the ones within the single quotes.


Solution

  • It looks as though each of your lines has 4 elements within parenthesis. And it looks like only the last 2 elements use single quotes. If those assumptions can be made, I've tested the following in Notepad++:

    • "Find what :" ^\(([^,]*),\s*([^,]*),\s*'([^']*)'\s*,\s*
    • "Replace with :" \(\1\t\2\t'\3'\t

    EDIT:

    The search regex is dependent upon the 4 column model with only the last two elements having single quotes. Visually this is how it works:

    Regular expression visualization

    1. ^\(: Finds an opening parenthesis
    2. ([^,]*): Captures non-comma characters which will be all of element 1
    3. ,\s*: Matches a comma and any trailing spaces
    4. ([^,]*): Captures non-comma characters which will be all of element 2
    5. ,\s*: Matches a comma and any trailing spaces
    6. '([^']*)': Captures the string in single quotes which will be all of element 3
    7. \s*,\s*: Matches a comma and all surrounding spaces
    8. Ignore the rest of the string, there are no more commas to be replaced we just want to replace parts of the line we just read in