Regex JS: Selecting all spaces not within quotes for multiple quotes in a row


I need to select all spaces that are not between quotes and delete them. I am using JavaScript regex syntax.

For example:

" Test Test " ab c " Test" "Test " "Test" "T e s t"

becomes

" Test Test "abc" Test""Test ""Test""T e s t"

UPDATE: I am looking for a solution that works in this test environment: https://www.regextester.com/

The real goal is to tokenize this long sentence on the spaces that fall outside quotes, but I figured the question as posed above was more concise and easier to read and answer.

All spaces outside quotes should be highlighted in the tester linked above. Splitting on the highlighted spaces would then produce:

[" Test Test ",ab,c," Test","Test ","Test","T e s t"]

My attempted solution was:

(,|;\s|\s)+(?![^\[]*\])(?![^\(]*\))(?![^\{]*\})(?![^"].*")

Solution

  • You may use this regex for .split:

    \s+(?=(?:(?:[^"]*"){2})*[^"]*$)
    

    This regex splits on whitespace only when it lies outside double quotes: the lookahead checks that an even number of quotes follows the whitespace, up to the end of the string. It therefore assumes the quotes in the input are balanced.

    const str = '" Test Test " ab c " Test" "Test " "Test" "T e s t"';
    
    // Split on whitespace runs that are followed by an even number of quotes.
    const arr = str.split(/\s+(?=(?:(?:[^"]*"){2})*[^"]*$)/);
    
    console.log(arr);
    
    /* OUTPUT
    [
      "\" Test Test \"",
      "ab",
      "c",
      "\" Test\"",
      "\"Test \"",
      "\"Test\"",
      "\"T e s t\""
    ]
    */
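
The question also asks for the spaces to be deleted, not only split on. A minimal sketch of that, reusing the same lookahead with `String.prototype.replace`, is shown below together with an alternative that matches the tokens directly instead of splitting. The function names `removeSpacesOutsideQuotes` and `tokenize` are made up for illustration and are not part of the original answer. Both assume balanced double quotes with no escaped quotes inside them.

```javascript
// Same lookahead as the answer: a whitespace run counts as "outside quotes"
// when an even number of double quotes remains between it and end of string.
const OUTSIDE_QUOTES = /\s+(?=(?:(?:[^"]*"){2})*[^"]*$)/g;

// Hypothetical helper: delete every whitespace run that lies outside quotes.
function removeSpacesOutsideQuotes(input) {
  return input.replace(OUTSIDE_QUOTES, '');
}

// Hypothetical alternative: match tokens instead of splitting.
// A token is either a complete quoted string or a run of characters that
// are neither whitespace nor a double quote.
function tokenize(input) {
  return input.match(/"[^"]*"|[^\s"]+/g) || [];
}

const sample = '" Test Test " ab c " Test" "Test " "Test" "T e s t"';

console.log(removeSpacesOutsideQuotes(sample));
// " Test Test "abc" Test""Test ""Test""T e s t"

console.log(tokenize(sample));
```

For the sample input, `tokenize` yields the same seven tokens as the `.split` call above. The two approaches can differ on other inputs: with text glued to a quote, such as `ab"c d"`, the split keeps one token while the match-based version produces `ab` and `"c d"`, and an unclosed quote is silently dropped by the match. The lookahead rescans to the end of the string for every whitespace run, so on very long inputs the match-based version is likely to be the faster of the two.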