javascript regex split negative-lookbehind

JavaScript regex that matches a dot as punctuation only, not inside numbers


I'm trying to write a regex to split a sentence into words. My first thought was to split on spaces or punctuation, but then I realized that I don't want to split a decimal number like "0.5" or a timestamp like "14:55:02". So I tried to fix my problem with a negative lookahead and a negative lookbehind, but I'm not able to put them together, and lookbehind doesn't seem to be supported in JavaScript.

My best try so far:

var query = "I've been 0.5 hit at 21:05. I'm okay.";
var delimiter = /[\s\.,:;?!+=\/\\]+(?![0-9])/g;

if(delimiter.test(query)){

    var words = query.split(delimiter);
    console.log(words);

    // ["I've", "been 0.5", "hit", "at 21:05", "I'm", "okay", ""]
}


So basically, I need a regex that splits my query on [\s\.,:;?!+=\/\\]+ but doesn't split when one of [\.,:/] sits between two digits. Please help!


Solution

  • Here's my take on it:

    [\s,;?!+=/\\]+|[.:](?!\d)\s*
    


    Basically, I've split the two cases and made the lookahead apply only after . or :, so whitespace and the other punctuation always split.
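
    As a quick check, here is a minimal sketch of splitting with that pattern. The helper name splitWords is made up for illustration, and dropping empty strings is an assumption about what you want (split leaves a trailing "" when the text ends in a separator).

    ```javascript
    // Hypothetical helper (not part of the original answer): split on the
    // pattern above and drop the empty strings that split() leaves behind.
    function splitWords(text) {
        // Whitespace and most punctuation always separate words; a . or :
        // separates only when the next character is not a digit.
        var delimiter = /[\s,;?!+=\/\\]+|[.:](?!\d)\s*/;
        return text.split(delimiter).filter(function (word) {
            return word !== "";
        });
    }

    console.log(splitWords("I've been 0.5 hit at 21:05. I'm okay."));
    // ["I've", "been", "0.5", "hit", "at", "21:05", "I'm", "okay"]
    ```

    Note that a dot directly followed by a digit is never treated as a separator here, whatever comes before it.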

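    And yes, JS didn't support lookbehinds when this was written, unfortunately (they were added in ES2018, so most current engines do accept them).

    For the more troublesome "I love pizza.2 more pizzas please!" case, where a dot separates a word from a number, you'd need to switch to matching instead of splitting if you can't use a lookbehind: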

    (?:\d[.:]\d|[^\s.:,;?!+=/\\])+
    

    This won't count a . or : as a separator if it's between two digits.


    And in JS:

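    var query = "I've been 0.5 hit at 21:05. I'm okay. I love pizza.2 more pizzas please!";
    // A word is a run of non-separator characters; a . or : is kept
    // inside the word only when it sits between two digits.
    var re = /(?:\d[.:]\d|[^\s.:,;?!+=\/\\])+/g;
    var words = [];
    var match;

    // With the g flag, exec() resumes after the previous match and
    // returns null when nothing is left.
    while ((match = re.exec(query)) !== null)
        words.push(match[0]);

    // Show one word per line in the #demo element.
    document.getElementById("demo").innerHTML = words.join("<br>");

    <div id="demo"></div>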
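
    In an engine with lookbehind support (ES2018 or later), the split form can probably handle the pizza.2 case too. A sketch under that assumption follows; the helper name splitWordsLookbehind is made up for illustration, and engines without lookbehind will reject the pattern with a SyntaxError.

    ```javascript
    // Hypothetical helper; assumes an ES2018+ engine with lookbehind support.
    function splitWordsLookbehind(text) {
        // A . or : separates words unless it has a digit on BOTH sides,
        // i.e. it splits when there is no digit before it or no digit after it.
        var delimiter = /[\s,;?!+=\/\\]+|(?<!\d)[.:]\s*|[.:](?!\d)\s*/;
        // Drop the empty strings split() produces at the edges.
        return text.split(delimiter).filter(function (word) {
            return word !== "";
        });
    }

    var sample = "I've been 0.5 hit at 21:05. I'm okay. I love pizza.2 more pizzas please!";
    console.log(splitWordsLookbehind(sample));
    // ["I've", "been", "0.5", "hit", "at", "21:05", "I'm", "okay",
    //  "I", "love", "pizza", "2", "more", "pizzas", "please"]
    ```

    On this input it yields the same words as the matching approach, so the choice mostly comes down to which engines you need to support.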