
JavaScript: split a string with a regex and get the same result as match (how to convert a match regex for use with split)


let s = "AbcDefGH";
s.match(/[A-Z]+[a-z]*/g);
// ["Abc", "Def", "GH"]  <- this is what I expect from split
s.split(/(?=[A-Z]+[a-z]*)/g);
// ["Abc", "Def", "G", "H"]  <- "G" and "H" are separated

My question is: how can I convert the match regex for use with split so that split returns the same result as match?

Please also explain what ?= is for when the match regex is translated for split.

Thanks


Solution

  • You can enable JavaScript's u or v flag to get Unicode-aware matching (the v flag additionally enables set operations inside character classes). With either flag you can use \p{L} to match any letter in any script. This is safer than [a-zA-Z], which misses accented characters and non-Latin letters.
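To illustrate the difference, here is a minimal sketch comparing an ASCII-only class with a Unicode property escape. The names asciiLower, unicodeLower and comparison are only illustrative, and the accented letters are written as escapes so the result does not depend on how the file is encoded.

```javascript
// Property escapes such as \p{Ll} only work with the `u` (or `v`) flag.
const asciiLower = /^[a-z]$/;
const unicodeLower = /^\p{Ll}$/u;

// "a", "é" (U+00E9) and "ÿ" (U+00FF)
const samples = ["a", "\u00E9", "\u00FF"];

const comparison = samples.map((ch) => ({
  ch,
  ascii: asciiLower.test(ch),     // true only for "a"
  unicode: unicodeLower.test(ch), // true for all three
}));

console.log(comparison);
```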

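    In your case, we want to split at the position between a lowercase letter and the uppercase letter that follows it. So we'll use a positive lookbehind to find a lowercase letter, followed by a positive lookahead to find an uppercase letter. Both are zero-width assertions: they check the neighbouring characters without consuming them, which is why split does not drop any letters. Your original lookahead-only pattern matches before every uppercase letter, including the position between "G" and "H", and that is why those two were separated.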

    • lookbehind: (?<=\p{Ll}), where \p{Ll} will match a lowercase letter in any language, so for example "a", "à" or "ÿ".

    • lookahead: (?=\p{Lu}), where \p{Lu} will match an uppercase letter in any language, so for example "C", "Ç" or "É".
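As a rough sketch of why the assertions matter, the snippet below runs three patterns against the string from the question: the match regex used directly as a separator, a lookahead on its own, and the lookbehind plus lookahead pair. The variable names consumed, lookaheadOnly and boundary are only illustrative.

```javascript
const s = "AbcDefGH";

// A consuming separator: split() throws away whatever the regex matches,
// and here the regex matches the whole string piece by piece.
const consumed = s.split(/[A-Z]+[a-z]*/);

// Lookahead only: zero-width, so nothing is dropped, but every position
// that precedes an uppercase letter is a split point, including "G|H".
const lookaheadOnly = s.split(/(?=[A-Z])/);

// Lookbehind + lookahead: only lowercase-to-uppercase boundaries qualify.
const boundary = s.split(/(?<=\p{Ll})(?=\p{Lu})/u);

console.log(consumed);      // ["", "", "", ""]
console.log(lookaheadOnly); // ["Abc", "Def", "G", "H"]
console.log(boundary);      // ["Abc", "Def", "GH"]
```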

    Here is the detailed list of Unicode categories.

    And a little example of code to illustrate it:

    // The `u` (or `v`) flag enables Unicode property escapes such as \p{...}.
    // \p{L} matches any Unicode letter.
    // \p{Ll} matches any Unicode lowercase letter.
    // \p{Lu} matches any Unicode uppercase letter.
    // (?<=\p{Ll}) is a positive lookbehind: the previous character is a lowercase letter.
    // (?=\p{Lu}) is a positive lookahead: the next character is an uppercase letter.
    // The `g` flag is omitted because split() already finds every match.
    
    const regex = /(?<=\p{Ll})(?=\p{Lu})/u;
    
    const inputs = [
      "AbcDefGH",
      "J'aiMangéEtBuÀVolonté",
      "IAteAndDrankAsMuchAsIWanted"
    ];
    
    inputs.forEach((input) => {
      console.log('Input = "' + input + '"');
      console.log(input.split(regex));
    });
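
For reference, the same idea can be wrapped in a small helper with the resulting arrays written out. This is only a sketch: splitOnCaseBoundary is a hypothetical name, and the normalize("NFC") call is an added assumption so that an accented letter typed as a base letter plus a combining mark is still seen as a single lowercase or uppercase letter.

```javascript
// Hypothetical helper; the pattern itself is the one from the answer.
function splitOnCaseBoundary(input) {
  // NFC normalization (an assumption, not part of the original answer)
  // composes sequences like "e" + U+0301 into the single letter "é".
  return input.normalize("NFC").split(/(?<=\p{Ll})(?=\p{Lu})/u);
}

const results = [
  "AbcDefGH",
  "J'aiMangéEtBuÀVolonté",
  "IAteAndDrankAsMuchAsIWanted",
].map(splitOnCaseBoundary);

console.log(results[0]); // ["Abc", "Def", "GH"]
console.log(results[1]); // ["J'ai", "Mangé", "Et", "Bu", "ÀVolonté"]
console.log(results[2]); // ["IAte", "And", "Drank", "As", "Much", "As", "IWanted"]
```

Note that consecutive uppercase letters stay together ("GH", "IAte", "ÀVolonté"), which mirrors what the [A-Z]+[a-z]* pattern does with match.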