let s = "AbcDefGH"
s.match(/[A-Z]+[a-z]*/g)
["Abc", "Def", "GH"] // This is the result I expect from the split function
s.split(/(?=[A-Z]+[a-z]*)/g)
["Abc", "Def", "G", "H"] // "G" and "H" are separated.
My question is: how can I adapt the regex I use with match so that split returns the same result?
Please also explain what ?= does when the match regex is translated for use with split.
Thanks
You can enable JavaScript's u or v flag to get Unicode and character set features. This way, you can use \p{L} to match any letter in any language. This is safer than [a-zA-Z] because \p{L} also matches accented letters and non-Latin letters.
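As a small illustration of the difference, here is a minimal sketch comparing the two character classes on an accented word. The variable names and the sample word are only for illustration; the "é" is written as the escape \u00e9 so that it is a single precomposed letter.

```javascript
// ASCII-only class versus the Unicode letter property (which needs the `u` flag).
const asciiLetters = /[a-zA-Z]+/g;
const unicodeLetters = /\p{L}+/gu;

// "Mangé", with the accented letter written as a precomposed code point.
const word = "Mang\u00e9";

const asciiMatches = word.match(asciiLetters);     // stops before the accented letter
const unicodeMatches = word.match(unicodeLetters); // covers the whole word

console.log(asciiMatches);   // ["Mang"]
console.log(unicodeMatches); // ["Mangé"]
```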
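In your case, we want to split at the position between a lowercase letter and an uppercase letter. So we'll use a positive lookbehind to find a lowercase letter, followed by a positive lookahead to find an uppercase letter. Both are zero-width assertions: they check the neighbouring characters without consuming them, so split removes nothing from the string.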
lookbehind: (?<=\p{Ll}), where \p{Ll} will match a lowercase letter in any language, so for example "a", "à" or "ÿ".
lookahead: (?=\p{Lu}), where \p{Lu} will match an uppercase letter in any language, so for example "C", "Ç" or "É".
Here is the detailed list of Unicode categories.
And a little example of code to illustrate it:
// Enable the `u` or `v` flag to have Unicode and character set features.
// \p{L} matches any Unicode letter
// \p{Ll} matches any Unicode lowercase letter.
// \p{Lu} matches any Unicode uppercase letter.
// (?<=\p{Ll}) is a positive lookbehind to find a lowercase letter.
// (?=\p{Lu}) is a positive lookahead to find an uppercase letter.
const regex = /(?<=\p{Ll})(?=\p{Lu})/gu;
const inputs = [
  "AbcDefGH",
  "J'aiMangéEtBuÀVolonté",
  "IAteAndDrankAsMuchAsIWanted"
];
inputs.forEach((input) => {
  console.log('Input = "' + input + '"');
  console.log(input.split(regex));
});
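
To address the ?= part of the question more directly, here is a minimal ASCII-only sketch (the variable names are just for illustration). With a lookahead alone, split cuts before every uppercase letter, which is why "G" and "H" end up separated. Adding the lookbehind restricts the cut to positions where a lowercase letter comes right before an uppercase one.

```javascript
const s = "AbcDefGH";

// Lookahead only: a split point exists before every uppercase letter,
// including the one between "G" and "H".
const beforeEveryUpper = s.split(/(?=[A-Z])/);

// Lookbehind + lookahead: a split point exists only where a lowercase
// letter is immediately followed by an uppercase letter.
const lowerToUpper = s.split(/(?<=[a-z])(?=[A-Z])/);

console.log(beforeEveryUpper); // ["Abc", "Def", "G", "H"]
console.log(lowerToUpper);     // ["Abc", "Def", "GH"]
```

Note that the split pattern does not need the [A-Z]+[a-z]* part from the match regex: match describes the pieces to keep, while split only describes the boundaries between them.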