I know JavaScript regular expressions have native lookaheads but not lookbehinds.
I want to split a string at points either beginning with any member of one set of characters or ending with any member of another set of characters.
Split before ເ, ແ, ໂ, ໃ, ໄ. Split after ະ.
In: ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ
Out: ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ
I can achieve the "split before" part using zero-width lookahead:
'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ'.split(/(?=[ໃໄໂເແ])/)
["ເລື້ອຍໆມະຫັດສະຈັນ", "ເອກອັກຄະລັດຖະທູດ"]
But I can't think of a general approach to simulating zero-width lookbehind.
I'm splitting strings of arbitrary Unicode text, so I don't want to substitute in special markers in a first pass, since I can't guarantee that any given marker string is absent from my input.
Instead of splitting, you may consider using the match() method.
var s = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ',
r = s.match(/(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g);
console.log(r); //=> [ 'ເລື້ອຍໆມະ', 'ຫັດສະ', 'ຈັນ', 'ເອກອັກຄະ', 'ລັດຖະ', 'ທູດ' ]
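If your target engine supports ES2018 lookbehind assertions (an assumption here; older engines throw a SyntaxError on this pattern), the split can probably be written directly, without the match() workaround. The following is a minimal sketch, and the names input, boundary and parts are only illustrative:

```javascript
// Assumes an engine with ES2018 lookbehind support (for example a recent Node.js).
const input = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ';

// Zero-width boundary: either just after ະ, or just before one of ເ ແ ໂ ໃ ໄ.
const boundary = /(?<=ະ)|(?=[ໃໄໂເແ])/;

// split() consumes nothing here, so joining the parts gives back the input.
const parts = input.split(boundary);

console.log(parts);
```

Because both alternatives are zero-width, no characters are dropped, and a position that satisfies both conditions still produces only one split.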