javascript, regex, regex-lookarounds, split, lookbehind

Split string in JavaScript using regex with zero width lookbehind


I know JavaScript regular expressions have native lookaheads but not lookbehinds.

I want to split a string at points either beginning with any member of one set of characters or ending with any member of another set of characters.

Split before ໃ, ໄ, ໂ, ເ, ແ. Split after ະ.

In: ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ

Out: ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ

I can achieve the "split before" part using zero-width lookahead:

'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ'.split(/(?=[ໃໄໂເແ])/)

["ເລື້ອຍໆມະຫັດສະຈັນ", "ເອກອັກຄະລັດຖະທູດ"]

But I can't think of a general approach to simulating zero-width lookbehind.

I'm splitting strings of arbitrary Unicode text, so I don't want to substitute in special markers in a first pass: I can't guarantee that any given marker string is absent from my input.


Solution

  • Instead of splitting, you may consider using the match() method.

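    var s = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ',
        // Lazily consume characters other than ະ, then end the chunk either by
        // consuming a closing ະ or just before one of [ໃໄໂເແ] or the end of the string.
        r = s.match(/(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g);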
    
    console.log(r); //=> [ 'ເລື້ອຍໆມະ', 'ຫັດສະ', 'ຈັນ', 'ເອກອັກຄະ', 'ລັດຖະ', 'ທູດ' ]
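
If the code only has to run on engines that implement ES2018 lookbehind assertions (recent versions of Node.js and the major browsers), the split can probably be written directly, combining a lookbehind for the "split after" character with the lookahead from the question. This is a minimal sketch under that assumption, not part of the original answer; the helper name `splitLao` is hypothetical.

```javascript
// Sketch only: requires an engine with ES2018 lookbehind support.
// `splitLao` is a hypothetical helper name used for illustration.
function splitLao(s) {
  // (?<=ະ)        zero-width position immediately after ະ
  // (?=[ໃໄໂເແ])  zero-width position immediately before one of ໃ ໄ ໂ ເ ແ
  return s.split(/(?<=ະ)|(?=[ໃໄໂເແ])/);
}

const parts = splitLao('ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ');
console.log(parts.join(' '));
// ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ
```

Both assertions are zero-width, so no characters are consumed and nothing needs to be substituted into the input. On engines without lookbehind the regex literal is a syntax error, so the match() approach above remains the safer choice there.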