Tags: javascript, unicode, emoji

How can I split a string containing emoji into an array?


I want to take a string of emoji and do something with the individual characters.

In JavaScript, "😴😄😃⛔🎠🚓🚇".length == 13, because .length counts UTF-16 code units: "⛔" has a length of 1, while each of the other emoji has a length of 2. So we can't simply do:

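const string = "👨‍👨‍👧‍👧 👦🏾 😴 😄 😃 ⛔ 🎠 🚓 🚇";

// split("") works on UTF-16 code units, so each astral emoji becomes two lone surrogates
const s = string.split("");
console.log(s);

// Array.from works on code points: simple emoji survive, but the ZWJ family and the skin-toned boy are still split apart
const a = Array.from(string);
console.log(a);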
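To see where the two attempts above go wrong, it can help to compare two ways of counting the same strings: UTF-16 code units (what `.length` and `split("")` use) and code points (what `Array.from` and the spread operator iterate). This is a minimal sketch; the strings are the emoji from the question written as `\u` escapes so that the zero-width joiners (U+200D) and the skin-tone modifier are visible, and the helper names `codeUnits` and `codePoints` are made up for illustration.

```javascript
// The question's first string: U+26D4 (⛔) is in the Basic Multilingual Plane,
// the other six emoji are astral (above U+FFFF) and need a surrogate pair each.
const simple = "\u{1F634}\u{1F604}\u{1F603}\u26D4\u{1F3A0}\u{1F693}\u{1F687}";

// Family: man, ZWJ, man, ZWJ, girl, ZWJ, girl (4 emoji joined by 3 U+200D).
const family = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";
// Boy followed by a medium-dark skin tone modifier (U+1F3FE).
const boyDark = "\u{1F466}\u{1F3FE}";

// Hypothetical helper: number of UTF-16 code units (astral characters count as 2).
const codeUnits = (str) => str.length;
// Hypothetical helper: number of code points (the string iterator walks code points).
const codePoints = (str) => [...str].length;

console.log(codeUnits(simple), codePoints(simple));   // 13 7
console.log(codeUnits(family), codePoints(family));   // 11 7
console.log(codeUnits(boyDark), codePoints(boyDark)); // 4 2

// split("") cuts astral characters in half: the first element is a lone high surrogate.
console.log(simple.split("")[0].charCodeAt(0).toString(16)); // d83d
```

Neither count matches the one family and one boy a reader actually sees, which is why the solution below segments the string by grapheme cluster instead.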


Solution

  • With Intl.Segmenter, you can do this:

    // Intl.Segmenter's default granularity is "grapheme", i.e. user-perceived characters
    const splitEmoji = (string) => [...new Intl.Segmenter().segment(string)].map(x => x.segment)
    
    splitEmoji("😴😄😃⛔🎠🚓🚇") // ['😴', '😄', '😃', '⛔', '🎠', '🚓', '🚇']
    

    This also solves the problem with "👨‍👨‍👧‍👧" (several emoji joined by zero-width joiners) and "👦🏾" (an emoji followed by a skin-tone modifier):

    splitEmoji("👨‍👨‍👧‍👧👦🏾") // ['👨‍👨‍👧‍👧', '👦🏾']
    

    According to CanIUse, this is supported by all modern browsers.

    If you need to support older browsers, as mentioned in Matt Davies' answer, Graphemer is the best solution:

    let Graphemer = await import("https://cdn.jsdelivr.net/npm/graphemer/+esm").then(m => m.default.default);
    let splitter = new Graphemer();
    let graphemes = splitter.splitGraphemes("👨‍👨‍👧‍👧👦🏾"); // ['👨‍👨‍👧‍👧', '👦🏾']
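A small wrapper can use Intl.Segmenter where it exists and fall back otherwise. This is a sketch under a few assumptions: the function name `splitIntoGraphemes` and its `dropWhitespace` option are hypothetical, and the fallback shown is plain code-point splitting with `Array.from`, which (unlike Graphemer) still breaks ZWJ sequences and skin-tone modifiers apart, so swap Graphemer in there if older browsers matter. The whitespace option is aimed at strings like the one in the question, where the emoji are separated by spaces that Intl.Segmenter would otherwise return as segments of their own.

```javascript
// Hypothetical helper: split a string into user-perceived characters (grapheme clusters).
// Uses Intl.Segmenter when available; otherwise falls back to code points (less accurate).
function splitIntoGraphemes(str, { dropWhitespace = false } = {}) {
  let parts;
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    // "grapheme" is already the default granularity; spelled out here for clarity.
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    // segment() returns an iterable of { segment, index, input } objects.
    parts = Array.from(segmenter.segment(str), (s) => s.segment);
  } else {
    // Assumed fallback: code-point split. ZWJ sequences and skin tones will NOT stay together.
    parts = Array.from(str);
  }
  // Optionally discard segments that consist only of whitespace (e.g. spaces between emoji).
  return dropWhitespace ? parts.filter((p) => p.trim() !== "") : parts;
}

const familyEmoji = "\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}";
const boyWithTone = "\u{1F466}\u{1F3FE}";
const spaced = `${familyEmoji} ${boyWithTone} \u{1F634} \u26D4`;

console.log(splitIntoGraphemes(spaced));
console.log(splitIntoGraphemes(spaced, { dropWhitespace: true }));
```

With Intl.Segmenter available, the first call should return seven segments (four emoji and three spaces) and the second only the four emoji, with the family and the skin-toned boy each kept as a single element.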