javascript regex emoji

How to remove all but emojis in a (Javascript) string?


I've tried several regular expressions, but I can't get any of them to work.

I have a simple input where users can type whatever they like, but the final result must contain only emojis. To achieve this, I have to remove every character from the string that is not an emoji, and then check that the length is >= 1.

So basically this: asf..?23kj😔gasdf..😅,fwe34 should become this: 😔😅. Then I'd check the length to confirm that it's >=1 and I'd be good to go.


Solution

  • From what I gather from the comments, some of the following may help:



    To validate that a string contains one or more emoji:

     # ^(?=[\S\s]*(?:\ud83d[\ude00-\ude4f]))
    
     ^ 
     (?=
          [\S\s]* 
          (?: \ud83d [\ude00-\ude4f] )
     )
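
A minimal sketch of how the validation pattern above might be used from JavaScript. The helper name `containsEmoji` is hypothetical, and the pattern only covers the range U+1F600 to U+1F64F (the emoticons block) that the answer uses.

```javascript
// Pattern copied from the answer; it matches only emoji in U+1F600..U+1F64F.
const hasEmoji = /^(?=[\S\s]*(?:\ud83d[\ude00-\ude4f]))/;

// Hypothetical helper: true when the input contains at least one such emoji.
function containsEmoji(input) {
  return hasEmoji.test(input);
}

console.log(containsEmoji("asf..?23kj😔gasdf..😅,fwe34")); // true
console.log(containsEmoji("no emoji here"));               // false
```

Because the regex has no `g` flag, `test` keeps no state between calls, so the helper can be reused freely.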
    


    To remove only the emoji and keep everything else (global flag):

    Find: (?:\ud83d[\ude00-\ude4f])*((?:(?!\ud83d[\ude00-\ude4f])[\S\s])+)(?:\ud83d[\ude00-\ude4f])*
    Replace: $1

     (?: \ud83d [\ude00-\ude4f] )*
     (                                       # (1 start)
          (?:
               (?! \ud83d [\ude00-\ude4f] )
               [\S\s] 
          )+
     )                                       # (1 end)
     (?: \ud83d [\ude00-\ude4f] )*
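
As a sketch, the find/replace pair above could be applied with `String.prototype.replace`. The function names are hypothetical. The second function is a simpler alternative, not the answer's pattern: it deletes each emoji directly, which also covers input made up only of emoji, where the pattern above has no non-emoji run to match.

```javascript
// Find pattern from the answer, with the global flag.
const stripEmoji = /(?:\ud83d[\ude00-\ude4f])*((?:(?!\ud83d[\ude00-\ude4f])[\S\s])+)(?:\ud83d[\ude00-\ude4f])*/g;

// Hypothetical helper: keeps capture group 1 (the non-emoji run).
function removeEmoji(input) {
  return input.replace(stripEmoji, "$1");
}

// Alternative (an assumption, not from the answer): delete each emoji directly.
function removeEmojiDirect(input) {
  return input.replace(/\ud83d[\ude00-\ude4f]/g, "");
}

console.log(removeEmoji("asf..?23kj😔gasdf..😅,fwe34"));       // "asf..?23kjgasdf..,fwe34"
console.log(removeEmojiDirect("asf..?23kj😔gasdf..😅,fwe34")); // "asf..?23kjgasdf..,fwe34"
```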
    


    To remove everything except the emoji (global flag):

    Find: ((?:\ud83d[\ude00-\ude4f])*)(?:(?!\ud83d[\ude00-\ude4f])[\S\s])+((?:\ud83d[\ude00-\ude4f])*)
    Replace: $1$2

     (                                       # (1 start)
          (?: \ud83d [\ude00-\ude4f] )*
     )                                       # (1 end)
     (?:
          (?! \ud83d [\ude00-\ude4f] )
          [\S\s] 
     )+
     (                                       # (2 start)
          (?: \ud83d [\ude00-\ude4f] )*
     )                                       # (2 end)
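
A sketch of the question's use case with the pattern above; the function names are hypothetical. One caveat: the find/replace form needs at least one non-emoji character in the input. For a string made up only of emoji it finds nothing at index 0, retries at index 1, and treats the low surrogate of the first emoji as a non-emoji character, which splits the pair. The `collectEmoji` alternative (a swapped-in technique, not the answer's) avoids this by collecting the matches instead of deleting everything else.

```javascript
// Find pattern from the answer, with the global flag.
const keepEmoji = /((?:\ud83d[\ude00-\ude4f])*)(?:(?!\ud83d[\ude00-\ude4f])[\S\s])+((?:\ud83d[\ude00-\ude4f])*)/g;

// Hypothetical helper: keeps groups 1 and 2 (the emoji around each non-emoji run).
function onlyEmoji(input) {
  return input.replace(keepEmoji, "$1$2");
}

// Alternative (an assumption, not from the answer): gather all emoji matches and join them.
function collectEmoji(input) {
  return (input.match(/\ud83d[\ude00-\ude4f]/g) || []).join("");
}

const sample = "asf..?23kj😔gasdf..😅,fwe34";
console.log(onlyEmoji(sample));                // "😔😅"
console.log(collectEmoji(sample));             // "😔😅"
console.log(collectEmoji(sample).length >= 1); // true
```

Note that `length` counts UTF-16 code units, so each of these emoji contributes 2; the `>= 1` check from the question still works.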
    

    Edit: To use different emoji UTF-16 ranges, use one of the following forms.

    Different high surrogates:

    (?:
         High_surrogate_A [Low_surrogate_start_A-Low_surrogate_end_A]
      |  High_surrogate_B [Low_surrogate_start_B-Low_surrogate_end_B]
      |  High_surrogate_C [Low_surrogate_start_C-Low_surrogate_end_C]
    )
    

    or, same high surrogate, different low surrogate ranges:

    (?:
         High_surrogate [Low_surrogate_start1-Low_surrogate_end1Low_surrogate_start2-Low_surrogate_end2]
    )
    

    or, mix:

    (?:
         High_surrogate_A [Low_surrogate_startA1-Low_surrogate_endA1Low_surrogate_startA2-Low_surrogate_endA2]
      |  High_surrogate_B [Low_surrogate_start_B-Low_surrogate_end_B]
    )
    

    Where you see:

    (?: \ud83d [\ude00-\ude4f] )*

    substitute one of the above for the placeholder HERE:

    HERE*

    Where you see:

    (?! \ud83d [\ude00-\ude4f] )

    substitute one of the above for the placeholder HERE:

    (?! HERE )
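
The substitution described above can be sketched by building the three patterns from one emoji sub-pattern. Everything named here is hypothetical: the `buildPatterns` helper, and the example ranges (U+1F600 to U+1F64F and U+1F680 to U+1F6FF under high surrogate `\ud83d`, plus U+1F900 to U+1F9FF under `\ud83e`), which are assumptions chosen to show the "mix" form rather than a complete emoji list.

```javascript
// Assumed example ranges in the "mix" form; extend as needed.
const EMOJI = String.raw`(?:\ud83d[\ude00-\ude4f\ude80-\udeff]|\ud83e[\udd00-\uddff])`;

// Hypothetical helper: plugs the emoji sub-pattern into each HERE slot.
function buildPatterns(emoji) {
  return {
    validate: new RegExp(`^(?=[\\S\\s]*${emoji})`),
    removeEmoji: new RegExp(`${emoji}*((?:(?!${emoji})[\\S\\s])+)${emoji}*`, "g"),
    keepEmoji: new RegExp(`(${emoji}*)(?:(?!${emoji})[\\S\\s])+(${emoji}*)`, "g"),
  };
}

const patterns = buildPatterns(EMOJI);
const text = "a🚀b🤔c😔";
console.log(patterns.validate.test(text));            // true
console.log(text.replace(patterns.removeEmoji, "$1")); // "abc"
console.log(text.replace(patterns.keepEmoji, "$1$2")); // "🚀🤔😔"
```

Wrapping the alternation in `(?: ... )` keeps the trailing `*` and the `(?! ... )` lookahead applying to the whole emoji sub-pattern.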


    Note: you can use a high-surrogate range as well; however, all the high surrogates in that range must share the same low-surrogate range(s).
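
To fill in the surrogate placeholders for a given code point range, the standard UTF-16 arithmetic can be used. The following is a sketch with hypothetical helper names (`toSurrogates`, `rangeToPattern`); it assumes the whole range sits under a single high surrogate and throws otherwise.

```javascript
// Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair.
function toSurrogates(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

// Format one code unit as a \uXXXX regex escape.
const esc = (unit) => "\\u" + unit.toString(16).padStart(4, "0");

// Hypothetical helper: build "High [LowStart-LowEnd]" for a code point range.
function rangeToPattern(start, end) {
  const [h1, l1] = toSurrogates(start);
  const [h2, l2] = toSurrogates(end);
  if (h1 !== h2) throw new RangeError("range spans more than one high surrogate");
  return `${esc(h1)}[${esc(l1)}-${esc(l2)}]`;
}

console.log(rangeToPattern(0x1f600, 0x1f64f)); // \ud83d[\ude00-\ude4f]
```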