Tags: unicode, code-injection

\uD83D\uDCCC keeps showing up in code I've inherited. What does this Unicode sequence do?


I've been reading about code injection using Unicode sequences, and I've been using a tool from Dotnetsafer to locate such sequences in a codebase I've inherited. This sequence, \uD83D\uDCCC, keeps coming up:

An example:

appears as: __builder5.AddMarkupContent(51, "??");
actual    : __builder5.AddMarkupContent(51, "\uD83D\uDCCC");

What is this sequence? Why would the code be injecting it into HTML?

EDIT 1: I've looked up the sequence and the only thing remotely useful that I've found is https://unicode.scarfboy.com/?s=D83D+DCCC


Solution

  • Those are the UTF-16 code units that encode the Unicode character U+1F4CC (the pushpin emoji 📌).

    How could you have found out?

    1. Look up U+D83D and U+DCCC and find out that they are not characters on their own but a high and a low surrogate, respectively. Surrogates only occur in UTF-16, where a high surrogate followed by a low surrogate encodes a single character above U+FFFF.
    2. Google for "D83D DCCC" and find a page that explicitly lists those as the UTF-16 encoding of the pushpin emoji.

    Actually, come to think of it, you could just skip step #1 ;-)
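If you would rather not rely on a search engine, the surrogate arithmetic behind step 1 is small enough to check yourself. Below is a minimal sketch in Python (chosen only for illustration; the helpers `is_high_surrogate`, `is_low_surrogate`, `decode_surrogate_pair` and `encode_surrogate_pair` are hypothetical names, not part of any library or of the inherited code). It checks the two code units against the surrogate ranges, combines them into a code point, and cross-checks the result with Python's built-in UTF-16 codec.

```python
from __future__ import annotations

import unicodedata


def is_high_surrogate(unit: int) -> bool:
    # High (lead) surrogates occupy U+D800..U+DBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    # Low (trail) surrogates occupy U+DC00..U+DFFF.
    return 0xDC00 <= unit <= 0xDFFF


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (is_high_surrogate(high) and is_low_surrogate(low)):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits of an offset from U+10000:
    # the high surrogate holds the upper 10 bits, the low one the lower 10.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use surrogate pairs")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# The pair from the question: 0xD83D -> offset bits 0x3D, 0xDCCC -> 0xCC,
# so the code point is 0x10000 + (0x3D << 10) + 0xCC = 0x1F4CC.
code_point = decode_surrogate_pair(0xD83D, 0xDCCC)
print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))  # U+1F4CC PUSHPIN

# Cross-check: Python's own big-endian UTF-16 encoder produces the same two units.
assert chr(code_point).encode("utf-16-be") == bytes([0xD8, 0x3D, 0xDC, 0xCC])
```

One difference from C# worth knowing when you experiment: in C# the literal "\uD83D\uDCCC" is a string of two UTF-16 code units that together form one 📌, whereas in Python the same escapes give a string of two lone surrogate code points. To turn them into a single character in Python you have to round-trip through UTF-16 with the "surrogatepass" error handler, e.g. `"\ud83d\udccc".encode("utf-16-be", "surrogatepass").decode("utf-16-be")`.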