
Encoding/Decoding Emojis in JS/TS


I have built out an extremely robust system for encoding and decoding. Despite all of this, I cannot seem to fix this bug.

It's the weirdest thing: some of the tags with emojis work and others do not. The weirdest part: if I add a tag that doesn't work AFTER one that does work, it works perfectly fine and all of them are decoded correctly.

Here is my current system:

// Encode SearchChips to Base64 QID
export const encodeChipsToQid = (chips: SearchChip[]): string => {
  try {
    const jsonString = JSON.stringify(chips); // Convert to JSON
    console.log("JSON String before encoding:", jsonString);

    const uint8Array = new TextEncoder().encode(jsonString); // Encode as UTF-8
    console.log("Uint8Array before Base64 encoding:", uint8Array);

    const base64String = btoa(String.fromCharCode(...uint8Array)); // Base64-encode binary
    console.log("Encoded Base64 String:", base64String);

    return base64String;
  } catch (error) {
    console.error("Failed to encode chips:", error);
    return ""; // Gracefully handle errors
  }
};

// Decode QID from Base64 to SearchChips
export const decodeQidToChips = (qid: string): SearchChip[] => {
  if (!qid) return [];
  try {
    console.log("Received QID for decoding:", qid);

    // Decode Base64 to binary string
    const binaryString = atob(qid);
    console.log("Binary string after Base64 decoding:", binaryString);

    // Convert binary string to Uint8Array
    const uint8Array = Uint8Array.from(binaryString, (char) => char.charCodeAt(0));
    console.log("Uint8Array after Base64 decoding:", uint8Array);

    // Decode UTF-8 string from Uint8Array
    const jsonString = new TextDecoder().decode(uint8Array);
    console.log("Decoded JSON String:", jsonString);

    // Parse JSON
    const parsedChips = JSON.parse(jsonString);
    console.log("Parsed Search Chips:", parsedChips);

    return parsedChips;
  } catch (error) {
    console.error("Failed to decode qid:", error);
    return [];
  }
};

Previously I was using this which yielded similar results to the above:

import { SearchChip } from "@shared/schema/types";

// Encode SearchChips to Base64 QID
export function encodeChipsToQid(chips: SearchChip[]): string {
  const data = chips.map((chip) => ({
    type: chip.type,
    value: chip.value,
  }));

  const jsonString = JSON.stringify(data);
  return btoa(
    encodeURIComponent(jsonString).replace(
      /%([0-9A-F]{2})/g,
      (_, p1) => String.fromCharCode(parseInt(p1, 16))
    )
  );
}

// Decode Base64 QID back into SearchChips
export function decodeQidToChips(qid: string): SearchChip[] {
  try {
    const jsonString = decodeURIComponent(
      atob(qid)
        .split("")
        .map((c) => `%${c.charCodeAt(0).toString(16).padStart(2, "0")}`)
        .join("")
    );

    const data = JSON.parse(jsonString);
    return data.map((item: any) => ({
      type: item.type,
      value: item.value,
    }));
  } catch (error) {
    console.error("Failed to decode qid:", error);
    return [];
  }
}

Here is the output from a failing case:

JSON String before encoding: [{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"🏥","tagId":"63"}}]
VM145990 SearchEncoding.ts:12 Uint8Array before Base64 encoding: Uint8Array(111) [91, 123, 34, 116, 121, 112, 101, 34, 58, 34, 105, 116, 101, 109, 34, 44, 34, 118, 97, 108, 117, 101, 34, 58, 123, 34, 95, 95, 116, 121, 112, 101, 110, 97, 109, 101, 34, 58, 34, 84, 97, 103, 66, 97, 115, 101, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 73, 110, 115, 117, 114, 97, 110, 99, 101, 34, 44, 34, 99, 111, 108, 111, 114, 34, 58, 34, 84, 101, 97, 108, 34, 44, 34, 105, 99, 111, 110, 34, 58, 34, 240, 159, 143, 165, 34, 44, 34, 116, 97, 103, …]
VM145990 SearchEncoding.ts:14 Encoded Base64 String: W3sidHlwZSI6Iml0ZW0iLCJ2YWx1ZSI6eyJfX3R5cGVuYW1lIjoiVGFnQmFzZSIsIm5hbWUiOiJJbnN1cmFuY2UiLCJjb2xvciI6IlRlYWwiLCJpY29uIjoi8J+PpSIsInRhZ0lkIjoiNjMifX1d
_app.tsx:7 _app.js rendered
VM145990 SearchEncoding.ts:25 Received QID for decoding: W3sidHlwZSI6Iml0ZW0iLCJ2YWx1ZSI6eyJfX3R5cGVuYW1lIjoiVGFnQmFzZSIsIm5hbWUiOiJJbnN1cmFuY2UiLCJjb2xvciI6IlRlYWwiLCJpY29uIjoi8J PpSIsInRhZ0lkIjoiNjMifX1d
VM145990 SearchEncoding.ts:28 Binary string after Base64 decoding: [{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"ðéHYÒYÈ_W
VM145990 SearchEncoding.ts:31 Uint8Array after Base64 decoding: Uint8Array(110) [91, 123, 34, 116, 121, 112, 101, 34, 58, 34, 105, 116, 101, 109, 34, 44, 34, 118, 97, 108, 117, 101, 34, 58, 123, 34, 95, 95, 116, 121, 112, 101, 110, 97, 109, 101, 34, 58, 34, 84, 97, 103, 66, 97, 115, 101, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 73, 110, 115, 117, 114, 97, 110, 99, 101, 34, 44, 34, 99, 111, 108, 111, 114, 34, 58, 34, 84, 101, 97, 108, 34, 44, 34, 105, 99, 111, 110, 34, 58, 34, 240, 147, 233, 72, 139, 8, 157, 24, 89, 210, …]
VM145990 SearchEncoding.ts:34 Decoded JSON String: [{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"��H��Y�Y����ȟ_W
VM145990 SearchEncoding.ts:40 Failed to decode qid: SyntaxError: Bad control character in string literal in JSON at position 94 (line 1 column 95)
    at JSON.parse (<anonymous>)
    at decodeQidToChips (VM145990 SearchEncoding.ts:36:34)
    at SearchController.decodeQueryFromUrl (SearchController.ts:109:28)

But weirdly enough, here is the exact same emoji being decoded correctly (just added after a working one):

JSON String before encoding: [{"type":"item","value":{"__typename":"TagBase","name":"Income","color":"Green","icon":"$","tagId":"91"}},{"type":"operator","value":{"op":"or"}},{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"🏥","tagId":"63"}}]
VM145990 SearchEncoding.ts:12 Uint8Array before Base64 encoding: Uint8Array(256) [91, 123, 34, 116, 121, 112, 101, 34, 58, 34, 105, 116, 101, 109, 34, 44, 34, 118, 97, 108, 117, 101, 34, 58, 123, 34, 95, 95, 116, 121, 112, 101, 110, 97, 109, 101, 34, 58, 34, 84, 97, 103, 66, 97, 115, 101, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 73, 110, 99, 111, 109, 101, 34, 44, 34, 99, 111, 108, 111, 114, 34, 58, 34, 71, 114, 101, 101, 110, 34, 44, 34, 105, 99, 111, 110, 34, 58, 34, 36, 34, 44, 34, 116, 97, 103, 73, 100, 34, 58, 34, …]
VM145990 SearchEncoding.ts:14 Encoded Base64 String: W3sidHlwZSI6Iml0ZW0iLCJ2YWx1ZSI6eyJfX3R5cGVuYW1lIjoiVGFnQmFzZSIsIm5hbWUiOiJJbmNvbWUiLCJjb2xvciI6IkdyZWVuIiwiaWNvbiI6IiQiLCJ0YWdJZCI6IjkxIn19LHsidHlwZSI6Im9wZXJhdG9yIiwidmFsdWUiOnsib3AiOiJvciJ9fSx7InR5cGUiOiJpdGVtIiwidmFsdWUiOnsiX190eXBlbmFtZSI6IlRhZ0Jhc2UiLCJuYW1lIjoiSW5zdXJhbmNlIiwiY29sb3IiOiJUZWFsIiwiaWNvbiI6IvCfj6UiLCJ0YWdJZCI6IjYzIn19XQ==
_app.tsx:7 _app.js rendered
VM145990 SearchEncoding.ts:25 Received QID for decoding: W3sidHlwZSI6Iml0ZW0iLCJ2YWx1ZSI6eyJfX3R5cGVuYW1lIjoiVGFnQmFzZSIsIm5hbWUiOiJJbmNvbWUiLCJjb2xvciI6IkdyZWVuIiwiaWNvbiI6IiQiLCJ0YWdJZCI6IjkxIn19LHsidHlwZSI6Im9wZXJhdG9yIiwidmFsdWUiOnsib3AiOiJvciJ9fSx7InR5cGUiOiJpdGVtIiwidmFsdWUiOnsiX190eXBlbmFtZSI6IlRhZ0Jhc2UiLCJuYW1lIjoiSW5zdXJhbmNlIiwiY29sb3IiOiJUZWFsIiwiaWNvbiI6IvCfj6UiLCJ0YWdJZCI6IjYzIn19XQ==
VM145990 SearchEncoding.ts:28 Binary string after Base64 decoding: [{"type":"item","value":{"__typename":"TagBase","name":"Income","color":"Green","icon":"$","tagId":"91"}},{"type":"operator","value":{"op":"or"}},{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"ð¥","tagId":"63"}}]
VM145990 SearchEncoding.ts:31 Uint8Array after Base64 decoding: Uint8Array(256) [91, 123, 34, 116, 121, 112, 101, 34, 58, 34, 105, 116, 101, 109, 34, 44, 34, 118, 97, 108, 117, 101, 34, 58, 123, 34, 95, 95, 116, 121, 112, 101, 110, 97, 109, 101, 34, 58, 34, 84, 97, 103, 66, 97, 115, 101, 34, 44, 34, 110, 97, 109, 101, 34, 58, 34, 73, 110, 99, 111, 109, 101, 34, 44, 34, 99, 111, 108, 111, 114, 34, 58, 34, 71, 114, 101, 101, 110, 34, 44, 34, 105, 99, 111, 110, 34, 58, 34, 36, 34, 44, 34, 116, 97, 103, 73, 100, 34, 58, 34, …]
VM145990 SearchEncoding.ts:34 Decoded JSON String: [{"type":"item","value":{"__typename":"TagBase","name":"Income","color":"Green","icon":"$","tagId":"91"}},{"type":"operator","value":{"op":"or"}},{"type":"item","value":{"__typename":"TagBase","name":"Insurance","color":"Teal","icon":"🏥","tagId":"63"}}]
VM145990 SearchEncoding.ts:37 Parsed Search Chips: (3) [{…}, {…}, {…}]

Solution

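  • The logs show that the output of encodeChipsToQid differs from the input to decodeQidToChips, when obviously those should be the same string: a + in the first string (…Ijoi8J+PpSIs…) has become a space in the second (…Ijoi8J PpSIs…). This is the typical effect of passing a string through a URL without properly encoding it for a URL. It also explains why the same emoji works in the second case: with another chip in front of it, the emoji's bytes fall at a different Base64 alignment and encode as …IvCfj6Ui…, which contains no +, so nothing gets altered in transit.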

    Also, your logs contain the line "_app.js rendered" between the encoding and decoding output, which supports the idea that the string travels through a URL (via a navigation or an HTTP call) before it is decoded.

    Base64-encoded strings may contain the characters +, / and =, which have special meanings in URLs (in a query string, for example, + is decoded as a space), so make sure they are not transformed in transit. One way is to call encodeURIComponent as the last step before passing the string via a URL, and decodeURIComponent at the receiving end.

    This is something you did in your previous version of the code, although there you do it at the wrong stage: encodeURIComponent should be called after all other manipulations in encodeChipsToQid and decodeURIComponent should be called before any other manipulation in decodeQidToChips.

    Alternatively, you could use Base64URL encoding instead of Base64. See Base64URL decoding via JavaScript?
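
As a rough illustration of the Base64URL alternative, here is a minimal sketch that keeps the TextEncoder/TextDecoder approach from the question and only changes the alphabet: + becomes -, / becomes _, and the = padding is dropped. The SearchChip type below is a hypothetical stand-in for the one imported from "@shared/schema/types", and passing the qid as a query parameter named qid is an assumption about how the app builds its URL.

```typescript
// Hypothetical stand-in for SearchChip from "@shared/schema/types".
type SearchChip = { type: string; value: unknown };

// Encode chips as Base64URL: standard Base64 with "+" -> "-", "/" -> "_"
// and the "=" padding removed, so the result is safe in a URL as-is.
function encodeChipsToQid(chips: SearchChip[]): string {
  const bytes = new TextEncoder().encode(JSON.stringify(chips)); // UTF-8 bytes
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // one char per byte, as btoa expects
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Decode a Base64URL qid back into chips; returns [] on any failure.
function decodeQidToChips(qid: string): SearchChip[] {
  if (!qid) return [];
  try {
    // Restore the standard Base64 alphabet and padding before calling atob.
    const base64 = qid.replace(/-/g, "+").replace(/_/g, "/");
    const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
    const bytes = Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
    return JSON.parse(new TextDecoder().decode(bytes));
  } catch (error) {
    console.error("Failed to decode qid:", error);
    return [];
  }
}

// The failing case from the question: a single chip with the hospital emoji.
const chips: SearchChip[] = [
  {
    type: "item",
    value: { __typename: "TagBase", name: "Insurance", color: "Teal", icon: "🏥", tagId: "63" },
  },
];

const qid = encodeChipsToQid(chips);

// Simulate the trip through a query string (parameter name "qid" is assumed).
const received = new URLSearchParams("qid=" + qid).get("qid") ?? "";
const restored = decodeQidToChips(received);
console.log(received === qid, restored);
```

Because the encoded string no longer contains +, / or =, it should pass through a query string unchanged, with no extra encodeURIComponent/decodeURIComponent step. If you prefer to keep standard Base64, wrapping the result in encodeURIComponent when building the URL achieves the same goal.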