Tags: javascript, emoji, firefox-addon-webextensions, unicode-string, web-extension

Sanitise Unicode pair for filename in JavaScript?


My web extension fails to start a file download when the filename contains a pair of emojis, reporting an invalid filename error. This seems to be a Unicode surrogate pair issue that appears when multiple emojis are used. Here is an example of an offending filename:

<a href="https://www.example.com/filestream.xyz"
   download="The New World Order Presentation 👨‍🌾🇳🇱.pdf"
   target="_blank">Download File</a>

As the Chrome DevTools Elements screenshot below shows, the farmer emoji (https://emojipedia.org/man-farmer/) seems to be a combination of multiple code points, and that seems to be what makes the filename invalid. When the code is pasted here as above, the emojis are rendered correctly as a farmer and a flag, but in the DevTools DOM view they look different. Inspecting the filename shared above in DevTools displays the issue.

[Screenshot: The Farmer Emoji]
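
The composition described above can also be checked without DevTools. The following is a minimal sketch that lists the code points of the two emojis from the filename; the helper name codePoints is made up for illustration and is not part of the extension.

```javascript
// Hypothetical helper (not from the original post): list the code points
// of a string as U+XXXX labels. Array.from iterates by code point, so a
// surrogate pair is reported as one entry rather than two.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Man farmer: MAN + ZERO WIDTH JOINER + EAR OF RICE
console.log(codePoints("\u{1F468}\u200D\u{1F33E}").join(" "));
// U+1F468 U+200D U+1F33E

// Flag of the Netherlands: two regional indicator symbols, no joiner
console.log(codePoints("\u{1F1F3}\u{1F1F1}").join(" "));
// U+1F1F3 U+1F1F1
```

The farmer is the only one of the two that contains U+200D; the flag is just two regional indicator symbols placed side by side.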

The code which pushes the download:

function notifyExtension(e) {
  var elem = e.currentTarget;
  var fileSaveName = elem.getAttribute("download");
  e.returnValue = false;

  if (e.preventDefault) {
    e.preventDefault();
  }
  var loop = elem.getAttribute("loop");
  if (loop) {
    chrome.runtime.sendMessage({
      url: elem.getAttribute("href"),
      filename: fileSaveName,
    });
  }
  return false;
}

The background code which starts the download using the browser API:

chrome.runtime.onMessage.addListener(function (message) {
  
  let fname = message.filename
    .trim()
    .replace(/[`~!@#$%^&*()_|+\-=?;:'",<>{}[\]\\/]/gi, "-")
    .replace(/[\\/:*?"<>|]/g, "_")
    .substring(0, 240)
    .replace(/\s+/g, " ");
  chrome.downloads.download({
    url: message.url,
    filename: fname,
    conflictAction: "uniquify",
    saveAs: true,
  });
});

The error we get in browser console:

Unchecked lastError value: Error: filename must not contain illegal characters

[Screenshot: Error in browser console]

How can I sanitise the string in JavaScript so that it is always a valid filename in such situations? Single emojis do not seem to be an issue, but multiple emojis are.


Solution

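  • You can use Unicode property escapes to find emojis in a string.

    The syntax is \p{...}, and the regex needs the u flag.

    Example:

    console.log("👨‍🌾aa".replace(/\p{So}/gu, ""))

    Note that \p{So} matches the two symbols but not the joiner between them, so the U+200D character is still left in the result.

    Other properties can be used inside \p{...}; you can see them in the docs.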

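    If single emojis do not cause a failure but the man farmer emoji does, the likely cause of the problem is the zero width joiner (U+200D) that links its two parts. It is an invalid symbol in filenames in Chrome. Run a search for U+200D for details.

    Resulting regex, which matches two symbols joined by U+200D:

    /\p{So}\u{200D}\p{So}/gu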
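
Building on the answer above, here is a minimal sketch of a sanitiser that could be used in the background script. It assumes the joiner is what the browser rejects, so it strips every character in the Unicode categories Cc (control) and Cf (format, which includes U+200D) instead of removing the emojis themselves. The function name sanitizeFilename and the maxLength parameter are made up for illustration, and whether the result is accepted by chrome.downloads.download has not been verified here.

```javascript
// Hypothetical helper (name and defaults are assumptions, not from the post).
function sanitizeFilename(name, maxLength = 240) {
  const cleaned = name
    // Collapse whitespace runs first, so tabs and newlines become a space
    // instead of disappearing in the next step.
    .replace(/\s+/g, " ")
    // Drop control (Cc) and format (Cf) characters. U+200D ZERO WIDTH JOINER
    // is in Cf, so joined emoji sequences fall apart into separate emojis.
    .replace(/[\p{Cc}\p{Cf}]/gu, "")
    // Replace characters that are reserved in filenames on common platforms.
    .replace(/[\\/:*?"<>|]/g, "_")
    .trim();

  // Truncate by code point rather than by UTF-16 unit, so a surrogate pair
  // is never cut in half (substring() could leave a lone surrogate).
  return Array.from(cleaned).slice(0, maxLength).join("");
}

const example = sanitizeFilename(
  "The New World Order Presentation \u{1F468}\u200D\u{1F33E}\u{1F1F3}\u{1F1F1}.pdf"
);
console.log(example.includes("\u200D")); // false
```

In the listener from the question, fname could then be computed as sanitizeFilename(message.filename). The farmer is saved as a man emoji followed by an ear of rice emoji; if dropping emojis entirely is preferred, the \p{So} replacement from the answer can be added as one more step.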