
Decode String from TypedArray cached data passed via webRequest FilterResponseData StreamFilter


I am using Mozilla's webRequest StreamFilter to read HTTP response bodies:

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/filterResponseData

https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/StreamFilter/ondata
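
For reference, the setup described here looks roughly like the following sketch. The helper name `collectText`, the `<all_urls>` pattern, and the `console.log` sink are illustrative assumptions, not the code from my extension. The `typeof browser` guard is only there so the snippet can also be loaded outside an extension.

```javascript
// Attach handlers to a StreamFilter-like object and decode its chunks as UTF-8.
// `collectText` is a hypothetical helper name used only for this sketch.
function collectText(filter, onDone) {
  const decoder = new TextDecoder("utf-8");
  let text = "";

  filter.ondata = (event) => {
    // event.data is an ArrayBuffer; stream: true keeps multi-byte
    // sequences intact when they are split across chunks.
    text += decoder.decode(event.data, { stream: true });
    // Pass the bytes through unchanged so the page still receives them.
    filter.write(event.data);
  };

  filter.onstop = () => {
    text += decoder.decode(); // flush any bytes still buffered in the decoder
    filter.close();
    onDone(text);
  };
}

// Only runs inside a Firefox extension that has the webRequest and
// webRequestBlocking permissions; the URL pattern is an assumption.
if (typeof browser !== "undefined" && browser.webRequest) {
  browser.webRequest.onBeforeRequest.addListener(
    (details) => {
      const filter = browser.webRequest.filterResponseData(details.requestId);
      collectText(filter, (text) => console.log(details.url, text.length));
    },
    { urls: ["<all_urls>"] },
    ["blocking"]
  );
}
```

The decoder is created once per response and called with `stream: true`, so a character split across two `ondata` chunks is not turned into replacement characters.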

As long as a response is not served from the browser cache, the text decoded from the typed array is correct. The problem arises when the data comes from the cache. In that case, the same data that previously decoded successfully produces this output:

����������20180901034956%������������������������������������������������ ����~�������v�a�r

That's only a small sample; the full string is very long. I am decoding it with TextDecoder using the UTF-8 encoding.

After some digging I discovered that the typed array for the cached data is full of zero bytes. Written with Unicode escapes, the decoded string looks like this, where I believe each \u0000 is a null character:

\u0000\u0000\u0000\u0000\u0000\u0000\u000e\u0000\u0000\u000020180901034956%\u0000\u0000\u0000\u0005\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0014\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002\u0000\u0000\u0000\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0002\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0005 \u0003\u0000\u0001\u0000~�\u0013\u0000\u0000\u0000\u0000\u0000v\u0000a\u0000r

A quick "solution" was to remove the zeros from the typed array (or the nulls from the string), without knowing how that might affect the original data. It results in the following:

�20180901034956%������� ��~��var

And the corresponding Unicode-escaped version:

\u000e20180901034956%\u0005\u0001\u0014\u0002\u0001\u0002\u0005 \u0003\u0001~�\u0013var
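
The zero-stripping workaround can be sketched as follows. `stripNulsAndDecode` is a hypothetical name, and the sample bytes are a made-up, abbreviated stand-in for the real cached data. It drops every zero byte blindly, so the leading control bytes survive and any data in which a zero byte is meaningful would be damaged.

```javascript
// Remove every 0x00 byte, then decode what is left as UTF-8.
// Accepts either an ArrayBuffer (as delivered by ondata) or a Uint8Array.
function stripNulsAndDecode(bytes) {
  const view = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  const withoutNuls = view.filter((b) => b !== 0);
  return new TextDecoder("utf-8").decode(withoutNuls);
}

// Abbreviated sample: one control byte, padding zeros,
// then "var" with a zero byte after each letter.
const sample = Uint8Array.of(0x0e, 0, 0, 0, 0x76, 0, 0x61, 0, 0x72, 0);
const decoded = stripNulsAndDecode(sample);

// The letters come out joined, but the control byte is still in front.
console.log(JSON.stringify(decoded));
```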

Searching for some of those characters online showed that several of them are control characters. Now I am stuck: I don't know why this is happening at all, beyond the fact that the data is cached, or how to decode it correctly.

I've tried asking in Firefox's channels for help, but no one has answered after 3 straight days, so I am posting the question here.

How can I decode the cached data correctly, and can anyone explain why it looks like this? All data that is not cached decodes correctly.

EDIT: After more digging I confirmed that the data is being read from Firefox's cache file. I was able to locate the right file, and it contains the following (sample as displayed in Notepad):

20180901034956% ~ÿ v a r

The text after "v a r" follows the same pattern: each character is followed by a "space", if that is even a space. So it is confirmed that this data comes straight from the cached file. How can I decode it correctly? I don't think removing the spaces is a proper solution, since it could also remove spaces that are part of the original data.
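
To check whether those gaps are real spaces (0x20) or zero bytes (0x00), it can help to dump the raw bytes in hex instead of relying on a text editor. The sketch below does that on made-up sample bytes; `toHex` is a hypothetical helper. It also illustrates one observation: a letter followed by a zero byte is what ASCII text looks like when stored as UTF-16LE, and `TextDecoder` accepts `"utf-16le"` as a label. Whether the cache file really stores its text that way is an assumption that is not confirmed here, and this does nothing about the bytes that come before the text.

```javascript
// Render bytes as two-digit hex values separated by spaces.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}

// Made-up sample mirroring the "v a r" pattern: each letter followed by 0x00.
const textPart = Uint8Array.of(0x76, 0x00, 0x61, 0x00, 0x72, 0x00);

// 0x00 between the letters means they are null bytes, not 0x20 spaces.
console.log(toHex(textPart));

// Decoding the same bytes as UTF-16LE instead of UTF-8.
const asUtf16 = new TextDecoder("utf-16le").decode(textPart);
console.log(asUtf16);
```

If the hex dump shows `00` rather than `20` between the letters, removing "spaces" from the decoded string would be the wrong tool in any case, because real spaces in the original data are 0x20 and would be indistinguishable after decoding in a text editor.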

EDIT2: The data is supposed to look like this:

var

Nothing before it, and no spaces between the characters.


Solution

  • It looks like this is indeed a Firefox bug. The workaround I was already using is the only way to "fix" it ourselves until Firefox corrects the problem.

    Forgot to add the link to the bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1530408#c6