I need to recover the original buffer from a string that was produced by calling toString on it. For example:
const buffer // I need this, or equivalent
const bufferString = buffer.toString() // This is all I have
The Node documentation implies that .toString() defaults to 'utf8' encoding, so I expected Buffer.from(bufferString, 'utf8') to reverse it, but this doesn't work: I get different data. (Maybe some data is lost when the buffer is converted to a string, although the documentation doesn't seem to mention this.)
Does anyone know why this is happening or how to fix it?
Here is the data I have to reproduce this:
const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr) // The buffer I want!
const bufferString = buffer.toString() // The string I have!, note .toString() and .toString('utf8') are equivalent
const differentBuffer = Buffer.from(bufferString, 'utf8')
You can get the initial intArr from a buffer by doing this:
JSON.parse(JSON.stringify(Buffer.from(buffer)))['data']
Edit: interestingly, calling .toString() on differentBuffer gives the same initial string.
I think the important part of the documentation you linked is: "When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � will be used to represent those errors."
When you convert your buffer into a utf8 string, not all of its bytes are valid UTF-8, as you can see by doing a console.log(bufferString); almost all of it comes out as gibberish. Every invalid byte is replaced by U+FFFD, so you irretrievably lose data when converting from the buffer into a utf8 string, and you can't get that lost data back when converting back into a buffer.
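To make the loss concrete, here is a minimal sketch using only the first three bytes of the data from the question (the variable names are made up for illustration). It shows the invalid byte being swapped for U+FFFD, and it also suggests why calling .toString() on the re-created buffer yields the same string again: the replacement character itself is valid UTF-8, so it survives further round trips.

```javascript
// Hypothetical three-byte sample: the first bytes of the data from the question.
const original = Buffer.from([31, 139, 8]);

// 139 (0x8b) cannot start a UTF-8 sequence, so the decoder emits U+FFFD for it.
const decoded = original.toString('utf8');

// Re-encoding writes U+FFFD as its own UTF-8 bytes (ef bf bd), not as 0x8b.
const reencoded = Buffer.from(decoded, 'utf8');

console.log(original);  // <Buffer 1f 8b 08>
console.log(reencoded); // <Buffer 1f ef bf bd 08>

// The string is stable from here on, even though the bytes differ from the original.
console.log(reencoded.toString('utf8') === decoded); // true
```

The original 0x8b is gone after the first decode; nothing in the string records which invalid byte was there.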
In your example, if you use utf16le instead of utf8, no information is lost, so the buffer is the same after converting back. For example:
const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr);
const bufferString = buffer.toString('utf16le');
const differentBuffer = Buffer.from(bufferString, 'utf16le');
console.log(buffer); // same as the below log
console.log(differentBuffer); // same as the above log
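If you control the code that calls toString, a byte-oriented encoding may be a safer choice than utf16le, which works here because the sample has an even number of bytes (as far as I know, a trailing odd byte would be dropped). The sketch below is only an illustration under that assumption: it uses a shortened, made-up sample of the bytes and round-trips it through 'base64', 'hex' and 'latin1', all of which are encodings that Buffer supports.

```javascript
// Hypothetical shortened sample containing bytes that are not valid UTF-8.
const bytes = Buffer.from([31, 139, 8, 0, 255, 3, 0, 151]);

// Encodings that map every possible byte value to text and back.
const encodings = ['base64', 'hex', 'latin1'];

const results = encodings.map((enc) => {
  const text = bytes.toString(enc);        // buffer -> string
  const restored = Buffer.from(text, enc); // string -> buffer
  return { enc, text, ok: restored.equals(bytes) };
});

for (const { enc, ok } of results) {
  console.log(enc, ok); // each line should end in "true"
}

console.log(results[1].text); // 1f8b0800ff030097
```

None of this helps if the only thing you have is a string that was already produced with the default utf8 encoding; in that case the original bytes cannot be reconstructed from it.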