I know this question has been asked before (I searched across a ton of answers with no luck). So, I am finally just asking my own question with my own example. I have a base64 string in a database. Here is the string:
base64:YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==
I try to decode it using a simple base64_decode() and I get this strange result:
anoither reply��\�� �]H�]\�X^H\K�\����\]Y[��H�\�Y\ˈ�\HS�܈[��\H����[��[
I used explode() to strip the leading base64: prefix, since it is obviously an indicator that the value was stored that way. But, for the life of me, I cannot figure out how to get rid of the strange trailing characters. I have gotten close by doing a urlencode(base64_decode($string)), but that produces this long strange string:
anoither+reply%02%82%93%5C%D9%C8%09%88%19%18%5D%18H%1C%98%5D%19%5C%C8%1BX%5EH%18%5C%1C%1B%1EK%88%13%5C%D9%C8%19%9C%99%5C%5DY%5B%98%DEH%1D%98%5C%9AY%5C%CB%88%14%99%5C%1B%1EH%12%11S%14%08%19%9B%DC%88%1A%19%5B%1C%0B%88%14%99%5C%1B%1EH%14%D5%13%D4%08%1D%1B%C8%18%D8%5B%98%D9%5B%0B
And I have not been able to figure out how to get rid of the above trailing strange characters either. Anyway, any thoughts or suggestions? Thanks!
You have two base64 strings concatenated in the same field...
You have a string YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==
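which is garbled when decoded. The = character is padding: it fills out the output when the input bytes do not divide evenly into base64 chunks. Padding is ALWAYS a suffix of the encoded string, so a = in the middle means a second encoded string starts right after it.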
YW5vaXRoZXIgcmVwbHk=
decodes to "anoither reply"
CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==
decodes to
"\n\nMsg & data rates may apply. Msg frequency varies. Reply HELP for help. Reply STOP to cancel."
You need to either split the encoded string after the = or make sure that the input is encoded properly as one whole string and not as a concatenation of two separate strings.
You may have other strings where there is no issue because the first string's byte length happens to be a multiple of 3, so its encoding needs no padding and lines up with the base64 chunks.
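To see the difference between the two situations, here is a small sketch. The variable names and the shortened second message are made up for illustration; only base64_encode and base64_decode are real PHP functions.

```php
<?php
// Hypothetical inputs: two pieces of text that ended up in the same field.
$first  = "anoither reply";
$second = "\n\nMsg & data rates may apply.";

// Encoding each piece separately and gluing the results together
// leaves the first piece's "=" padding in the middle of the field.
$concatenated = base64_encode($first) . base64_encode($second);

// Encoding the joined text once puts padding (if any) only at the end.
$whole = base64_encode($first . $second);

var_dump(base64_decode($whole) === $first . $second);        // bool(true)
var_dump(base64_decode($concatenated) === $first . $second); // bool(false)

// A first piece whose byte length is a multiple of 3 needs no padding,
// so here the concatenation equals the encoding of the whole text.
$aligned = "abc";
var_dump(
    base64_encode($aligned) . base64_encode($second) === base64_encode($aligned . $second)
); // bool(true)
```

The last check is why some rows in the database may decode fine while others do not.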
I would suggest reading a little about base64 encoding to understand the issue: https://www.freecodecamp.org/news/what-is-base64-encoding/
First we need to understand base64 padding. Here are some strings and their base64 counterparts:
String | base64 |
---|---|
abc | YWJj |
æbc | w6ZiYw== |
abcd | YWJjZA== |
abcde | YWJjZGU= |
abcdef | YWJjZGVm |
!abcdef | IWFiY2RlZg== |
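The rows above can be reproduced with a few lines of PHP. This is only a sketch: expectedPadding is a hypothetical helper name, and it assumes the file is saved as UTF-8 so that æ takes two bytes.

```php
<?php
// Number of "=" characters base64 appends for a given input.
// Hypothetical helper; it relies only on the 3-bytes-to-4-characters rule.
function expectedPadding(string $text): int
{
    return (3 - strlen($text) % 3) % 3; // strlen() counts bytes, not characters
}

// Assumes this file is saved as UTF-8, where "æ" occupies two bytes.
foreach (["abc", "æbc", "abcd", "abcde", "abcdef", "!abcdef"] as $text) {
    printf(
        "%s | %d bytes | %d pad | %s\n",
        $text,
        strlen($text),
        expectedPadding($text),
        base64_encode($text)
    );
}
```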
As the table shows, swapping a single-byte character like a for a multi-byte character like æ is enough to produce the maximum padding (==), exactly as if one more character had been added (as with abcd). The padding depends on the number of bytes, not the number of characters.
Therefore, to split the string successfully, first split on ==, then split any part that still contains = on that single =. Once no part contains a = character, you can safely run base64_decode on each part.
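One way to sketch this in PHP is shown below. It is a variation on the steps above rather than a literal translation: instead of two explode passes that throw the padding away, it uses one preg_match_all that keeps each chunk's own padding attached. decodeConcatenatedBase64 is a hypothetical helper name, and the base64: prefix handling is taken from the question.

```php
<?php
/**
 * Split a field made of several base64 strings glued together
 * and decode each one. Hypothetical helper, not part of PHP.
 */
function decodeConcatenatedBase64(string $field): array
{
    // Drop the "base64:" marker the question mentions, if present.
    if (strncmp($field, 'base64:', 7) === 0) {
        $field = substr($field, 7);
    }

    // Each chunk is a run of base64 characters followed by at most two "=".
    preg_match_all('#[A-Za-z0-9+/]+={0,2}#', $field, $matches);

    $decoded = [];
    foreach ($matches[0] as $chunk) {
        // Strict mode returns false instead of guessing on malformed input.
        $decoded[] = base64_decode($chunk, true);
    }
    return $decoded;
}

$field = 'base64:YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==';

$parts = decodeConcatenatedBase64($field);
echo implode('', $parts);
```

If the first chunk happens to have no padding, the pattern treats it and the next chunk as one piece, which is fine because unpadded chunks stay aligned and decode correctly together.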
If PHP's base64_decode starts raising errors because of the missing padding, you will have to re-add it before decoding.
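Should that turn out to be necessary, re-adding the padding could look roughly like this. repadBase64 is a hypothetical helper name; the only assumption is that a valid padded base64 string has a length that is a multiple of 4.

```php
<?php
/**
 * Restore the "=" padding on a base64 chunk that lost it.
 * Hypothetical helper, not part of PHP.
 */
function repadBase64(string $chunk): string
{
    // Start from a clean, unpadded chunk.
    $chunk = rtrim($chunk, '=');

    // Pad on the right up to the next multiple of 4 characters.
    // Note: a length of 4n + 1 is never valid base64, and padding cannot fix it.
    $missing = (4 - strlen($chunk) % 4) % 4;

    return str_pad($chunk, strlen($chunk) + $missing, '=');
}

echo repadBase64('YW5vaXRoZXIgcmVwbHk'), "\n"; // YW5vaXRoZXIgcmVwbHk=
echo base64_decode(repadBase64('YW5vaXRoZXIgcmVwbHk')), "\n"; // anoither reply
```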
I have not checked PHP's base64_decode function; my analysis was done in Notepad++ with the MIME Tools plugin, which supports both padded and unpadded base64 encoding (if you had unpadded encoding in PHP you would be in trouble, but enough of that).