php base64

base64 decode brings back weird characters


I know this question has been asked before (I searched across a ton of answers with no luck). So, I am finally just asking my own question with my own example. I have a base64 string in a database. Here is the string:

base64:YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==

I try and decode it using a simple base64_decode() and I get this strange result:

anoither reply��\�� �]H�]\�X^H\K�\����\]Y[��H�\�Y\ˈ�\HS�܈[ ��\H����[��[

I used explode() to strip the leading base64: prefix, since it is obviously just a marker for how the value was stored. But, for the life of me, I cannot figure out how to get rid of the strange trailing characters. I got close with urlencode(base64_decode($string)), but that produces this long strange string:

anoither+reply%02%82%93%5C%D9%C8%09%88%19%18%5D%18H%1C%98%5D%19%5C%C8%1BX%5EH%18%5C%1C%1B%1EK%88%13%5C%D9%C8%19%9C%99%5C%5DY%5B%98%DEH%1D%98%5C%9AY%5C%CB%88%14%99%5C%1B%1EH%12%11S%14%08%19%9B%DC%88%1A%19%5B%1C%0B%88%14%99%5C%1B%1EH%14%D5%13%D4%08%1D%1B%C8%18%D8%5B%98%D9%5B%0B

And I have not been able to figure out how to get rid of the above trailing strange characters either. Anyway, any thoughts or suggestions? Thanks!


Solution

  • You have two base64 strings concatenated in the same field...

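    You have the string YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg== which is garbled when decoded. The = character is padding: it fills out the output when the input bytes do not divide evenly into base64 chunks (every 3 input bytes become 4 output characters). Padding is ALWAYS appended at the end of the encoded string, so a = in the middle of the data means that a second encoded string starts right after it.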

    YW5vaXRoZXIgcmVwbHk= decodes to "anoither reply"

    CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg== decodes to "\n\nMsg & data rates may apply. Msg frequency varies. Reply HELP for help. Reply STOP to cancel."
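    A quick way to confirm this in PHP is to decode the two halves separately. The snippet below is only an illustrative sketch: the variable names are made up here, and the split point was chosen by hand, right after the first = character.

```php
<?php
// Hypothetical variable names; the two values are the stored string
// cut by hand immediately after the first "=" padding character.
$first  = 'YW5vaXRoZXIgcmVwbHk=';
$second = 'CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==';

// Strict mode (second argument true) makes base64_decode() return false
// for characters outside the base64 alphabet instead of skipping them.
$firstText  = base64_decode($first, true);
$secondText = base64_decode($second, true);

// Print both decoded parts back to back.
echo $firstText . $secondText, PHP_EOL;
```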

    You need to either split the encoded string at the = or make sure that the input is encoded properly as one whole string, and not as a concatenation of two separately encoded strings.

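    You may have other strings where there is no issue, because the length of the first part happens to be a multiple of 3 bytes. Such a part needs no padding, so the concatenated result still decodes cleanly.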

    I would suggest reading a little about base64 encoding to understand the issue: https://www.freecodecamp.org/news/what-is-base64-encoding/

    Solving the issue

    First we need to understand the base64 padding. Here are some strings and their base64 counterparts:

    String     base64
    abc        YWJj
    æbc        w6ZiYw==
    abcd       YWJjZA==
    abcde      YWJjZGU=
    abcdef     YWJjZGVm
    !abcdef    IWFiY2RlZg==
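
    The table can be reproduced with base64_encode(). The sketch below assumes UTF-8 strings (so æ is two bytes); the $samples and $rows names are just for illustration. It shows that the padding depends on the byte count, not on the number of characters.

```php
<?php
// "\u{00E6}" is the character æ, which takes two bytes in UTF-8.
$samples = ['abc', "\u{00E6}bc", 'abcd', 'abcde', 'abcdef', '!abcdef'];

$rows = [];
foreach ($samples as $sample) {
    $encoded = base64_encode($sample);
    $rows[$sample] = [
        'bytes'   => strlen($sample),              // byte count, not character count
        'base64'  => $encoded,
        'padding' => substr_count($encoded, '='),  // 0, 1 or 2 "=" characters
    ];
    printf(
        "%-8s %d bytes  %-12s padding: %d\n",
        $sample,
        $rows[$sample]['bytes'],
        $encoded,
        $rows[$sample]['padding']
    );
}
```

    A byte count that is a multiple of 3 gives no padding, one byte left over gives ==, and two bytes left over give =.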

    As is apparent from the table above, going from a single-byte character like a to a multi-byte character like æ can be enough to require the maximal padding (==), just as if a single character had been added (as seen in the case of abcd). Therefore, in order to split the string successfully, you would first have to try splitting by ==; if any of the resulting parts still contains =, split that part by = as well, and base64_decode each piece. Once a part contains no = characters, you can safely run base64_decode on what remains.
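
    One possible way to automate this is sketched below. Note that it swaps the nested splitting described above for a regular expression: each match is a run of base64 alphabet characters followed by up to two = characters, so the string is cut at the same places but every piece keeps its padding. The function name decodeConcatenatedBase64 is hypothetical, and the sketch assumes the standard base64 alphabet (A-Z, a-z, 0-9, +, /) and an optional base64: prefix as in the question.

```php
<?php
/**
 * Decode a field that may hold several base64 strings stored back to back.
 * Hypothetical helper, not part of PHP itself.
 */
function decodeConcatenatedBase64(string $stored): string
{
    // Drop the "base64:" marker if the value carries one.
    if (strncmp($stored, 'base64:', 7) === 0) {
        $stored = substr($stored, 7);
    }

    // Each match is one encoded string: alphabet characters plus 0-2 "=".
    preg_match_all('~[A-Za-z0-9+/]+={0,2}~', $stored, $matches);

    $decoded = '';
    foreach ($matches[0] as $chunk) {
        // Strict mode returns false instead of guessing on bad input.
        $part = base64_decode($chunk, true);
        if ($part === false) {
            throw new InvalidArgumentException('Invalid base64 chunk: ' . $chunk);
        }
        $decoded .= $part;
    }

    return $decoded;
}

// The value from the question, including its prefix.
$stored = 'base64:YW5vaXRoZXIgcmVwbHk=CgpNc2cgJiBkYXRhIHJhdGVzIG1heSBhcHBseS4gTXNnIGZyZXF1ZW5jeSB2YXJpZXMuIFJlcGx5IEhFTFAgZm9yIGhlbHAuIFJlcGx5IFNUT1AgdG8gY2FuY2VsLg==';
echo decodeConcatenatedBase64($stored), PHP_EOL;
```

    A first part that needed no padding is not split off by this pattern, but in that case its length is already a multiple of 4 characters, so the combined run decodes to the same bytes.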

    Note that splitting on = removes the padding from each piece. If PHP's base64_decode starts throwing errors because of the missing padding, you will have to re-add it before decoding.
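
    Re-adding the padding only requires the length of the piece: an encoded string is always a multiple of 4 characters long, so the missing = characters can be computed. The helper below is a minimal sketch with a hypothetical name (restoreBase64Padding). Whether base64_decode() really needs this step was not verified for this answer, so treat it as a precaution.

```php
<?php
/**
 * Pad a base64 piece with "=" up to a multiple of 4 characters.
 * Hypothetical helper, not part of PHP itself.
 */
function restoreBase64Padding(string $chunk): string
{
    // Start from a clean state in case some padding is still attached.
    $stripped  = rtrim($chunk, '=');
    $remainder = strlen($stripped) % 4;

    // A single leftover character cannot come from valid base64.
    if ($remainder === 1) {
        throw new InvalidArgumentException('Not a valid base64 length: ' . $chunk);
    }

    return $remainder === 0
        ? $stripped
        : $stripped . str_repeat('=', 4 - $remainder);
}

// Pieces as they look after the padding has been split away.
foreach (['YWJj', 'YWJjZA', 'YWJjZGU'] as $piece) {
    $padded = restoreBase64Padding($piece);
    echo $padded, ' => ', base64_decode($padded, true), PHP_EOL;
}
```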

    Disclaimer

    I have not checked PHP's base64_decode function; my analysis was done in Notepad++ with the MIME Tools plugin, which supports both padded and unpadded base64 encoding (if you had unpadded encoding in PHP you would be in trouble, but enough of that).