Validating Base64 input with Free Pascal and DecodeStringBase64


I know there are dozens of questions about this already, in various forms. My question is slightly more direct though.

Using Free Pascal and the s := DecodeStringBase64(s); function, is there any way to check that the string passed in as s really was proper Base64 input in the first place, so that I don't end up with decoded garbage?

The best I have managed is to use a regular expression to identify potential Base64 data (from the accepted answer here). I then check whether its length is divisible by 4 using mod. If it is, I pass it to DecodeStringBase64. However, I am still getting lots of false positives: returned data that has 'decoded' but was clearly not Base64 in the first place, despite matching the regular expression. For example, "WindowsXP=" matches the expression but is not Base64-encoded data.

Equally, the name 'Ted' encodes as VGVk, which doesn't even have the usual '=' padding (which can help to flag the end of the data), but it is still a potential Base64 fragment that I'd like to find and decode.

In PHP, there is base64_decode(), to which true can be passed as a second parameter to help with validation.

AFAIK, Free Pascal does not have this with DecodeStringBase64 and I need some way of validating.

Other useful replies on the subject of decoding and encoding, if the reader happens to be looking for them as I was yesterday, are here


Solution

  • The short answer is no: there is no 100% reliable validation for Base64-encoded strings.

    The = sign in a Base64-encoded string carries no data; it is only padding, so it does not always need to be there (the encoded string just has to be a multiple of 4 in length). All you can do is check that the string length is a multiple of 4, check that every character comes from the Base64 alphabet (see Page 5, Table 1), and verify that there are no more than two = padding characters at the end of the input string. Here is a function that checks whether the passed string can be a valid Base64-encoded string (there is nothing more you can do, anyway):

    function CanBeValidBase64EncodedString(const AValue: string): Boolean;
    const
      Base64Alphabet = ['A'..'Z', 'a'..'z', '0'..'9', '+', '/'];
    var
      I: Integer;
      ValLen: Integer;
    begin
      ValLen := Length(AValue);
      // a candidate must be non-empty and a multiple of 4 characters long
      Result := (ValLen > 0) and (ValLen mod 4 = 0);
      if Result then
      begin
        // skip at most two trailing '=' padding characters
        while (AValue[ValLen] = '=') and (ValLen > Length(AValue) - 2) do
          Dec(ValLen);
        // every remaining character must belong to the Base64 alphabet
        for I := ValLen downto 1 do
          if not (AValue[I] in Base64Alphabet) then
          begin
            Result := False;
            Break;
          end;
      end;
    end;
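
A complementary check, sketched here under stated assumptions and not as a definitive solution, is to decode a candidate and re-encode the result: text that is canonical Base64 should come back unchanged. The sketch assumes the base64 unit shipped with Free Pascal (DecodeStringBase64 and EncodeStringBase64) and that the encoder emits padded output with no line breaks. RoundTripsAsBase64 is a hypothetical helper name, not part of any library.

```pascal
program Base64RoundTrip;

{$mode objfpc}{$H+}

uses
  SysUtils, base64;

// Hypothetical helper (not part of the base64 unit): returns True only when
// S decodes and then re-encodes to exactly the same text.
function RoundTripsAsBase64(const S: string; out Decoded: string): Boolean;
begin
  Decoded := '';
  Result := False;
  if S = '' then
    Exit; // an empty string is never treated as a candidate
  try
    Decoded := DecodeStringBase64(S);
    // Assumption: the encoder produces canonical, padded, unwrapped output
    Result := EncodeStringBase64(Decoded) = S;
  except
    on E: Exception do
      Result := False; // any decoder error means the input is rejected
  end;
end;

var
  Plain: string;
begin
  if RoundTripsAsBase64('VGVk', Plain) then
    WriteLn('VGVk round-trips, decoded text: ', Plain)
  else
    WriteLn('VGVk rejected');

  if RoundTripsAsBase64('WindowsXP=', Plain) then
    WriteLn('WindowsXP= round-trips')
  else
    WriteLn('WindowsXP= rejected');
end.
```

Under those assumptions, VGVk should be accepted and decode to Ted, while WindowsXP= should be rejected, because re-encoded output always has a length that is a multiple of 4 and the input is 10 characters long. This only narrows the false positives: any run of alphabet characters whose length is a multiple of 4 still round-trips, so deciding whether the decoded bytes are meaningful remains a separate, application-specific judgment.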