
What file size is data if it's 450KB base64 encoded?


Is it possible to compute the size of data if I know its size when it's base64 encoded?

I've a file that is 450KB in size when base64 encoded but what size is it decompressed?

Is there a method to find output size without decompressing the file first?


Solution

  • I've a file that is 450KB in size when base64 encoded but what size is it decompressed?

    In fact, you don't "decompress", you decode. The result will be smaller than the encoded data.

    Base64 encoding uses 8 bits for every 6 bits of the original data (that is, 4 bytes to store 3), so the math is simple:

    Encoded          Decoded
    450KB  / 4 * 3 = ~ 337KB
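
    The same estimate as a minimal Python sketch. The function name is made up for illustration, and it assumes 1 KB = 1024 bytes and an encoded payload with no line breaks or other whitespace. The result is an upper bound: padding can make the real size up to 2 bytes smaller.

```python
def estimate_decoded_size(encoded_size: int) -> int:
    """Upper bound on the decoded size, given the size of padded Base64 data."""
    # Every 4 encoded bytes carry 3 bytes of original data.
    return encoded_size // 4 * 3


encoded_size = 450 * 1024  # assumption: 1 KB = 1024 bytes
decoded_size_bytes = estimate_decoded_size(encoded_size)
print(decoded_size_bytes, "bytes, about", decoded_size_bytes / 1024, "KB")
```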
    

    The overhead of Base64 over the decoded data is nearly constant at 33.33%. I say "nearly" only because of the padding bytes at the end (=) that make the encoded length a multiple of 4. See some examples:

    String              Encoded                   Len   B64   Pad  Space needed
    A                   QQ==                      1     2     2    400.00%
    AB                  QUI=                      2     3     1    200.00%
    ABC                 QUJD                      3     4     0    133.33%
    ABCD                QUJDRA==                  4     6     2    200.00%
    ABCDEFGHIJKLMNOPQ   QUJDREVGR0hJSktMTU5PUFE=  17    23    1    141.18%
    ( 300 bytes )       ( 400 bytes )             300   400   0    133.33%
    ( 500 bytes )       ( 668 bytes )             500   667   1    133.60%
    ( 5000 bytes )      ( 6668 bytes )            5000  6667  1    133.36%
                                                      ... tends to 133.33% ...
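
    If you want to recompute these columns yourself, here is a small sketch using Python's standard base64 module. The helper name table_row is hypothetical, and the large inputs are simply zero-filled byte strings, since only their length matters here.

```python
import base64


def table_row(data: bytes):
    """Return (Len, B64, Pad, Space needed in %) for the given raw bytes."""
    encoded = base64.b64encode(data)
    pad = encoded.count(b"=")  # '=' only ever appears as trailing padding
    return len(data), len(encoded) - pad, pad, 100 * len(encoded) / len(data)


samples = (b"A", b"AB", b"ABC", b"ABCD", b"ABCDEFGHIJKLMNOPQ",
           bytes(300), bytes(500), bytes(5000))
for sample in samples:
    length, b64_chars, pad, pct = table_row(sample)
    print(f"{length:5d} {b64_chars:5d} {pad:3d} {pct:7.2f}%")
```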
    


    Calculating the space for unencoded data:

    Let's take the value QUJDREVGR0hJSktMTU5PUFE= mentioned above.

    1. There are 24 bytes in the encoded value.

    2. Let's calculate 24 / 4 * 3 => the result is 18.

    3. Let's count the number of =s at the end of the encoded value: in this case, 1
      (we only need to check the last 2 bytes of the encoded data).

    4. Subtracting: 18 (from step 2) - 1 (from step 3) = 17.

    So, we need 17 bytes to store the data.

    Or, as @kriegaex commented:

    originalBytes = base64Bytes * 3 / 4 - numEqualCharsAtEnd
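
    That formula as a short Python sketch. The function name decoded_size is made up for illustration, and it assumes standard padded Base64 with no line breaks or other whitespace in the encoded string. The loop at the end checks the result against an actual decode.

```python
import base64


def decoded_size(encoded: str) -> int:
    """Size in bytes of the decoded data, computed without decoding it."""
    # Assumption: padded Base64, so the length is always a multiple of 4.
    if len(encoded) % 4 != 0:
        raise ValueError("padded Base64 length must be a multiple of 4")
    # Padding can only appear in the last 2 characters.
    padding = encoded[-2:].count("=")
    return len(encoded) * 3 // 4 - padding


sample = "QUJDREVGR0hJSktMTU5PUFE="
print(decoded_size(sample))

# Cross-check against a real decode for a range of input lengths.
for n in range(50):
    text = base64.b64encode(bytes(n)).decode("ascii")
    assert decoded_size(text) == n
```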