java c# base64 base64url

How to generate base64 string from Java to C#?


I am trying to port a Java function to C#. Here is the original code:

import java.security.SecureRandom;
import java.util.Base64;

class SecureRandomString {
    private static SecureRandom random = new SecureRandom();
    // URL-safe Base64 alphabet, without trailing '=' padding
    private static Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public static String generate(String seed) {

        byte[] buffer;
        if (seed == null) {
            // No seed: encode 20 random bytes
            buffer = new byte[20];
            random.nextBytes(buffer);
        }
        else {
            buffer = seed.getBytes();
        }
        return encoder.encodeToString(buffer);
    }
}

And here is what I did in C#:

using System;
using System.Text;

public class Program
{
    private static readonly Random random = new Random();

    public static string Generate(string seed = null)
    {
        byte[] buffer;
        if (seed == null)
        {
            buffer = new byte[20];
            random.NextBytes(buffer);
        }
        else
        {
            buffer = Encoding.UTF8.GetBytes(seed);
        }

        return System.Web.HttpUtility.UrlPathEncode(RemovePadding(Convert.ToBase64String(buffer)));
    }

    private static string RemovePadding(string s) => s.TrimEnd('=');
}

I wrote some test cases:

Assert(Generate("a"), "YQ");
Assert(Generate("ab"), "YWI");
Assert(Generate("abc"), "YWJj");
Assert(Generate("abcd"), "YWJjZA");
Assert(Generate("abcd?"), "YWJjZD8");
Assert(Generate("test wewqe_%we()21-3012"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI");
Assert(Generate("test wewqe_%we()21-3012_"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJf");
Assert(Generate("test wewqe_%we()21-3012/"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIv");
Assert(Generate("test wewqe_%we()21-3012!"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIh");
Assert(Generate("test wewqe_%we()21-3012a?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJhPw");

And everything works fine, until I try the following one:

Assert(Generate("test wewqe_%we()21-3012?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_");

My code outputs dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI/ instead of the expected dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_. Why?

I think the culprit is the encoder. The original code configures its encoder with Base64.getUrlEncoder().withoutPadding(). withoutPadding() is basically a TrimEnd('='), but I am not sure how to reproduce getUrlEncoder().

I looked at this handy conversion table, URL Encoding using C#, but found nothing that covers my case.

I tried HttpUtility.UrlEncode but the output is not right.

What did I miss?


Solution

  • According to the Oracle documentation, here is what getUrlEncoder() does:

    Returns a Base64.Encoder that encodes using the URL and Filename safe type base64 encoding scheme.

    Alright, so what does "URL and Filename safe" mean? Once more, the documentation helps:

    Uses the "URL and Filename safe Base64 Alphabet" as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.

    We can now look up RFC 4648. Here is Table 2:

         Table 2: The "URL and Filename safe" Base 64 Alphabet
    
     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 - (minus)
        12 M            29 d            46 u            63 _
        13 N            30 e            47 v           (underline)
        14 O            31 f            48 w
        15 P            32 g            49 x
        16 Q            33 h            50 y         (pad) =
    

    This is an encoding table: each 6-bit value maps to one output character. For example, the value 0 is encoded as A, the value 42 as q, and so on.
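
    To make the lookup concrete, here is a minimal Java sketch that splits one 3-byte group into four 6-bit values and looks each one up in Table 2. The class and method names are hypothetical and it only handles full 3-byte groups; it is an illustration, not how the JDK implements its encoder.

```java
// Sketch: encode one full 3-byte group by table lookup (hypothetical names).
class Table2Lookup {
    // Table 2 written out as a string: the character at index v encodes value v.
    static final String URL_SAFE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

    // Pack three bytes into 24 bits, then emit four 6-bit lookups, high bits first.
    static String encodeGroup(int b0, int b1, int b2) {
        int bits = (b0 & 0xFF) << 16 | (b1 & 0xFF) << 8 | (b2 & 0xFF);
        StringBuilder out = new StringBuilder();
        for (int shift = 18; shift >= 0; shift -= 6) {
            out.append(URL_SAFE.charAt((bits >> shift) & 0x3F));
        }
        return out.toString();
    }
}
```

    The last three bytes of the failing input are "12?", and the low six bits of '?' (0x3F) are all ones, so the final value in that group is 63: encodeGroup('1', '2', '?') returns "MTI_".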

    Now compare it with Table 1, the standard Base 64 alphabet, which is the one Convert.ToBase64String uses in your C# code:

                      Table 1: The Base 64 Alphabet
    
     Value Encoding  Value Encoding  Value Encoding  Value Encoding
         0 A            17 R            34 i            51 z
         1 B            18 S            35 j            52 0
         2 C            19 T            36 k            53 1
         3 D            20 U            37 l            54 2
         4 E            21 V            38 m            55 3
         5 F            22 W            39 n            56 4
         6 G            23 X            40 o            57 5
         7 H            24 Y            41 p            58 6
         8 I            25 Z            42 q            59 7
         9 J            26 a            43 r            60 8
        10 K            27 b            44 s            61 9
        11 L            28 c            45 t            62 +
        12 M            29 d            46 u            63 /
        13 N            30 e            47 v
        14 O            31 f            48 w         (pad) =
        15 P            32 g            49 x
        16 Q            33 h            50 y
    

    Note that the two tables are identical except for two values:

    • value 62 is '+' in Table 1 but '-' in Table 2
    • value 63 is '/' in Table 1 but '_' in Table 2
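
    The difference is easy to observe on the Java side by running the same bytes through both JDK encoders. This is a small sketch with a hypothetical class name; it assumes the input is read as UTF-8, as in the C# code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: the same input through the standard and the URL-safe JDK encoders.
class AlphabetComparison {
    // Table 1 alphabet ('+' and '/'), padding removed.
    static String standard(String s) {
        return Base64.getEncoder().withoutPadding()
                     .encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Table 2 alphabet ('-' and '_'), padding removed.
    static String urlSafe(String s) {
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

    For "test wewqe_%we()21-3012?" the standard encoder ends in '/' and the URL-safe one ends in '_'. Inputs whose encoding never hits value 62 or 63, such as "abcd?", come out identical from both.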

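    That is why only one of your test cases fails: it is the only one whose encoding contains the value 63. You should be able to fix your problem by replacing the HttpUtility.UrlPathEncode call with: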

    private static string Encode(string s) => s.Replace("+", "-").Replace("/", "_");
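
    As a sanity check, the same three steps (standard Base64, trim the padding, swap the two characters) can be mirrored in Java and compared against the original encoder. The class and method names below are hypothetical, and UTF-8 is assumed for the seed bytes, whereas the original Java code relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: replace-based conversion checked against Base64.getUrlEncoder().
class ReplaceBasedFix {
    // Mirrors the C# pipeline: standard Base64, strip '=' padding, swap '+' and '/'.
    static String viaReplace(String seed) {
        String standard = Base64.getEncoder()
                                .encodeToString(seed.getBytes(StandardCharsets.UTF_8));
        String unpadded = standard.replaceAll("=+$", "");
        return unpadded.replace('+', '-').replace('/', '_');
    }

    // Reference result: the encoder configured as in the original Java code.
    static String viaUrlEncoder(String seed) {
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(seed.getBytes(StandardCharsets.UTF_8));
    }
}
```

    Both methods should agree on every test case from the question, including the one that previously failed.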