c# utf-8 decode urldecode

Decoding u002522, u002522 and a lot of backslashes


I am using WebClient to send some web requests:

public static string PostHttp(string url, Dictionary<string, string> headers, Dictionary<string, string> postParams)
{
    using (WebClient client = new WebClient())
    {                
        if (headers != null)
        {
            foreach (var header in headers)
            {
                client.Headers.Add(header.Key, header.Value);
            }
        }

        var reqparm = new System.Collections.Specialized.NameValueCollection();

        if (postParams != null)
        {
            foreach (var param in postParams)
            {
                reqparm.Add(param.Key, param.Value);
            }
        }


        byte[] responsebytes = client.UploadValues(url, "POST", reqparm);
        return Encoding.UTF8.GetString(responsebytes);
    }
}

And I get something like the following in Visual Studio:

\\\"see_more_cards_id\\\",\\\"href\\\":\\\"\\\\\\/page_content_list_view\\\\\\/more\\\\\\/?page_id=200168320060101&start_cursor=\\\\u00257B\\\\u002522timeline_cursor\\\\u002522\\\

In Postman it looks better; there I would just be able to do a URL decode:

\"see_more_cards_id\",\"href\":\"\\\/page_content_list_view\\\/more\\\/?page_id=200168320060101&start_cursor=\%7B\%22timeline_cursor\%22\%3

In the Chrome debugger it looks like the following:

\"see_more_cards_id\",\"href\":\"\\\/page_content_list_view\\\/more\\\/?page_id=200168320060101&start_cursor=\\u00257B\\u002522timeline_cursor\\u002522\\u00253A\\u002522timeline_unit\\

What I am looking for is a decoded version like:

"see_more_cards_id","href":\"/page_content_list_view/more/?page_id=200168320060101&start_cursor={"timeline_cursor":""timeline_unit:timeline_unit: 1:00000000001564446283:04611686018427387904:091:00000000001564446283:04611686018`427387904:09

I tried searching for how to decode character sequences like u002522, but there is only very limited information. I found the following post, which suggested using Uri.UnescapeDataString, but this didn't decode the characters.

Decode chars


Solution

  • \\\\u002522
    

    There are many layers here.

    First, there is \ escaping for \. (This one is probably just a debugger trying to be helpful.) So,

    \\u002522
    

    Then there is another layer of \ escaping for \. So,

    \u002522
    

    Then there is \u escaping for a UTF-16 code unit (\u0025 is %). So,

    %22
    

    Then there is %-encoding (aka URL-encoding) of bytes, presumably for UTF-8 code units (%22 is the byte 0x22). So,

    "
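
The layers above can be peeled off in code one at a time. What follows is only a minimal sketch, not a definitive implementation: LayeredDecoder, UnescapeOnce and DecodeAll are hypothetical names made up for illustration, and the number of backslash layers is an assumption you have to check against your real response (the debugger display may add one that is not in the actual string). It probably also explains why calling Uri.UnescapeDataString directly did nothing: the percent signs are still hidden as \u0025 until the backslash layers are removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LayeredDecoder
{
    // Removes ONE layer of backslash escaping (simplified, not a full JSON parser):
    //   \uXXXX   -> the UTF-16 code unit with that hex value
    //   \n \t \r -> the matching control character
    //   \<c>     -> <c> for anything else, e.g. \\ -> \, \" -> ", \/ -> /
    public static string UnescapeOnce(string s)
    {
        return Regex.Replace(s, @"\\(u[0-9a-fA-F]{4}|.)", m =>
        {
            string body = m.Groups[1].Value;
            if (body.Length == 5)
            {
                // "uXXXX": parse the four hex digits as one UTF-16 code unit
                return ((char)Convert.ToInt32(body.Substring(1), 16)).ToString();
            }
            switch (body)
            {
                case "n": return "\n";
                case "t": return "\t";
                case "r": return "\r";
                default: return body; // drop the backslash, keep the character
            }
        });
    }

    // Hypothetical helper: peel the given number of backslash layers,
    // then undo the %-encoding. The layer count is an assumption.
    public static string DecodeAll(string s, int backslashLayers)
    {
        for (int i = 0; i < backslashLayers; i++)
        {
            s = UnescapeOnce(s);
        }
        return Uri.UnescapeDataString(s);
    }

    public static void Main()
    {
        string s = @"\\\\u002522";            // four backslashes, as in the answer
        s = UnescapeOnce(s);
        Console.WriteLine(s);                 // \\u002522
        s = UnescapeOnce(s);
        Console.WriteLine(s);                 // \u002522
        s = UnescapeOnce(s);
        Console.WriteLine(s);                 // %22
        s = Uri.UnescapeDataString(s);
        Console.WriteLine(s);                 // "
    }
}
```

If the payload is real JSON, deserializing it with a JSON library is likely more robust than the regex for the backslash layers; the sketch is only meant to make each layer visible.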