
Character encoding problem with HttpClient.PostAsync


We have a legacy web app that works fine when used manually from a browser. When I call the same web app from code using HTTP POST requests, some Turkish characters come back as ?.

I use the following code to make the HTTP POST:

var httpClient = new HttpClient(); //static readonly in real code

var content = new StringContent("id_6=some text with Turkish characters öçşığüÖÇŞİĞÜ", Encoding.GetEncoding("ISO-8859-9"), "application/x-www-form-urlencoded");
var response = httpClient.PostAsync(url, content).Result; //I know this is not a good way, I'll focus on it later
var responseInString = response.Content.ReadAsStringAsync().Result;
File.WriteAllText("c:\\temp\\a.htm", responseInString);

The web app returns an HTML page with some input values, including those posted by my code. The form values posted by my code, and the values calculated from them, have broken Turkish characters, whereas the hardcoded submit button with Turkish characters looks all right.

The web app returns this HTML (truncated for simplicity) to my code:

<!-- BELOW IS THE HARDCODED FORM FIELD WITH TURKISH CHARS OK! DISPLAYED AS: Programı Çağır -->
<input type="submit" value="Program&#305; &Ccedil;a&#287;&#305;r" name="j_id_jsp_262293626_16"/>

<!-- IRRELEVANT HTML REMOVED -->

<!-- BELOW IS THE OUTPUT FORM FIELD WITH CHAR ş BAD! DISPLAYED AS: some text with Turkish characters öç???üÖÇ???Ü -->
<input type="text" value="some text with Turkish characters &ouml;&ccedil;???&uuml;&Ouml;&Ccedil;???&Uuml;" id="id_2" name="id_2"/>

<!-- BELOW IS THE INPUT FORM FIELD WITH CHAR ş BAD! -->
<input type="text" value="some text with Turkish characters &ouml;&ccedil;???&uuml;&Ouml;&Ccedil;???&Uuml;" id="id_6" name="id_6" />

The response headers look all right (screenshot: content headers from debug).

What could be wrong?

EDIT: Similar code that posts to a sample form works fine:

    static readonly HttpClient httpClient = new HttpClient();

    [TestMethod]
    public void TestHttpClientForTurkish()
    {
        var data = new Dictionary<string, string>()
        {
            {"fname", "öçşığü" },
            {"lname", "ÖÇŞİĞÜ" }
        };

        var content = new FormUrlEncodedContent(data);
        var response = httpClient.PostAsync("https://www.w3schools.com/action_page.php", content).Result;

        var responseInString = response.Content.ReadAsStringAsync().Result;
        Assert.IsTrue(responseInString.Contains("öçşığü") && responseInString.Contains("ÖÇŞİĞÜ"));
    }

Solution

  • My findings:

    1. The FormUrlEncodedContent class does not accept an Encoding parameter; it always percent-encodes values as UTF-8, which this legacy app does not expect, so I had to use StringContent instead.
    2. I had to percent-encode the form values myself with HttpUtility.UrlEncode, passing ISO-8859-9 as the encoding.
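
To make the difference behind these two findings visible, here is a minimal sketch that percent-encodes the same text under a chosen encoding. It is not part of the original code: the class name EncodingComparison and the method PercentEncode are hypothetical, and the RegisterProvider call assumes .NET Core or .NET 5+, where ISO-8859-9 is unavailable until the code-pages provider is registered (on .NET Framework that line can be dropped).

```csharp
using System.Text;
using System.Web;

// Hypothetical helper for illustration only; not part of the original answer.
public static class EncodingComparison
{
    // Percent-encodes a value using the named character encoding.
    public static string PercentEncode(string value, string encodingName)
    {
        // Assumption: running on .NET Core / .NET 5+, where legacy code pages
        // such as ISO-8859-9 must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding encoding = Encoding.GetEncoding(encodingName);
        return HttpUtility.UrlEncode(value, encoding);
    }
}
```

For the letter ş (U+015F), PercentEncode("ş", "utf-8") escapes the two bytes C5 9F, while PercentEncode("ş", "ISO-8859-9") escapes the single byte FE. A server that decodes form data as ISO-8859-9 will only recover ş from the second form.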

    Here's the final code, which posts the form fields with the Turkish characters intact:

    var httpClient = new HttpClient(); //static readonly in real code
    var iso = Encoding.GetEncoding("ISO-8859-9");
    
    var content = new StringContent("id_6="+
        HttpUtility.UrlEncode("some text with Turkish characters öçşığüÖÇŞİĞÜ", iso), iso, 
        "application/x-www-form-urlencoded");
    var response = httpClient.PostAsync(url, content).Result;//Using Result because I don't have a UI thread or the context is not ASP.NET
    var responseInString = response.Content.ReadAsStringAsync().Result;
    File.WriteAllText("c:\\temp\\a.htm", responseInString);
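
If more than one field has to be posted, the same approach can be wrapped in a small helper that takes key/value pairs, much like FormUrlEncodedContent does, but with an explicit encoding. This is only a sketch under assumptions: LegacyForm, TurkishEncoding, BuildBody and CreateContent are hypothetical names, and the provider registration again assumes .NET Core or .NET 5+.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Web;

// Hypothetical helper; class and method names are illustrative only.
public static class LegacyForm
{
    // Returns ISO-8859-9. Registering the code-pages provider is an
    // assumption for .NET Core / .NET 5+; .NET Framework does not need it.
    public static Encoding TurkishEncoding()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("ISO-8859-9");
    }

    // Builds "key1=value1&key2=value2", percent-encoding both keys and
    // values with the given encoding instead of UTF-8.
    public static string BuildBody(
        IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        return string.Join("&", fields.Select(f =>
            HttpUtility.UrlEncode(f.Key, encoding) + "=" +
            HttpUtility.UrlEncode(f.Value, encoding)));
    }

    // Wraps the encoded body in a StringContent, passing the same encoding
    // so the Content-Type charset matches the escaped bytes.
    public static StringContent CreateContent(
        IEnumerable<KeyValuePair<string, string>> fields, Encoding encoding)
    {
        return new StringContent(
            BuildBody(fields, encoding), encoding,
            "application/x-www-form-urlencoded");
    }
}
```

The result of LegacyForm.CreateContent(fields, LegacyForm.TurkishEncoding()) can be passed to httpClient.PostAsync(url, content) in place of the hand-built StringContent.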