javascript jquery .net asp.net-core character-encoding

Why is a string with Persian/Arabic and English characters garbled in the browser?


I'm building an ASP.NET Core application. I read a record from a SQL database and want to show it in my HTML page through a jQuery function. This is part of my code:

//some code...

    <script charset="utf-8" type="text/javascript">
       $(document).ready(function () {
         var materialname = [];
         
         @if (Model.Count() != 0)
         {
            foreach (var item in Model)
            {
              @:console.log('name: ' + '@item.material.StoreName');
              @:materialname.push('@item.material.StoreName');

            } 
         }
       });
   </script>

When I run this code, the console shows an incorrect string. For example, the StoreName in the DB is:

[image: the StoreName value as stored in the database]

but the console shows it like this:

 name: &#x645;&#x62D;&#x635;&#x648;&#x644; &#x62F;&#x633;&#x62A;&#x647; AB &#x627;&#x646;&#x62F;&#x627;&#x632;&#x647; 5mm

How can I fix this and display the characters correctly?


Solution

  • This happens because ASP.NET Core automatically HTML-encodes every string that Razor outputs, to prevent cross-site scripting (XSS). You don't notice this in HTML because the browser decodes character references such as &#x645; when it renders text. It does not decode them inside a <script> element, so JavaScript sees the literal characters '&#x645;' and so on.
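
    To make the difference concrete, here is a minimal sketch in plain JavaScript. The helper decodeNumericEntities is a hypothetical name, not part of any library; it only mimics, for numeric character references, what an HTML parser does for ordinary text and does not do inside a script element. It is meant as an illustration, not as a fix.

    ```javascript
    // What Razor emitted into the script block: to JavaScript this is plain text.
    const fromRazor = '&#x645;&#x62D;&#x635;&#x648;&#x644; AB';

    // Hypothetical helper: decode hexadecimal and decimal numeric character
    // references the way an HTML parser would for normal text content.
    function decodeNumericEntities(text) {
      return text
        .replace(/&#x([0-9a-fA-F]+);/g, (_, hex) => String.fromCodePoint(parseInt(hex, 16)))
        .replace(/&#([0-9]+);/g, (_, dec) => String.fromCodePoint(parseInt(dec, 10)));
    }

    console.log(fromRazor.length);                        // 38: each letter became a 7-character reference
    console.log(decodeNumericEntities(fromRazor).length); // 8: five letters, a space, and "AB"
    console.log(decodeNumericEntities(fromRazor));        // the readable text an HTML context would show
    ```

    Decoding on the client like this would work, but it treats the symptom; the server-side options below avoid producing the references in the first place.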

    You have several options.

    Globally allow Arabic

    Since your site will deal with Arabic text routinely, you may want to exclude the Arabic Unicode range from automatic encoding globally. To do this, add this line to your Program.cs. It replaces the default HtmlEncoder with one that leaves the listed Unicode ranges untouched (HtmlEncoder is in System.Text.Encodings.Web, UnicodeRanges in System.Text.Unicode):

    builder.Services.AddSingleton(HtmlEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Arabic));
    

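    Make sure to keep BasicLatin in the list, or even spaces and ordinary ASCII characters will be escaped.

    There are more Arabic ranges you may want to add, such as UnicodeRanges.ArabicSupplement, UnicodeRanges.ArabicPresentationFormsA and UnicodeRanges.ArabicPresentationFormsB. This option also shrinks the response considerably when pages carry a lot of Arabic text, because every escaped letter is otherwise sent as a multi-character reference such as &#x645;.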

    Personally I’m in Germany, so I use Latin1Supplement (for umlauts, guillemets etc.) and LatinExtendedAdditional (capital eszett is in there).
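
    As a rough check of the bandwidth point above, this sketch compares the UTF-8 size of a five-letter word with the size of the same word written as character references. It uses the standard TextEncoder; the sample word is taken from the question's output. It ignores HTTP compression, which narrows the gap in practice.

    ```javascript
    const encoder = new TextEncoder(); // encodes strings as UTF-8 bytes

    // Five Arabic-script letters, written with escapes so the source stays ASCII.
    const raw = '\u0645\u062D\u0635\u0648\u0644';
    // The same five letters as Razor's default encoder emits them.
    const asReferences = '&#x645;&#x62D;&#x635;&#x648;&#x644;';

    const rawBytes = encoder.encode(raw).length;          // 2 bytes per letter
    const refBytes = encoder.encode(asReferences).length; // 7 bytes per letter

    console.log(rawBytes); // 10
    console.log(refBytes); // 35
    ```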

    Use Html.Raw()

    This is dangerous, and you should only use it if you know what you're doing, i.e. if you control what the string contains. If the string contains a ' for example, it ends the JavaScript string literal early and breaks your script with a syntax error. If the string contains user-generated input, a user can inject arbitrary code into your page. Only do this if you are certain the string is harmless.

    foreach (var item in Model)
    {
        @:console.log('name: ' + '@Html.Raw(item.material.StoreName)');
        @:materialname.push('@Html.Raw(item.material.StoreName)');
    } 
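
    The risk with Html.Raw() can be reproduced without a server. In this sketch, renderUnescaped is a hypothetical stand-in for the template above (it pastes a value between single quotes with no escaping), and parses only checks whether the generated source is valid JavaScript; nothing is executed.

    ```javascript
    // Hypothetical stand-in for the Razor line: paste the value between
    // single quotes without any escaping.
    function renderUnescaped(value) {
      return "materialname.push('" + value + "');";
    }

    // Returns true if the generated source parses as JavaScript.
    function parses(source) {
      try {
        new Function('materialname', source); // compiled only, never called
        return true;
      } catch (err) {
        return false; // SyntaxError: the string literal was closed early
      }
    }

    console.log(parses(renderUnescaped('Store AB')));    // true
    console.log(parses(renderUnescaped("Ali's store"))); // false

    // A crafted value closes the literal and adds its own statement:
    console.log(renderUnescaped("'); alert(1); ('"));
    // materialname.push(''); alert(1); ('');
    ```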
    

    Use a custom method to escape strings JavaScript-style

    Here is an extension method that lets you call @Html.JsString() instead of @Html.Raw(). It uses the built-in JavaScriptEncoder to escape strings specifically for JavaScript. You still write the surrounding quotes yourself:

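    using System.Text.Encodings.Web;
    using Microsoft.AspNetCore.Html;
    using Microsoft.AspNetCore.Mvc.Rendering;
    // The class name is arbitrary; extension methods just need a static class.
    public static class HtmlHelperExtensions
    {
        // Escapes a value for use inside a quoted JavaScript string literal.
        public static IHtmlContent JsString(this IHtmlHelper html, object? value)
        => value switch
        {
            null => HtmlString.Empty,
            _ => new HtmlString(JavaScriptEncoder.Default.Encode(value.ToString() ?? string.Empty)),
        };
    }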
    

    The rendered output then looks like this:

    console.log('\u0645\u062D\u0635\u0648\u0644 \u062F\u0633\u062A\u0647 AB \u0627\u0646\u062F\u0627\u0632\u0647 5mm');
    

    The text is still escaped, but \uXXXX is an escape that JavaScript itself understands, so the string holds the correct characters at runtime.
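
    A quick way to convince yourself, sketched in plain JavaScript: the literal below is copied from the rendered output above, and the comparison string is built directly from the same code points. The variable names are only illustrative.

    ```javascript
    // The literal as rendered by the JavaScriptEncoder-based helper.
    const escaped = '\u0645\u062D\u0635\u0648\u0644 \u062F\u0633\u062A\u0647 AB \u0627\u0646\u062F\u0627\u0632\u0647 5mm';

    // The same text assembled from the code points themselves.
    const fromCodePoints =
      String.fromCodePoint(0x645, 0x62D, 0x635, 0x648, 0x644) + ' ' +
      String.fromCodePoint(0x62F, 0x633, 0x62A, 0x647) + ' AB ' +
      String.fromCodePoint(0x627, 0x646, 0x62F, 0x627, 0x632, 0x647) + ' 5mm';

    console.log(escaped === fromCodePoints); // true: the parser resolved the escapes
    console.log(escaped.length);             // 24 characters, not 6 per escaped letter
    console.log(escaped);                    // prints the readable Persian/English text
    ```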