
Problem with Uri class in .NET Framework and special characters


I have an image URL that contains a left-to-right mark character. That is an unprintable character that is used to set the way adjacent characters are grouped with respect to text direction. This is the original URL: https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-‎–-Anthem.jpg. When URL encoding the URL you get: https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg

It may look a little confusing. After the Toyah part, there is:

  • a hyphen
  • the left-to-right mark (%e2%80%8e)
  • an en-dash (a longer hyphen: %e2%80%93)
  • a hyphen

The two hyphens are not URL encoded, but the left-to-right mark and the en-dash are.
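To make the escape sequences above less mysterious, here is a minimal sketch of where they come from: each one is simply the UTF-8 bytes of the character written as %xx. The helper name PercentEncodeUtf8 is made up for illustration and is not part of .NET. Unlike a real URL encoder, it escapes every byte it is given, so it is only meant to be fed the two special characters.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PercentEncodingSketch
{
    // Hypothetical helper (not a .NET API): percent-encodes every UTF-8 byte
    // of the input. A real URL encoder would leave unreserved characters
    // such as '-' untouched.
    public static string PercentEncodeUtf8(string text)
    {
        return string.Concat(
            Encoding.UTF8.GetBytes(text).Select(b => "%" + b.ToString("x2")));
    }

    public static void Main()
    {
        Console.WriteLine(PercentEncodeUtf8("\u200E")); // left-to-right mark: %e2%80%8e
        Console.WriteLine(PercentEncodeUtf8("\u2013")); // en-dash: %e2%80%93
    }
}
```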

This is all fine, and you can open both the non-encoded and the encoded URLs in Chrome without a problem. The issue appears when I create a Uri instance from this URL:

new Uri("https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg")

The created instance has an AbsoluteUri property with the following value: https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%E2%80%93-Anthem.jpg

As you can see, the encoded left-to-right mark (%e2%80%8e) has been removed from the URL, so the URL no longer works.
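A small check along these lines can show whether the runtime in use drops the mark. It is only a sketch: the class and method names are made up for illustration, and the result depends on which .NET version the code runs on, so no expected output is given.

```csharp
using System;

public static class UriMarkCheck
{
    // Hypothetical helper: returns true when the escaped left-to-right mark
    // is still present in AbsoluteUri after the Uri class has parsed the URL.
    public static bool KeepsLeftToRightMark(string url)
    {
        var uri = new Uri(url);
        // Compare case-insensitively because Uri upper-cases the hex digits.
        return uri.AbsoluteUri.IndexOf("%E2%80%8E", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        const string url =
            "https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg";

        Console.WriteLine(KeepsLeftToRightMark(url)
            ? "Left-to-right mark preserved"
            : "Left-to-right mark stripped by Uri");
    }
}
```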

Is there a reason the Uri class would remove certain characters (even encoded) from a URL? I would assume that any character that's not valid in a URL could still be used as long as it's URL encoded. And browsers seem fine with it as well.


Solution

  • Thanks to @Simon Mourier's comment, I found out that the issue does not occur in .NET Framework 4.7.2.

    Since we are using .NET Framework 4.7.1, I checked the changelog for 4.7.2, and there it was: "Fixed a problem in System.Uri where Unicode bidirectional control characters would be stripped from a Uri during parsing."

    Looks like it was indeed a bug, and it's fixed now.