I have an image URL that contains a left-to-right mark character.
It is an invisible formatting character (U+200E) that controls how adjacent characters are grouped with respect to text direction.
This is the original URL: https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-–-Anthem.jpg
When URL encoding the URL you get: https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg
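It may look a little confusing. After the Toyah part, there is a hyphen, the left-to-right mark (encoded as %e2%80%8e), an en dash (encoded as %e2%80%93), and another hyphen.
The two hyphens are not URL encoded, but the left-to-right mark and the en dash are.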
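To make the byte sequences above easier to verify, here is a minimal sketch that percent-encodes the UTF-8 bytes of a string by hand. The class and method names (PercentEncodingDemo, PercentEncodeUtf8) are made up for illustration; unlike a real URL encoder, this helper encodes every byte, including characters such as the hyphen that would normally be left alone.

```csharp
using System;
using System.Text;

public static class PercentEncodingDemo
{
    // Hypothetical helper: percent-encodes every UTF-8 byte of the input,
    // without checking whether the character actually needs encoding.
    public static string PercentEncodeUtf8(string text)
    {
        var builder = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            builder.Append('%').Append(b.ToString("X2"));
        }
        return builder.ToString();
    }

    public static void Main()
    {
        // U+200E LEFT-TO-RIGHT MARK
        Console.WriteLine(PercentEncodeUtf8("\u200E")); // %E2%80%8E
        // U+2013 EN DASH
        Console.WriteLine(PercentEncodeUtf8("\u2013")); // %E2%80%93
    }
}
```

The two outputs match the %e2%80%8e and %e2%80%93 sequences in the encoded URL, apart from the letter case of the hex digits.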
All of this is fine, and you can open both the non-encoded and the encoded URL in Chrome without a problem. The issue appears when I create a Uri class instance with this URL.
new Uri("https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg")
The created instance has an AbsoluteUri property with the following value:
https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%E2%80%93-Anthem.jpg
As you can see, the encoded left-to-right mark (%e2%80%8e) has been removed from the URL, so the URL no longer works.
Is there a reason the Uri class would remove certain characters (even encoded) from a URL? I would assume that any character that's not valid in a URL could still be used as long as it's URL encoded. And browsers seem fine with it as well.
Thanks to @Simon Mourier's comment, I found out that the issue does not occur in .NET Framework 4.7.2.
Since we are using .NET Framework 4.7.1, I checked the changelog for 4.7.2 and there it was: "Fixed a problem in System.Uri where Unicode bidirectional control characters would be stripped from a Uri during parsing."
Looks like it was indeed a bug, and it's fixed now.
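For anyone who wants to check whether their own runtime is affected, a small sketch along these lines could help. The class and method names (BidiUriCheck, ContainsEncodedLrm, UriDropsEncodedLrm) are hypothetical, and the check only looks for the encoded left-to-right mark, not the other bidirectional control characters mentioned in the changelog. I have not listed an expected result, since it depends on the framework version the code runs on.

```csharp
using System;

public static class BidiUriCheck
{
    // Percent-encoded UTF-8 form of U+200E (LEFT-TO-RIGHT MARK).
    private const string EncodedLrm = "%E2%80%8E";

    // Hypothetical helper: true if the string contains the encoded mark,
    // ignoring the case of the hex digits.
    public static bool ContainsEncodedLrm(string url)
    {
        return url.IndexOf(EncodedLrm, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Hypothetical helper: true if the input had the encoded mark
    // but Uri.AbsoluteUri no longer does after parsing.
    public static bool UriDropsEncodedLrm(string url)
    {
        string parsed = new Uri(url).AbsoluteUri;
        return ContainsEncodedLrm(url) && !ContainsEncodedLrm(parsed);
    }

    public static void Main()
    {
        const string url =
            "https://simply-listening.nl/wp-content/uploads/2021/03/Toyah-%e2%80%8e%e2%80%93-Anthem.jpg";

        Console.WriteLine(UriDropsEncodedLrm(url)
            ? "This runtime strips the left-to-right mark."
            : "This runtime keeps the left-to-right mark.");
    }
}
```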