
Strange encoding problem - Hebrew


I have a script which tracks visits & referrers to a website.

I send the document.referrer (I use escape() in JavaScript) to the server and store the string in the database, after decoding it using HttpUtility.HtmlDecode (C#).

In most cases, I can parse the referrer string and show Hebrew characters, but there are a few cases in which I cannot.

I found that the two kinds of strings are different (the one that displays right and the one that doesn't).

The one that displays right contains this kind of characters: http://www.google.co.il/search?hl=iw&source=hp&q=%D7%99%D7%91%D7%95%D7%90%D7%A0%D7%99%D7%9D %D7%9C%D7%9E%D7%AA%D7%A0%D7%95%D7%AA &meta=&aq=f&oq=

The ones that don't display properly (unless I use Microsoft.JScript.GlobalObject.unescape) look like this: http://www.google.co.il/custom?q=%FA%EE%E9%F8 - %F6%E9%E9 %F8%EB%E1&client=pub-0385896995839253&forid=1

I understand that the second string contains ISO-8859-1 characters and works properly when unescaped on the server side, but there is no encoding information as part of a URL.

So I cannot distinguish between these two formats. Or can I? Should I?

A note: when I copy & paste those URLs into the browser address bar, the browser detects the first one as "Unicode (UTF-8)" and the other one as "Windows-1255".

Thanx Yaron


Solution

  • Use the encodeURIComponent function instead of the escape function.

    If you are reading the value from the Request.QueryString collection it's already decoded, so you should not use the HtmlDecode method.
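
    As a rough illustration of the above, here is a minimal JavaScript sketch. The helper names encodeReferrer and decodeQueryValue are hypothetical and not part of any library. The first sends the referrer with encodeURIComponent, which also escapes "%", so the percent sequences already in the referrer reach the server unchanged after one round of decoding. The second is only a heuristic for the two formats in the question: it tries UTF-8 first and, if that fails, assumes Windows-1255 and maps only the Hebrew letters (bytes 0xE0..0xFA).

```javascript
// Client side: percent-encode the referrer as UTF-8 before sending it.
// encodeURIComponent also escapes "%", so sequences such as %FA that are
// already present in the referrer are preserved rather than reinterpreted.
function encodeReferrer(referrer) {
  return encodeURIComponent(referrer);
}

// Hypothetical helper: decode one query value, trying UTF-8 first.
// decodeURIComponent throws a URIError when the bytes are not valid UTF-8.
function decodeQueryValue(value) {
  try {
    return decodeURIComponent(value);
  } catch (e) {
    // Fallback (an assumption): treat each %XX as one Windows-1255 byte.
    return value.replace(/%([0-9A-Fa-f]{2})/g, function (match, hex) {
      var byte = parseInt(hex, 16);
      // Windows-1255 places the Hebrew letters alef..tav at 0xE0..0xFA,
      // which correspond to U+05D0..U+05EA in Unicode.
      if (byte >= 0xE0 && byte <= 0xFA) {
        return String.fromCharCode(byte - 0xE0 + 0x05D0);
      }
      // ASCII and any other byte are passed through as the same code point.
      return String.fromCharCode(byte);
    });
  }
}
```

    For example, decodeQueryValue("%D7%99%D7%91") takes the UTF-8 path, while decodeQueryValue("%FA%EE%E9%F8") falls back to the Windows-1255 mapping. A referrer in some other legacy encoding would need its own mapping, so treat this as a starting point rather than a general detector.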