c# unicode web-crawler idn

Cannot read a Unicode URL in C#


The following code does not work when the URL's host name contains non-ASCII characters:

using System;
using System.IO;
using System.Net;
using System.Web;

namespace Proyecto_Prueba_04
{
    class Program
    {
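        /// <summary>
        /// Downloads the resource at the given URL and returns its body as text.
        /// </summary>
        /// <param name="url">Absolute URL to request.</param>
        /// <returns>The response body read to the end as a string.</returns>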
        public static string GetWebText(string url)
        {
            HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);

            request.UserAgent = "A .NET Web Crawler";

            // Dispose the response, stream, and reader once the body has been read.
            using (WebResponse response = request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        } // End of the GetWebText method.

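        /// <summary>
        /// Entry point: encodes a test URL, downloads it, and prints the HTML.
        /// </summary>
        /// <param name="args">Command-line arguments (unused).</param>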
        public static void Main(string[] args)
        {
            string urlPrueba = Uri.UnescapeDataString("http://президент.рф/");
            Console.WriteLine("urlPrueba" + " = " + urlPrueba);

            var encoded = HttpUtility.UrlPathEncode(urlPrueba);
            Console.WriteLine("encoded" + " = " + encoded);

            string codigoHTML = GetWebText(encoded);
            Console.WriteLine("codigoHTML" + " = " + codigoHTML);

            Console.ReadLine();
        } // End of the Main method.
    } // End of the Program class.
} // End of the Proyecto_Prueba_04 namespace.

I don't understand how I should handle a Unicode URL.

Any ideas?

Thanks.


Solution

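  • You can use the IdnMapping class (in the System.Globalization namespace) to convert the internationalized domain name to its ASCII (Punycode) form before making the request.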

      string idn = "президент.рф";
    
      IdnMapping mapping = new IdnMapping();
      string asciiIdn = mapping.GetAscii(idn);
      Console.WriteLine(asciiIdn);    
    
      var text = GetWebText("http://" + asciiIdn);
      Console.WriteLine(text);