c# multithreading async-await task httpwebrequest

What is the fastest way to download web pages' sources?


I need to download the sources of a large number of web pages, so I need to do it as fast as possible. Here is my code.

  private static async Task<string> downloadsource(string link)
  {                
     ServicePointManager.Expect100Continue = false;
     WebRequest req = WebRequest.Create(link);
     req.Proxy = null;
     req.Method = "GET";
     WebResponse res = await req.GetResponseAsync();
     StreamReader read = new StreamReader(res.GetResponseStream());
     return read.ReadToEnd();           
  }

  List<string> links = new List<string>(){... including some web page links};

  private static List<string> source_list(List<string> links)
  {
      List<string> sources = new List<string>();

      for (int i = 0; i < links.Count; i++)
      {
          Task<string> _task = downloadsource(links[i]);
          Console.WriteLine("Downloaded : " + i);
          sources.Add(_task.Result);
      }            

      return sources;
  }

I was wondering whether this is the fastest way or whether it can be improved. Can you please help me with that?


Solution

  • You are calling _task.Result inside each loop iteration. That blocks until the current download has finished before the next one is started, so your code runs no faster than downloading the pages one after another.

    Try this instead:

    private async static Task<List<string>> source_list(List<string> links)
    {
        List<Task<string>> sources = new List<Task<string>>();
    
        for (int i = 0; i < links.Count; i++)
        {
            Task<string> _task = downloadsource(links[i]);
            Console.WriteLine("Downloading : " + i);
            sources.Add(_task);
        }
    
        return (await Task.WhenAll(sources)).ToList();
    }
    

    This would be even better:

    private async static Task<string[]> source_list(List<string> links)
    {
        return await Task.WhenAll(links.Select(l => downloadsource(l)));
    }
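
    To see the difference without touching the network, here is a minimal sketch that compares the two approaches. FakeDownload is a hypothetical stand-in for downloadsource (it just waits 200 ms), and the class and method names are made up for illustration. With five links, the blocking loop should take roughly five times the delay, while the Task.WhenAll version should take roughly one delay.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Threading.Tasks;

    public static class WhenAllDemo
    {
        // Hypothetical stand-in for downloadsource: waits 200 ms instead of hitting the network.
        public static async Task<string> FakeDownload(string link)
        {
            await Task.Delay(200);
            return "source of " + link;
        }

        // Blocks on each task before starting the next one, like the loop in the question.
        public static List<string> Sequential(List<string> links)
        {
            var sources = new List<string>();
            foreach (string link in links)
            {
                sources.Add(FakeDownload(link).Result);
            }
            return sources;
        }

        // Starts every task first, then awaits them all together.
        // Task.WhenAll returns the results in the same order as the input tasks.
        public static async Task<string[]> Concurrent(List<string> links)
        {
            return await Task.WhenAll(links.Select(l => FakeDownload(l)));
        }

        public static void Main()
        {
            var links = new List<string> { "a", "b", "c", "d", "e" };

            var sw = Stopwatch.StartNew();
            Sequential(links);
            Console.WriteLine("Sequential: " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            Concurrent(links).GetAwaiter().GetResult();
            Console.WriteLine("Concurrent: " + sw.ElapsedMilliseconds + " ms");
        }
    }
    ```

    The exact timings depend on your machine, so treat the numbers as an illustration rather than a benchmark of real downloads.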
    

    I also cleaned up your downloadsource method so that the response and the reader are disposed:

    private static async Task<string> downloadsource(string link)
    {
        ServicePointManager.Expect100Continue = false;
        WebRequest req = WebRequest.Create(link);
        req.Proxy = null;
        req.Method = "GET";
        using (WebResponse res = await req.GetResponseAsync())
        {
            using (StreamReader read = new StreamReader(res.GetResponseStream()))
            {
                return read.ReadToEnd();
            }
        }
    }
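
    One possible further tweak, not part of the code above: read.ReadToEnd() still blocks a thread while the body is read. StreamReader also offers ReadToEndAsync, which can be awaited instead. The sketch below shows the idea in isolation. ReadAllTextAsync is a hypothetical helper name, and a MemoryStream stands in for res.GetResponseStream() so that it runs offline.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using System.Threading.Tasks;

    public static class AsyncReadDemo
    {
        // Hypothetical helper: reads a whole stream as text without blocking the calling thread.
        public static async Task<string> ReadAllTextAsync(Stream stream)
        {
            using (var reader = new StreamReader(stream))
            {
                return await reader.ReadToEndAsync();
            }
        }

        public static void Main()
        {
            // A MemoryStream stands in for res.GetResponseStream() in this sketch.
            var body = new MemoryStream(Encoding.UTF8.GetBytes("<html>hello</html>"));
            Console.WriteLine(ReadAllTextAsync(body).GetAwaiter().GetResult());
        }
    }
    ```

    In downloadsource this would amount to replacing return read.ReadToEnd(); with return await read.ReadToEndAsync();.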