Tags: asp.net, unix, cron, screen-scraping, scheduled-tasks

Automatically downloading files from a specific website


I am a very new programmer. A website provides a lot of zip files that I need, and new zip files are uploaded to it weekly. I need to write a program or script that automatically downloads them from the site every week. For example, this is the link: http://www.google.com/googlebooks/uspto-patents-applications-yellowbook.html (you can see a lot of zip files there).

So my questions are:

  1. What script do I have to write so that I can download the zip files programmatically? (I have no experience writing scripts, so what would you suggest?)

  2. Once the first question is solved, how do I make it download the new zip files that are uploaded each week?

Do I have to use DOM... Unix? If so, I will do some research on that to make it work.


Solution

  • Why wget? You can use HtmlAgilityPack to parse the page and extract all the links. Then you simply loop over the URLs and download each file, using C# all the way through. You can also start a wget process from C# if you wish to do so.

    On the other hand, this can easily be done using bash and sed/awk and grep in combination with wget.

    Either way you will still need cron to schedule the job on a weekly basis.

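    using System.Net;  // WebClient lives in this namespace
    // Download one URL to a local file; repeat this call for each zip link found on the page.
    using var client = new WebClient();
    client.DownloadFile("http://www.csharpfriends.com/Members/index.aspx", "index.aspx");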
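
The bash route mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the page lists its archives as absolute URLs inside double-quoted href attributes, and the function names and the script name fetch_zips.sh are made up for this example. Relative links would need the site prefix added before downloading.

```shell
#!/usr/bin/env bash
# Sketch: list every .zip link on a page and fetch the ones not yet downloaded.
# Assumes absolute URLs in double-quoted href attributes (not verified for any site).

# Read HTML on stdin, print each distinct href value ending in .zip, one per line.
extract_zip_links() {
    grep -oE 'href="[^"]+\.zip"' | sed -E 's/^href="//; s/"$//' | sort -u
}

# Fetch the page at $1 and save any zip files not already present into directory $2.
download_new_zips() {
    local page_url="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    wget -qO- "$page_url" | extract_zip_links | while read -r link; do
        # -nc (no-clobber) skips files that already exist locally,
        # so a weekly run only fetches newly published archives.
        wget -nc -P "$dest_dir" "$link"
    done
}

# Run only when a page URL is passed: ./fetch_zips.sh PAGE_URL [DEST_DIR]
if [ "$#" -gt 0 ]; then
    download_new_zips "$1" "${2:-zips}"
fi
```

To run it weekly, a crontab entry such as `0 3 * * 1 /home/user/fetch_zips.sh PAGE_URL /home/user/zips` would start it every Monday at 03:00. The paths here are placeholders, and PAGE_URL stands for the address of the page that lists the zip files.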