ruby, web-scraping, anemone

Anemone: ignore URL links containing a certain phrase


I am running a web scraper with Anemone in Ruby, and it is causing problems on my server when it visits pages that require a login.

These pages all have a phrase, say "account", in the URL. I want the crawler to ignore them completely and never follow any link whose destination contains this string.

How can I do this?


Solution

  • Anemone has a skip_links_like method:

    skip_links_like(*patterns)
    Add one or more Regex patterns for URLs which should not be followed

    So adding something like

    skip_links_like(/\/account\//)
    

    should take care of it:

    Anemone.crawl("somesite.co.uk", :depth_limit => 1) do |anemone|
        # Parentheses avoid Ruby's "ambiguous first argument" warning for a regex literal.
        anemone.skip_links_like(/\/account\//)
        #...
    end
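
One thing worth checking is how strict the pattern is: /\/account\// only matches links with "/account/" as a full path segment, while the question asks to skip anything containing "account". The sketch below uses plain Ruby (no Anemone) to compare the two patterns. The sample links, the constant names and the `skipped` helper are hypothetical, and it assumes the pattern is tested against the path portion of each link, which is my understanding of what Anemone does but is not confirmed here.

```ruby
require 'uri'

# Hypothetical sample links, made up for illustration only.
SAMPLE_LINKS = [
  'http://somesite.co.uk/products/1',
  'http://somesite.co.uk/account/login',
  'http://somesite.co.uk/my-account',
  'http://somesite.co.uk/accounts'
].freeze

STRICT = %r{/account/} # equivalent to /\/account\// from the answer
LOOSE  = /account/     # matches "account" anywhere in the path

# Returns the links whose path matches the pattern, i.e. the links
# that would be skipped under the assumption stated above.
def skipped(links, pattern)
  links.select { |link| URI(link).path =~ pattern }
end

puts 'Skipped by strict pattern:'
puts skipped(SAMPLE_LINKS, STRICT)
puts 'Skipped by loose pattern:'
puts skipped(SAMPLE_LINKS, LOOSE)
```

If the login-protected pages can appear under paths such as "/my-account", the looser /account/ pattern is probably the safer choice to pass to skip_links_like.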