screen-scraping, html-content-extraction, yql

YQL scrape entire website/domain


I'm trying to scrape a set of links and content from a domain.

The equivalent query in Google would be:

"site:www.newswebsite.com search_term"

I've seen some examples that come close, but I can't quite get a search working across a whole website and then filter the results by the search term.

Is this possible without a custom data table?


Solution

  • I got to the bottom of it in the end.

    select title,abstract,url,date from search.web(0) where query="search_term" and sites="www.website1.com,www.website2.com,www.website3.com" | sort (field='date') | reverse()
    

    This searches three sites, sorts the results by date, and then reverses the order so the newest come first. There is an alternative way to reverse the sort, but this seems to work for now. I think it's a descending argument within the sort, i.e. sort(field='date', descending='true'), though I haven't confirmed it.

    Very useful, even if I do say so myself.
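
For anyone wanting to generate this query for different terms and site lists, here is a minimal Python sketch. The function names are made up for illustration, and the endpoint URL and the `format=json` parameter are assumptions based on how the public YQL service was exposed, so check them against whatever service you are actually calling. It only builds the statement and the request URL; it does not send anything.

```python
from urllib.parse import urlencode

# Assumption: the public YQL endpoint; adjust if your service differs.
YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql"


def build_site_search_query(term, sites, newest_first=True):
    """Build the YQL statement from the answer for one term and a list of sites."""
    # Reject double quotes rather than guess at YQL's escaping rules.
    for value in [term, *sites]:
        if '"' in value:
            raise ValueError("double quotes are not supported: %r" % value)
    query = (
        "select title,abstract,url,date from search.web(0) "
        'where query="{}" and sites="{}" | sort(field=\'date\')'
    ).format(term, ",".join(sites))
    if newest_first:
        # Same approach as the answer: sort ascending, then reverse.
        query += " | reverse()"
    return query


def build_request_url(query):
    """URL-encode the statement as the q parameter of a GET request."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": "json"})


if __name__ == "__main__":
    q = build_site_search_query(
        "search_term", ["www.website1.com", "www.website2.com"]
    )
    print(q)
    print(build_request_url(q))
```

Passing `newest_first=False` leaves off the `reverse()` step, so results stay in the ascending order produced by `sort`.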