Tags: scrapy, robots.txt, yahoo-finance

Does Yahoo Finance ban web scraping or not?


Yahoo Finance's robots.txt says:

User-agent: *
Sitemap: https://finance.yahoo.com/sitemap_en-us_desktop_index.xml
Sitemap: https://finance.yahoo.com/sitemaps/finance-sitemap_index_US_en-US.xml.gz
Disallow: /r/
Disallow: /__rapidworker-1.2.js
Disallow: /__blank
Disallow: /_td_api
Disallow: /_remote

Does Yahoo Finance ban web scraping or not?
What does the Yahoo Finance website disallow?
What can we infer from Yahoo's robots.txt file?


Solution

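  • Nothing in the robots.txt file expressly prevents you from scraping Yahoo Finance. The file contains a single group, User-agent: *, which applies to every crawler, and it disallows only a handful of paths (/r/, /__rapidworker-1.2.js, /__blank, /_td_api and /_remote); everything outside those paths is not restricted by robots.txt. However, Yahoo Finance is also governed by Yahoo's Terms of Service, which apply regardless of what robots.txt says.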

    The most pertinent part of that document essentially says that you should not do anything that would interfere with their services. Realistically, this means that if you plan to scrape Yahoo Finance for data, you should do so responsibly: keep your request volume low, because sending many thousands of requests will quickly get you banned.

    That said, web scraping is generally inefficient, since you download an entire HTML page just to extract a few data points programmatically. I would look into using an API instead (like those discussed here), as this will be a) more reliable, b) faster, and c) on much firmer legal ground.
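If you do scrape, both points above (respect robots.txt, keep the request volume low) can be enforced in code. Below is a minimal sketch using only Python's standard library, in particular urllib.robotparser. It parses the robots.txt rules quoted in the question offline, refuses URLs under the disallowed paths, and enforces a minimum delay between requests. The user agent string, the Throttle class and the polite_get helper are hypothetical names for illustration, not part of any Yahoo API. The quoted robots.txt has no Crawl-delay line, so the delay is your own choice; the value used below is an assumption, not a documented Yahoo limit.

```python
import time
import urllib.request
from urllib.robotparser import RobotFileParser

# robots.txt rules as quoted in the question. Yahoo can change this file at
# any time; a real crawler would use RobotFileParser.set_url() and read()
# to fetch the live copy instead of hard-coding it.
ROBOTS_TXT = """\
User-agent: *
Sitemap: https://finance.yahoo.com/sitemap_en-us_desktop_index.xml
Sitemap: https://finance.yahoo.com/sitemaps/finance-sitemap_index_US_en-US.xml.gz
Disallow: /r/
Disallow: /__rapidworker-1.2.js
Disallow: /__blank
Disallow: /_td_api
Disallow: /_remote
"""

# Hypothetical user agent string; identify your bot honestly.
USER_AGENT = "example-research-bot/0.1"


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Hypothetical helper: enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous request

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def polite_get(url: str, robots: RobotFileParser, throttle: Throttle) -> bytes:
    """Hypothetical helper: fetch url only if robots.txt allows it, after throttling."""
    if not robots.can_fetch(USER_AGENT, url):
        raise PermissionError(f"robots.txt disallows {url}")
    throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


if __name__ == "__main__":
    robots = build_robots_parser(ROBOTS_TXT)
    # Check a few example paths against the quoted rules (no network access).
    for path in ["/quote/AAPL", "/_td_api/foo", "/r/abc", "/__blank"]:
        url = "https://finance.yahoo.com" + path
        print(path, robots.can_fetch(USER_AGENT, url))
```

Running the script only evaluates the quoted rules: /quote/AAPL is allowed, while the three paths under /_td_api, /r/ and /__blank are refused. To actually download pages you would reuse one throttle across all calls, e.g. polite_get(url, robots, Throttle(2.0)), where 2.0 seconds is an arbitrary, conservative guess. Note that passing robots.txt does not mean a request complies with Yahoo's Terms of Service, and Yahoo may still block clients it considers abusive.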