python beautifulsoup cloudflare

Can't parse the CoinGecko page with BeautifulSoup as of today because of Cloudflare


from bs4 import BeautifulSoup as bs
import requests
import re
import cloudscraper

def get_btc_price(br):
  data=requests.get('https://www.coingecko.com/en/coins/bitcoin')

  soup = bs(data.text, 'html.parser')

  price1=soup.find('table',{'class':'table b-b'})
  fclas=price1.find('td')

  spans=fclas.find('span')

  price2=spans.text
  price=(price2).strip()
  x=float(price[1:])    
  y=x*br
  z=round(y,2)
  print(z)

  return z

This had been working for months, and this morning it stopped. The responses I get now contain messages like "checking your browser before you can continue...", "check your antivirus or consult with manager to get access...", and some other Cloudflare text.

I tried

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
print(scraper.get("https://www.coingecko.com/en/coins/bitcoin").text)

and I'm still blocked. What should I do? Is there another way to get past this, or am I doing something wrong?


Solution

  • The problem doesn't seem to be in the scraper itself but in how the server negotiates the connection.

    Add a user agent; otherwise requests sends its default one, which identifies the client as python-requests.

    user_agent = #
    response = requests.get(url, headers={ "user-agent": user_agent})
    

    Check the server's "requirements" by printing the response headers it sends back:

    url = #
    response = requests.get(url)
    for key, value in response.headers.items():
      print(key, ":", value)
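
The two suggestions above (send an explicit User-Agent, then look at the response headers) can be combined in a rough sketch like the one below. The helper names `browser_headers`, `cloudflare_hints` and `fetch` are hypothetical, the sample header values are made up, and treating a `cf-` prefix or a `Server: cloudflare` value as a sign of Cloudflare is a heuristic assumption, not an official rule. A browser-like User-Agent may or may not be enough to get past the check.

```python
def browser_headers(user_agent):
    """Request headers that announce an explicit User-Agent."""
    return {"User-Agent": user_agent}


def cloudflare_hints(headers):
    """Pick out response headers that suggest Cloudflare handled the request.

    Assumption: header names starting with "cf-" and a Server value
    containing "cloudflare" are treated as hints.
    """
    hints = {}
    for key, value in headers.items():
        name = key.lower()
        if name.startswith("cf-") or (name == "server" and "cloudflare" in value.lower()):
            hints[key] = value
    return hints


def fetch(url, user_agent, timeout=10):
    """Send the request, then print the status and any Cloudflare hints."""
    # Third-party library; imported here so the helpers above run without it.
    import requests

    response = requests.get(url, headers=browser_headers(user_agent), timeout=timeout)
    print(response.status_code, response.reason)
    for key, value in cloudflare_hints(response.headers).items():
        print(key, ":", value)
    return response


# Offline demonstration with made-up header values (no network access needed).
sample = {
    "Server": "cloudflare",
    "CF-RAY": "0000000000000000-XXX",
    "Content-Type": "text/html; charset=UTF-8",
}
print(cloudflare_hints(sample))
```

To try it against the page from the question, call `fetch("https://www.coingecko.com/en/coins/bitcoin", user_agent)` with the User-Agent string your own browser sends. If the status is an error code and the printed headers point at Cloudflare, the block is probably happening before the HTML ever reaches BeautifulSoup.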