python, web-scraping, beautifulsoup, find

Error with find() using BeautifulSoup while web scraping


I have the code below, which extracts information from CoinGecko by web scraping. It has been working for the past few days, but today I got this error:

AttributeError                            Traceback (most recent call last)
Cell In[99], line 11
      9 htmlContent = r.content
     10 soup = BeautifulSoup(htmlContent,'html.parser')
---> 11 results = soup.find('div' , {'class': 'coingecko-table'}).find('tbody').find_all('tr')
     12 print('Reading page' + str(i))
     13 for result in results:

AttributeError: 'NoneType' object has no attribute 'find'

This is the code I am using:

import requests
from bs4 import BeautifulSoup

names = []
prices = []
vol_24h = []
market_Cap = []

for i in range(1, 2):
    url = "https://www.coingecko.com/?page=" + str(i)
    r = requests.get(url)
    htmlContent = r.content
    soup = BeautifulSoup(htmlContent, 'html.parser')
    results = soup.find('div', {'class': 'coingecko-table'}).find('tbody').find_all('tr')
    print('Reading page ' + str(i))
    for result in results:
        try:
            names.append(result.find('td', {'class':'py-0 coin-name cg-sticky-col cg-sticky-third-col px-0'}).get_text().strip()) 
        except:
            names.append('n/a')
        try:
            prices.append(result.find('td', {'class':'td-price price text-right'}).get_text().strip())
        except:
            prices.append('n/a')
        try:
            vol_24h.append(result.find('td', {'class':'td-liquidity_score lit text-right col-market'}).get_text().strip())
        except:
            vol_24h.append('n/a')
        try:
            market_Cap.append(result.find('td', {'class':'td-market_cap cap col-market cap-price text-right'}).get_text().strip())
        except:
            market_Cap.append('n/a')

Would someone be able to help me? Thanks!


Solution

  • I ran your code and found the reason: the site has a simple anti-crawler check on the 'User-Agent' header. Requests without a browser-like User-Agent do not get the full page back, so soup.find() returns None. You only need to add a "User-Agent" header to the request. Python has a package, fake-useragent, that makes this convenient (you may need to install it first: pip install fake-useragent). Change your request part to something like this:

    from fake_useragent import UserAgent
    ...
    ua = UserAgent()
    # ua.random supplies a random browser-like User-Agent string for each request
    r = requests.get(url, headers={"User-Agent": ua.random})
    ...
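
    Alternatively, if you would rather not add a dependency, here is a minimal sketch that hardcodes a browser-like "User-Agent" string (a hypothetical value; any realistic browser string should work, assuming the site only checks that the header is present) and also guards against find() returning None, which is what triggered the original AttributeError:

    import requests
    from bs4 import BeautifulSoup

    # hypothetical static User-Agent; swap in any realistic browser string
    HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

    url = "https://www.coingecko.com/?page=1"
    r = requests.get(url, headers=HEADERS)
    soup = BeautifulSoup(r.content, 'html.parser')

    table = soup.find('div', {'class': 'coingecko-table'})
    if table is None:
        # expected markup missing: the request was blocked or the page layout changed
        print('coingecko-table not found')
    else:
        rows = table.find('tbody').find_all('tr')
        print('Found ' + str(len(rows)) + ' rows')

    Checking for None before chaining .find() calls also makes future layout changes fail with a clear message instead of an AttributeError.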