python-3.x web-scraping beautifulsoup web-crawler etherscan

How to build Etherscan webscraper?

I'm building a webscraper that constantly refreshes a buch of etherscan URL's every 30 seconds and if any new transfers have happened that are not accounted for, it sends me an email notification and a link to the relevant address on etherscan so I can manually check them out.

One of the addresses that I wanted to keep tabs on is here:

https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0xd071f6e384cf271282fc37eb40456332307bb8af

What I have done so far:

from urllib.request import Request, urlopen
url = 'https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0x94f52b6520804eced0accad7ccb93c73523af089'
req = Request(url, headers={'User-Agent': 'XYZ/3.0'})   # I got this line from another post since "uClient = uReq(URL)" and "page_html = uClient.read()" would not work (I beleive that etherscan is attemption to block webscraping or something?)
response = urlopen(req, timeout=20).read()
response_close = urlopen(req, timeout=20).close()
page_soup = soup(response, "html.parser")
Transfers_info_table_1 = page_soup.find("div", {"class": "table-responsive"})
print(Transfers_info_table_1)

The interesting thing is, when I run this, I get the following output:

<div class="table-responsive" style="visibility:hidden;">
<iframe frameborder="0" id="tokentxnsiframe" scrolling="no" src="" style="width: 100px; height: 600px; min-width: 100%;"></iframe>
</div>

I was expecting to get the output for the whole table of transfers. What am I doing wrong here?

Solution

Since the table is present inside iframe.Copy the src value of the iframe and then using request get the content of that url.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import pandas as pd

url = 'https://etherscan.io/token/generic-tokentxns2?m=normal&contractAddress=0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd&a=0xd071f6e384cf271282fc37eb40456332307bb8af'
req = Request(url, headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'})   # I got this line from another post since "uClient = uReq(URL)" and "page_html = uClient.read()" would not work (I beleive that etherscan is attemption to block webscraping or something?)
response = urlopen(req, timeout=20).read()
response_close = urlopen(req, timeout=20).close()
page_soup = soup(response, "html.parser")
Transfers_info_table_1 = page_soup.find("table", {"class": "table table-md-text-normal table-hover mb-4"})
df=pd.read_html(str(Transfers_info_table_1))[0]
df.to_csv("TransferTable.csv",index=False)

Generated csv.