I'm building a webscraper that constantly refreshes a bunch of Etherscan URLs every 30 seconds and, if any new transfers have happened that are not yet accounted for, sends me an email notification with a link to the relevant address on Etherscan so I can manually check them out.
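For context, the overall shape of the scraper is a simple poll-and-compare loop. A rough sketch follows (fetch_transfers and send_email_alert are placeholders for the scraping and notification parts, not finished code):

import time

def fetch_transfers(address):
    # Placeholder: the scraping code below goes here; it should return the
    # transaction hashes currently listed for the address on etherscan.
    return []

def send_email_alert(address, new_hashes):
    # Placeholder: send an email with a link to the address on etherscan.
    print(f"{len(new_hashes)} new transfer(s): https://etherscan.io/address/{address}")

watched = ["0x94f52b6520804eced0accad7ccb93c73523af089"]  # addresses to monitor
seen_hashes = set()                                        # transfers already accounted for

while True:
    for address in watched:
        new = [h for h in fetch_transfers(address) if h not in seen_hashes]
        if new:
            seen_hashes.update(new)
            send_email_alert(address, new)
    time.sleep(30)  # refresh every 30 seconds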
One of the addresses that I want to keep tabs on is the one in the URL used in the code below.
What I have done so far:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

url = 'https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0x94f52b6520804eced0accad7ccb93c73523af089'
# Custom User-Agent from another post, since "uClient = uReq(URL)" and "page_html = uClient.read()" would not work (I believe that etherscan is attempting to block webscraping or something?)
req = Request(url, headers={'User-Agent': 'XYZ/3.0'})
with urlopen(req, timeout=20) as conn:  # open the request once, read the body, then close it
    response = conn.read()
page_soup = soup(response, "html.parser")
Transfers_info_table_1 = page_soup.find("div", {"class": "table-responsive"})
print(Transfers_info_table_1)
The interesting thing is, when I run this, I get the following output:
<div class="table-responsive" style="visibility:hidden;">
<iframe frameborder="0" id="tokentxnsiframe" scrolling="no" src="" style="width: 100px; height: 600px; min-width: 100%;"></iframe>
</div>
I was expecting to get the output for the whole table of transfers. What am I doing wrong here?
The table is present inside an iframe, so copy the src value of the iframe and then fetch the content of that URL with a request. (In the scraped HTML the src attribute is still empty; the page's JavaScript fills it in, so look at the loaded iframe in your browser's developer tools to find the URL, which points at the generic-tokentxns2 endpoint used below.)
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import pandas as pd

# URL that the iframe loads: the generic-tokentxns2 endpoint, parameterised with the token contract and the holder address
url = 'https://etherscan.io/token/generic-tokentxns2?m=normal&contractAddress=0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd&a=0xd071f6e384cf271282fc37eb40456332307bb8af'
# Browser-like User-Agent so etherscan does not reject the request
req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'})
with urlopen(req, timeout=20) as conn:  # open once, read the body, then close the connection
    response = conn.read()
page_soup = soup(response, "html.parser")
# The transfers table that was missing from the outer page
Transfers_info_table_1 = page_soup.find("table", {"class": "table table-md-text-normal table-hover mb-4"})
# Parse the HTML table into a DataFrame and write it to CSV
df = pd.read_html(str(Transfers_info_table_1))[0]
df.to_csv("TransferTable.csv", index=False)
Generated csv.
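To tie this back to the original goal of an email whenever new transfers appear, one option is to compare each freshly scraped table against the previous CSV snapshot and mail the difference. This is only a sketch under assumptions: the "Txn Hash" column name, the SMTP server, and the mail addresses are placeholders to adapt to your setup.

import os
import smtplib
from email.message import EmailMessage
import pandas as pd

CSV_PATH = "TransferTable.csv"  # snapshot written by the previous run
ETHERSCAN_LINK = "https://etherscan.io/token/0xd6a55c63865affd67e2fb9f284f87b7a9e5ff3bd?a=0xd071f6e384cf271282fc37eb40456332307bb8af"

def notify_new_transfers(new_rows):
    # Placeholder SMTP settings -- swap in a real server and addresses.
    msg = EmailMessage()
    msg["Subject"] = f"{len(new_rows)} new transfer(s) detected"
    msg["From"] = "alerts@example.com"
    msg["To"] = "me@example.com"
    msg.set_content(new_rows.to_string() + "\n\n" + ETHERSCAN_LINK)
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

def check_for_new_transfers(latest_df):
    # Compare against the previous snapshot by transaction hash
    # (assumes the scraped table has a "Txn Hash" column).
    if os.path.exists(CSV_PATH):
        previous = pd.read_csv(CSV_PATH)
        seen = set(previous["Txn Hash"])
        new_rows = latest_df[~latest_df["Txn Hash"].isin(seen)]
        if not new_rows.empty:
            notify_new_transfers(new_rows)
    latest_df.to_csv(CSV_PATH, index=False)  # overwrite the snapshot for the next run

Calling check_for_new_transfers(df) in place of the plain df.to_csv(...) call above, once per 30-second cycle, would complete the notification workflow described in the question.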