html python-3.x beautifulsoup urllib

Web scraping with Python - dynamic table data isn't downloaded


I want to get the transaction times from https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/, but when I request the page's HTML I don't get the full page data.

I get everything except the contents of the table I need, "Transactions of address".

I have the CSS selector for the table, #txaddr, but it returns only the header row (Timestamp, Block, Hash, ...).

My code so far - I added a few comments to it.

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

def NodeRewardTime(link):
    # Request the page with a browser-like User-Agent header
    req = Request(link, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    soup = BeautifulSoup(webpage, 'html5lib')  # pip install html5lib
    all_results = soup.select("#txaddr")  # CSS selector for the entire table
    if not all_results:
        print("No data to show")
    for x in all_results:
        print(x.text)  # prints results

link = "https://explorer.flitsnode.app/address/FieXP1irJKvmWUiqV18AFdDZD8bgWvfRiC/"

NodeRewardTime(link)
input("End")

Output: TimestampBlockHashAmount (FLS)Balance (FLS)TX Type [End]



Solution

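  • If you inspect the page (for example, in the network tab of the browser's developer tools), you can see that the table rows are not part of the initial HTML. They are loaded separately as JSON from the get_address_transactions endpoint used in the code below.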

    The following will print the data in a table format:

    from urllib.request import Request, urlopen
    import json


    def NodeRewardTime(link):
        req = Request(link, headers={"User-Agent": "Mozilla/5.0"})
        webpage = urlopen(req).read()

        # The endpoint returns JSON, not HTML, so parse the response
        # directly instead of passing it through BeautifulSoup.
        json_data = json.loads(webpage)

        # Each entry in "data" is one table row; join its columns with " | ".
        return "\n".join(" | ".join(map(str, row)) for row in json_data["data"])

    URL = "https://explorer.flitsnode.app/get_address_transactions?address=fiexp1irjkvmwuiqv18afddzd8bgwvfric"
    print(NodeRewardTime(URL))
    

    Outputs:

    2020-08-14 00:00 | 562586 | cfc5fc6e81c0f31aaac85c2e3e6e727ce00cfdf4b938e7092472ce6f549b7fbf | 3.67999999 | 1003.67999999 | MASTERNODE
    2020-08-13 16:37 | 562211 | 68f08eefef36aecd33645b13f3c95d0c3160ade5bc180b1f3b32ced670d97bef | -3.67999999 | 1000.00000000 | OUT
    2020-08-12 18:58 | 561193 | 31958481f27f3d40ef5df4f437169f169f58b7b9556cc8ea5c381d4daf6d96b2 | 3.67999999 | 1003.67999999 | MASTERNODE
    2020-08-11 22:00 | 560155 | 7ae289b8250fd94af10aa5e0a884149f548c7e3d1c6e05e7d78ac80284b3833a | -36.79999990 | 1000.00000000 | OUT
    2020-08-11 15:02 | 559828 | 618185e5f12436e4c5fc97d45d36098ca56662780bbd037abfedfa316219571e | 3.67999999 | 1036.79999990 | MASTERNODE
    2020-08-10 14:52 | 558579 | 3afeaa5e9e9130f03fac0303de680d790d075f1bbbae95e730bcf90fc33b82b9 | 3.67999999 | 1033.11999991 | MASTERNODE
    2020-08-09 12:37 | 557281 | 0943156c88cc667502aef84b8143ba89f84cc069e342c86e028cae034abf3b36 | 3.67999999 | 1029.43999992 | MASTERNODE
    2020-08-08 12:10 | 556044 | 31f56c608a02ae8f90b0e113dc60a4f35eec86b91c0be7242c4409bab2f4ece2 | 3.67999999 | 1025.75999993 | MASTERNODE
    2020-08-07 09:07 | 554717 | 3e3e73db2491dec2071088a080a86567d769a6979c0304bfc26bfa194bfa8e5f | 3.67999999 | 1022.07999994 | MASTERNODE
    2020-08-06 07:47 | 553471 | 92605aff1c7ee92302323b22ea4b2d812e71afa3e07be8a80e8a62d3f7281314 | 3.67999999 | 1018.39999995 | MASTERNODE
    2020-08-05 04:47 | 552123 | 286261dc57262a2d2e34e1e3fd8c008946d6a08cf8a00617b2b66c14af3f2a82 | 3.67999999 | 1014.71999996 | MASTERNODE
    2020-08-04 02:14 | 550794 | ccc75788a0b2c1b441fe9f2c3594c39ce9dcc90583112d795fd3666942c0014d | 3.67999999 | 1011.03999997 | MASTERNODE
    2020-08-02 22:32 | 549388 | d2587f7a8adf268b881a22cf8b441382093916a95ab1c9f2f91c8a0ce59a281b | 3.67999999 | 1007.35999998 | MASTERNODE
    2020-08-01 23:04 | 548196 | 1279fada75e56f2397288ce9eb4fcc7d04d10b15ea646189df75a117a2585707 | 3.67999999 | 1003.67999999 | MASTERNODE
    ... and so on
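
Since the question asks for the transaction times specifically, the rows can be filtered once they are loaded. The following is only a minimal sketch. It assumes each row is a list ordered as in the output above (timestamp first, TX type last) and that timestamps use the `YYYY-MM-DD HH:MM` format shown there. `reward_times` and `sample_rows` are hypothetical names, and the sample rows are copied from the output above so the example runs without a network request:

```python
from datetime import datetime


def reward_times(rows, tx_type="MASTERNODE"):
    """Return the timestamps of all rows whose last column equals tx_type.

    Assumed column order (taken from the printed output):
    timestamp, block, hash, amount, balance, TX type.
    """
    return [
        datetime.strptime(row[0], "%Y-%m-%d %H:%M")
        for row in rows
        if row[-1] == tx_type
    ]


# Rows copied from the output above; a live script would pass json_data["data"].
sample_rows = [
    ["2020-08-14 00:00", "562586",
     "cfc5fc6e81c0f31aaac85c2e3e6e727ce00cfdf4b938e7092472ce6f549b7fbf",
     "3.67999999", "1003.67999999", "MASTERNODE"],
    ["2020-08-13 16:37", "562211",
     "68f08eefef36aecd33645b13f3c95d0c3160ade5bc180b1f3b32ced670d97bef",
     "-3.67999999", "1000.00000000", "OUT"],
    ["2020-08-12 18:58", "561193",
     "31958481f27f3d40ef5df4f437169f169f58b7b9556cc8ea5c381d4daf6d96b2",
     "3.67999999", "1003.67999999", "MASTERNODE"],
]

for ts in reward_times(sample_rows):
    print(ts.isoformat(sep=" ", timespec="minutes"))
```

Parsing the timestamps into `datetime` objects, rather than keeping them as strings, makes it straightforward to sort them or compute the interval between rewards later.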