
How to modify get_text function of BeautifulSoup according to required formatting?


I want to scrape this webpage. I'm using BeautifulSoup.

import requests
from bs4 import BeautifulSoup

url="https://www.blockchain.com/btc/block/00000000000000000011898368c395f1c35d56ea9109d439256d935a4fe7d656"
page=requests.get(url)
soup=BeautifulSoup(page.text,'html.parser')
block_details=soup.find(class_="hnfgic-0 jlMXIC")
print(block_details.get_text())

The output is:

Hash00000000000000000011898368c395f1c35d56ea9109d439256d935a4fe7d656Confirmations8Timestamp2019-11-21 17:52Height604806MinerSlushPoolNumber of Transactions2,003Difficulty12,973,235,968,799.78Merkle root49ee8cb431ef3e613fdc9ac3146335d1a608a0e6afb5cf9ab44c9ddc51acfbe9Version0x20000000Bits387,297,854Weight3,993,364 WUSize1,355,728 bytesNonce849,455,972Transaction Volume4560.73542334 BTCBlock Reward12.50000000 BTCFee Reward0.19346486 BTC

But I want the output to be:

Hash
00000000000000000011898368c395f1c35d56ea9109d439256d935a4fe7d656
Confirmations
8
Timestamp
2019-11-21 17:52
Height
604806
.
.
.

I intend to split this string, so a newline between each pair of text fragments would let me separate them with split("\n"). Please help.

EDIT: Selenium's .text property produces my desired output, but I want a fix using BeautifulSoup.


Solution

  • Pass separator='\n' to the get_text() method so that a newline is inserted between the text fragments:

    import requests
    from bs4 import BeautifulSoup
    
    url="https://www.blockchain.com/btc/block/00000000000000000011898368c395f1c35d56ea9109d439256d935a4fe7d656"
    page=requests.get(url)
    soup=BeautifulSoup(page.text,'html.parser')
    block_details=soup.find(class_="hnfgic-0 jlMXIC")
    print(block_details.get_text(separator='\n'))  # <-- note the separator parameter
    

    Prints:

    Hash
    00000000000000000011898368c395f1c35d56ea9109d439256d935a4fe7d656
    Confirmations
    13
    Timestamp
    2019-11-21 17:52
    Height
    604806
    Miner
    SlushPool
    Number of Transactions
    2,003
    Difficulty
    12,973,235,968,799.78
    Merkle root
    49ee8cb431ef3e613fdc9ac3146335d1a608a0e6afb5cf9ab44c9ddc51acfbe9
    Version
    0x20000000
    Bits
    387,297,854
    Weight
    3,993,364 WU
    Size
    1,355,728 bytes
    Nonce
    849,455,972
    Transaction Volume
    4560.73542334 BTC
    Block Reward
    12.50000000 BTC
    Fee Reward
    0.19346486 BTC