Why is my program not printing data from the table I am scraping?


I am struggling to write a program that scrapes data from the table on https://coinmarketcap.com. I am in a bit over my head, but I am trying to learn how it all works so that I can do it on my own. So far, my program prints each cryptocurrency's rank, name, and ticker symbol. Now I am trying to scrape the dynamically changing price from the table. Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://coinmarketcap.com"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
rank = 1

for td in soup.select("td:nth-of-type(3)"):
    t = " ".join(tag.text for tag in td.select("p, span")).strip()
    print(rank, "|", end =" "); print("{:<30} {:<10}".format(*t.rsplit(maxsplit=1)))
    rank = rank + 1

    for td in soup.select("td:nth-of-type(4)"):
        t = " ".join(tag.text for tag in td.select("a")).strip()

    print("{}_1d".format(t.rsplit(maxsplit=1)))

This prints as follows:

1 | Bitcoin                        BTC
[]_1d
2 | Ethereum                       ETH
[]_1d
3 | Tether                         USDT
[]_1d
4 | Binance Coin                   BNB
[]_1d


and so on...

How can I have it print the current price of the crypto and not just literal text? I can figure out the formatting on my own, just need help displaying the actual data. Any help is greatly appreciated. And if you can explain your solution, that would be even more helpful.


Solution

  • I found the following issues in your code:

    • The line print("{}_1d".format(t.rsplit(maxsplit=1))) is outside the inner for loop, so only the last value of t is printed. That value is an empty string, and rsplit on an empty string returns [], which is why you see []_1d. The print needs to run once per coin, using that coin's price rather than the last one.
    • The price loop (td:nth-of-type(4)) is nested inside the td:nth-of-type(3) loop, so the entire price loop runs again on every iteration of the outer loop.
    • If you print td inside the td:nth-of-type(4) loop, you will find that the a tag is not present in the response after the first ~10 results. Using td.text gets you the required result.
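
To illustrate the last point without hitting the network, here is a minimal sketch run against a made-up HTML fragment. The markup in SAMPLE_HTML is an assumption for demonstration only, not CoinMarketCap's actual page: the first price cell wraps its value in an a tag and the second does not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption, not the real page): only the first
# price cell contains an <a> tag.
SAMPLE_HTML = """
<table>
  <tr><td>1</td><td>Bitcoin BTC</td><td><a href="#">$100.00</a></td></tr>
  <tr><td>2</td><td>Ethereum ETH</td><td><span>$10.00</span></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
cells = soup.select("td:nth-of-type(3)")  # third cell of every row

# Looking only inside <a> tags loses the value when the tag is missing.
via_anchor = [" ".join(a.text for a in td.select("a")).strip() for td in cells]

# Reading the cell's own text works regardless of the inner tags.
via_text = [td.get_text(strip=True) for td in cells]

print(via_anchor)  # ['$100.00', '']
print(via_text)    # ['$100.00', '$10.00']
```

The empty string in via_anchor is the same kind of value that ends up printed as []_1d in the question.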

    I have slightly modified your code to fix some of the issues:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://coinmarketcap.com"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    rank = 1
    
    t1, t2 = [], []
    
    for td in soup.select("td:nth-of-type(3)"):
        t1.append(" ".join(tag.text for tag in td.select("p, span")).strip())
    
    for td in soup.select("td:nth-of-type(4)"):
        t2.append(td.text)
    
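    # Pair each name with its price; stop at the shorter list to avoid an IndexError.
    for i in range(min(len(t1), len(t2))):
        print(rank, "|", end =" "); print("{:<30} {:<10}".format(*t1[i].rsplit(maxsplit=1)))
        print("{}_1d".format(t2[i]))
        rank = rank + 1  # increment after printing so the ranking starts at 1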
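
As an alternative to building two parallel lists, the name and price can be read from the same table row, which keeps them aligned even if a cell is missing. The following is only a sketch under stated assumptions: parse_rows and SAMPLE_HTML are hypothetical names, and the sample markup (an empty first cell, then rank, name, and price) is invented to mirror the column positions used above, not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): name in the 3rd cell, price in the 4th.
SAMPLE_HTML = """
<table><tbody>
  <tr><td></td><td>1</td><td><p>Bitcoin</p><p>BTC</p></td><td><a href="#">$100.00</a></td></tr>
  <tr><td></td><td>2</td><td><span>Ethereum</span><span>ETH</span></td><td>$10.00</td></tr>
</tbody></table>
"""

def parse_rows(html):
    """Return a list of (name, symbol, price) tuples, one per table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tbody tr"):
        cells = tr.find_all("td", recursive=False)
        if len(cells) < 4:
            continue  # skip rows that lack a name or price cell
        label = " ".join(tag.get_text(strip=True) for tag in cells[2].select("p, span"))
        # Split on the last space: everything before it is the name, the rest the ticker.
        name, _, symbol = label.rpartition(" ")
        rows.append((name, symbol, cells[3].get_text(strip=True)))
    return rows

for rank, (name, symbol, price) in enumerate(parse_rows(SAMPLE_HTML), start=1):
    print("{} | {:<30} {:<10} {}".format(rank, name, symbol, price))
```

To try this against the live site, pass requests.get(url).content to parse_rows instead of SAMPLE_HTML. Whether the later rows contain usable cells depends on how the page is served, so treat the row layout as something to verify first.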