How do I get the value of a table cell from each HTML page in a list of links?


I have a list of about 5000 links. Two examples from the list:

https://racevietnam.com/runner/buiducninh/ecopark-marathon-2019

https://racevietnam.com/runner/drtungnguyen83/ecopark-marathon-2019

...

For each link, I want to get the value in the Time of Day column of the Finish row.

Example of the output I want:

09:51:07 AM - https://racevietnam.com/runner/buiducninh/ecopark-marathon-2019

07:50:55 AM - https://racevietnam.com/runner/ngocsondknb/ecopark-marathon-2019

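I have already scraped user info from another website, where the elements have id and class attributes. But the table in https://racevietnam.com/runner/ngocsondknb/ecopark-marathon-2019 has no id or class, so I don't know how to select the cell. This is the code I used for the other website: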

#!/usr/bin/python
from urllib.request import urlopen
from bs4 import BeautifulSoup

list_user = []

for userID in range(1, 100000):
    link = "https://example.com/member.php?u=" + str(userID)
    html = urlopen(link)
    bsObj = BeautifulSoup(html, "lxml")
    user_name = bsObj.find("div", {"id":"main_userinfo"}).h1.get_text()
    list_user.append(user_name)
    print("username", userID, "is: ", user_name)
    with open("result.txt", "a") as myfile:
        myfile.write(user_name + "\n")

Please help me.

Thank you.


Solution

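  • This is my code, and it works. It reads the URLs from a file, one per line, and for each page prints the table cell that comes right after the "Finish" cell.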

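    import requests
    from bs4 import BeautifulSoup

    # Read one URL per line from the input file, skipping blank lines.
    with open("input.ecopark", "r") as f:
        urls = [line.strip() for line in f if line.strip()]

    for url in urls:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, 'html.parser')
        # The table has no id or class, so collect every cell and scan in order.
        cells = soup.select("table tbody tr td")
        found_finish = False
        for cell in cells:
            if found_finish:
                # This is the first cell after the "Finish" label.
                print(url + " " + cell.get_text())
                break
            if cell.get_text() == "Finish":
                found_finish = True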
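
The loop above prints whichever cell comes directly after "Finish", so it depends on the column order. A possible refinement is to look the cell up by row label and column header instead. The following is only a sketch: it uses the standard library's html.parser rather than BeautifulSoup so it can run offline without extra packages, and `TableRows`, `cell_value`, and the sample markup are hypothetical names and data. It assumes the real page has a header row containing "Time of Day" and that "Finish" is the first cell of its row, which I have not verified.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td>, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def cell_value(html, row_label, column_header):
    """Return the cell under column_header in the row that starts with row_label."""
    parser = TableRows()
    parser.feed(html)
    rows = [[c.strip() for c in row] for row in parser.rows]
    col = None
    for row in rows:
        if col is None and column_header in row:
            col = row.index(column_header)      # header row found
        elif col is not None and row and row[0] == row_label and len(row) > col:
            return row[col]
    return None                                 # header or row not found


# Hypothetical markup; the real page layout is an assumption.
SAMPLE = """
<table>
  <thead><tr><th>Split</th><th>Time</th><th>Time of Day</th></tr></thead>
  <tbody>
    <tr><td>Start</td><td>00:00:00</td><td>05:30:12 AM</td></tr>
    <tr><td>Finish</td><td>04:20:55</td><td>09:51:07 AM</td></tr>
  </tbody>
</table>
"""

print(cell_value(SAMPLE, "Finish", "Time of Day"))
```

With BeautifulSoup, the same list of rows could be built by looping over `soup.find_all("tr")` and taking `get_text(strip=True)` of each `td` or `th` in the row; the lookup in `cell_value` would stay the same.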