python selenium selenium-webdriver web-scraping chrome-web-driver

How to scrape tables after clicking a button, for multiple pages listed in a CSV file? Selenium, Python


I would like to scrape all the information in a table that appears on multiple URLs, using the pd.read_html function. An example page is https://www.top40.nl/10cc/10cc-donna-5867; I read the URLs from a CSV file.

After opening the page and clicking the 'Songinfo' tab, the table with all the relevant information becomes visible. Please find my code below. Python gives the error: No table found and/or cannot parse from list. Happy to hear any advice on how to correct my code:

df_list = []

with open(r"C:\Users\nlvijn02\Documents\Personal documents\Sony\Test_input_links.csv") as file:    
    reader = csv.reader(file)
    for row in reader:
        print(row[0])
        driver.get(row[0])
                
        driver.find_element_by_xpath("//a[@href='#songinfo']").click()
        
        table = driver.find_elements_by_xpath("""//*[@id="songinfo"]/table""")
    
        df_list.append(pd.read_html(table))
            
    df = pd.concat(df_list)
        
driver.close()        
df.to_csv("details.csv")

Please find below the HTML code of the table:

<div id="songinfo" class="tab-pane active" aria-expanded="true"><h2>Songinformatie</h2><table class="table-songinfo"><tbody><tr><th>Artiest</th><td><a data-linktype="artist" href="https://www.top40.nl/top40-artiesten/10cc">10cc</a></td></tr><tr><th>&nbsp;</th><th style="text-align: left;">A-kant</th></tr><tr><th>Titel</th><td>
                                                                                                            Donna                                                                                                   </td></tr><tr><th>Lengte</th><td>
                                                                                                            02:55
                                                                                                    </td></tr><tr><th>Componist(en)</th><td>
                                                                                                            Kevin Godley, Lol Creme
                                                                                                    </td></tr><tr><th>&nbsp;</th><th style="text-align: left;">B-kant</th></tr><tr><th>Titel</th><td>
                                                                                                            Hot Sun Rock
                                                                                                    </td></tr><tr><th>Lengte</th><td>
                                                                                                            03:00
                                                                                                    </td></tr><tr><th>Componist(en)</th><td>
                                                                                                            Eric Stewart, Graham Gouldman
                                                                                                    </td></tr><tr><th colspan="2">&nbsp;</th></tr><tr><th>Platenlabel</th><td>
                                                                                                    UK
                                                                                            </td></tr><tr><th>Catalogusnr</th><td>
                                                                                                    UK 6
                                                                                            </td></tr><tr><th>Hoogste positie UK</th><td>
                                                                                                    2
                                                                                            </td></tr></tbody></table></div>

Solution

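  • import csv
    
    import pandas as pd
    from selenium import webdriver
    
    # Assumption: Chrome with a matching chromedriver, as the question's tags suggest.
    driver = webdriver.Chrome()
    df_list = []
    
    with open(r"C:\Users\nlvijn02\Documents\Personal documents\Sony\Test_input_links.csv") as file:
        reader = csv.reader(file)
        for row in reader:
            print(row[0])
            driver.get(row[0])
    
            # Open the 'Songinfo' tab so the table is rendered.
            driver.find_element_by_xpath("//a[@href='#songinfo']").click()
    
            # find_element (singular) returns one WebElement rather than a list.
            table = driver.find_element_by_xpath("""//*[@id="songinfo"]/table""")
    
            # pd.read_html needs HTML text and returns a list of DataFrames,
            # so extend df_list instead of appending the list itself.
            df_list.extend(pd.read_html(table.get_attribute('outerHTML')))
    
        df = pd.concat(df_list)
    
    driver.close()
    df.to_csv("details.csv")
    

    I modified three things in your code.

    1. find_elements_by_xpath => find_element_by_xpath (find_elements returns a list of elements, and a list has no get_attribute method)
    2. table => table.get_attribute('outerHTML') (pd.read_html parses HTML text, not a WebElement)
    3. df_list.append(...) => df_list.extend(...) (pd.read_html returns a list of DataFrames, and pd.concat needs the DataFrames themselves)

    Note that the find_element_by_* helpers were removed in Selenium 4.3. On newer versions, import By from selenium.webdriver.common.by and call driver.find_element(By.XPATH, ...) instead.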
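
    To see what pd.read_html hands back without starting a browser, here is a minimal sketch that runs it on a trimmed, hand-copied version of the Songinfo table from the question. The constant name SONGINFO_HTML and the two-iteration loop standing in for "two scraped pages" are assumptions made for illustration only. The HTML is wrapped in StringIO because recent pandas versions warn when given a raw HTML string, and read_html needs an HTML parser such as lxml to be installed.

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the Songinfo table from the question (three rows only).
SONGINFO_HTML = """
<table class="table-songinfo"><tbody>
<tr><th>Artiest</th><td>10cc</td></tr>
<tr><th>Titel</th><td>Donna</td></tr>
<tr><th>Lengte</th><td>02:55</td></tr>
</tbody></table>
"""

# read_html returns a list, even when the markup holds a single table.
tables = pd.read_html(StringIO(SONGINFO_HTML))

# Stand-in for scraping two pages: extend() keeps df_list a flat list
# of DataFrames, which is the shape pd.concat expects.
df_list = []
for _ in range(2):
    df_list.extend(pd.read_html(StringIO(SONGINFO_HTML)))

combined = pd.concat(df_list, ignore_index=True)
print(type(tables).__name__, len(tables))
print(combined.shape)
```

    With append instead of extend, df_list would hold lists rather than DataFrames, and pd.concat would refuse to combine them.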

    I'd be very happy if you test my code and let me know the result. Best Regards