
Getting href urls using beautifulsoup in python


I am trying to download all CSV files from the following URL: https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices, but unfortunately my code does not work as expected. Here is my attempt:

soup = BeautifulSoup(page.content, "html.parser")
market_dataset = soup.findAll("table",{"class":"table table-striped table-condensed table-clean"})
for a in market_dataset.find_all('a', href=True):
    print("Found the URL:", a['href'])

Can anyone please help me? How can I get the URLs of all the CSV files?


Solution

  • Select your elements more specifically, e.g. with CSS selectors, and be aware that the href values are relative, so you have to concatenate each one with the base URL:

    ['https://emi.ea.govt.nz'+a['href'] for a in soup.select('td.csv a')]
    

    Or simply change your code to use find() instead of findAll() to locate the table. findAll() returns a ResultSet (a list of elements) rather than a single element, which is what causes the following attribute error:

    AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

    market_dataset = soup.find("table",{"class":"table table-striped table-condensed table-clean"})
    

    Note: In new code, use find_all() consistently instead of the old findAll() syntax, and do not mix the two.

    Example

    from bs4 import BeautifulSoup
    import requests
    
    url = 'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices'
    
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    
    # Build absolute URLs from the relative hrefs and print them
    urls = ['https://emi.ea.govt.nz'+a['href'] for a in soup.select('td.csv a')]
    print(urls)
    

    Output

    ['https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220318_FinalEnergyPrices_I.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220317_FinalEnergyPrices_I.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220316_FinalEnergyPrices.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220315_FinalEnergyPrices.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220314_FinalEnergyPrices.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220313_FinalEnergyPrices.csv',
     'https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices/20220312_FinalEnergyPrices.csv',...]
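Since the goal was to download the files, here is a minimal sketch that builds on the selector above. It assumes the page still marks its CSV links with td.csv and that the hrefs are relative to the site, as in the output shown. urljoin() from the standard library replaces the manual string concatenation, so hrefs that are already absolute are handled too. The names PAGE_URL, extract_csv_urls and download_all are hypothetical helpers made up for this illustration, not part of BeautifulSoup or requests.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://emi.ea.govt.nz/Wholesale/Datasets/FinalPricing/EnergyPrices"


def extract_csv_urls(html, page_url=PAGE_URL):
    """Return an absolute URL for every link found inside a td.csv cell."""
    soup = BeautifulSoup(html, "html.parser")
    # a[href] skips anchors without an href attribute;
    # urljoin resolves relative hrefs against the page URL
    return [urljoin(page_url, a["href"]) for a in soup.select("td.csv a[href]")]


def download_all(urls):
    """Hypothetical helper: save each file in the current directory.

    Assumption: the last path segment of the URL is a usable file name.
    """
    with requests.Session() as session:
        for url in urls:
            response = session.get(url, timeout=30)
            response.raise_for_status()  # stop on HTTP errors
            filename = url.rsplit("/", 1)[-1]
            with open(filename, "wb") as fh:
                fh.write(response.content)
```

To use it, fetch the page as in the example above and pass the HTML on, e.g. download_all(extract_csv_urls(r.text)). Keeping the parsing separate from the network calls means the link extraction can be checked against a saved copy of the page without downloading anything.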