Tags: javascript, python, beautifulsoup, gis, google-earth

Using Python to pull files (KMLs) from multiple URLs


I'm a beginner with Python and am trying to automate a process that involves going to http://www.wildcad.net/WildCADWeb.asp, clicking every single dispatch center (for example, http://www.wildcad.net/WCAZ-ADC.htm), and loading a KML file from each of those pages. I noticed that each page follows a similar format, so I thought I could use a select-all function. I wrote my code as:

from bs4 import BeautifulSoup
import requests

urls = ('http://www.wildcad.net/WCAZ-ADC.htm', 'http://www.wildcad.net/WCALAIC.htm',
       'http://www.wildcad.net/WCAR-AOC.htm','http://www.wildcad.net/WCAZ-ADC.htm'
       'http://www.wildcad.net/WCAZ-FDC.htm', 'http://www.wildcad.net/WCAZ-PDC.htm'
       'http://www.wildcad.net/WCAZ-PHC.htm', 'http://www.wildcad.net/WCAZ-SDC.htm'
       'http://www.wildcad.net/WCAZ-TDC.htm', 'http://www.wildcad.net/WCAZ-WDC.htm')


 result = requests.get(urls)
    doc = BeautifulSoup(result.text, 'html.parser')
    print(doc.prettify())
    for i in enumerate(soup.findAll('a')):
        _KML = urls + link.get('href')
        if _KML.endswith('.kml'):
            urls.append(_KML)

    open(_KML)

However, it doesn't seem to pull the files, and I keep getting an error message on line '65'. Any direction or example of how to remedy this would be very much appreciated!


Solution

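  • Working code using Selenium and BeautifulSoup. It opens the index page in Chrome, parses the rendered HTML, and prints every link that ends in .kml: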

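    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    url = 'http://www.wildcad.net/WildCADWeb.asp'

    # Selenium 4 expects the driver path wrapped in a Service object.
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()
    time.sleep(2)
    driver.get(url)
    time.sleep(5)  # give the page time to finish loading

    soup = BeautifulSoup(driver.page_source, "html.parser")
    driver.quit()  # the browser is no longer needed once the HTML is captured

    # Skip the first link in the centered table, then print each href ending in .kml.
    for link in soup.select('table[align="center"] tbody tr td a')[1:]:
        href = link.get('href') or ''  # some anchors may have no href
        if href.endswith('.kml'):
            print(href)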

    Output:

    http://www.wildcad.net/WAearth.kml
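
A note on the code in the question: some entries in the `urls` tuple are not followed by a comma, so Python joins the adjacent string literals into one long string. In addition, `requests.get` takes a single URL rather than a tuple, and a tuple has no `append` method. The sketch below shows one possible way to loop over several pages and collect their KML links with the standard library only (no browser). It assumes the `.kml` links appear as ordinary `<a href>` tags in the static HTML of each dispatch page, which has not been verified here. If the pages build those links with JavaScript, a browser-driven approach like the Selenium code above would still be needed. The names `find_kml_links`, `collect_kml_links`, and `fetch_text` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_kml_links(html, page_url):
    """Return absolute URLs of the links in `html` that end in .kml."""
    parser = HrefCollector()
    parser.feed(html)
    # urljoin resolves relative hrefs against the page they were found on.
    return [urljoin(page_url, href) for href in parser.hrefs
            if href.lower().endswith(".kml")]


def fetch_text(url):
    """Download a page and decode it leniently (assumed to be UTF-8)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", "replace")


def collect_kml_links(page_urls, fetch=fetch_text):
    """Fetch each page in turn and gather its KML links without duplicates."""
    found = []
    for page_url in page_urls:
        for kml_url in find_kml_links(fetch(page_url), page_url):
            if kml_url not in found:
                found.append(kml_url)
    return found
```

To use it, pass a list of dispatch-center URLs with a comma after every entry, for example `collect_kml_links(['http://www.wildcad.net/WCAZ-ADC.htm', 'http://www.wildcad.net/WCAR-AOC.htm'])`. Each returned URL could then be saved to disk by writing the bytes from `urlopen(kml_url).read()` to a file opened in `'wb'` mode. The `fetch` parameter exists only so the parsing logic can be tried on canned HTML without network access.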