Tags: python, email, web-scraping, href, data-extraction

Python: scraping a protected email address from an href link


I want to get the email addresses from listing pages such as https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/queensland/the-dock-mooloolaba-events/

Right now I have the code below, but how can I scrape the email address from the clicked link?

from requests_html import HTMLSession

url = 'https://thenationalweddingdirectory.com.au/explore/?category=wedding-venues&region=melbourne&sort=top-rated'

s = HTMLSession()
r = s.get(url)

r.html.render(sleep=1)
products = r.html.xpath('//*[@id="finderListings"]/div[2]', first=True)

for item in products.absolute_links:
    r = s.get(item)
    print(r.html.find('li.lmb-calltoaction a', first=True))
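About the href itself: "email protection" in the title usually refers to Cloudflare's email obfuscation, where a mailto link is replaced by something like /cdn-cgi/l/email-protection#<hex>. Assuming this site does that (not verified here), the hex string can be decoded locally: its first byte is an XOR key and every following byte is one character of the address XOR-ed with that key. A minimal sketch follows; decode_cf_email and email_from_href are hypothetical helper names, and plain mailto: links are handled as well.

```python
from typing import Optional
from urllib.parse import unquote, urlparse


def decode_cf_email(encoded: str) -> str:
    """Decode a Cloudflare-style obfuscated address (hex string).

    The first two hex digits are the XOR key; each following pair is one
    character of the address XOR-ed with that key.
    """
    key = int(encoded[:2], 16)
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )


def email_from_href(href: str) -> Optional[str]:
    """Return the email address behind a link's href, or None."""
    if href.startswith("mailto:"):
        # Plain mailto link: drop the scheme and any "?subject=..." part
        return unquote(href[len("mailto:"):].split("?", 1)[0])
    parsed = urlparse(href)
    if parsed.path.endswith("/cdn-cgi/l/email-protection") and parsed.fragment:
        # Protected link: the encoded address is the "#..." fragment
        return decode_cf_email(parsed.fragment)
    return None
```

With requests_html you would pass it the link's attribute, e.g. email_from_href(link.attrs.get('href', '')), where link is the element returned by find(). Cloudflare typically also stores the same hex string in a data-cfemail attribute on the obfuscated element (class __cf_email__), and decode_cf_email accepts that value unchanged.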

Solution

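  • The email and telephone are already on each listing page, inside a JSON block (the one containing "LocalBusiness") that holds all the info you need.
    You can also get every listing URL from the site's "ajax" request instead of rendering the explore page.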

    import json
    import re

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'accept': '*/*',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8,es;q=0.7,ru;q=0.6',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
    }

    # Same listing query the explore page makes via "ajax"
    params = {
        'mylisting-ajax': '1',
        'action': 'get_listings',
        'form_data[page]': '0',
        'form_data[preserve_page]': 'false',
        'form_data[category]': 'wedding-venues',
        'form_data[region]': 'melbourne',
        'form_data[sort]': 'top-rated',
        'listing_type': 'place',
    }

    response = requests.get('https://thenationalweddingdirectory.com.au/',
                            params=params, headers=headers, timeout=30)
    # Get all listing URLs. Slashes in the response are escaped ("\/"), so strip
    # backslashes first; dict.fromkeys() drops duplicates but keeps the order.
    results = list(dict.fromkeys(re.findall(
        r"https://thenationalweddingdirectory\.com\.au/suppliers/wedding-venues/melbourne/[a-zA-Z0-9-]+/",
        response.text.replace("\\", ""))))

    for result in results:
        print("Navigate: " + result)
        response = requests.get(result, headers=headers, timeout=30)
        soup = BeautifulSoup(response.content, 'html.parser')
        # Name, telephone and email sit in the JSON block mentioning "LocalBusiness"
        for script in soup.find_all("script"):
            text = script.string or ""
            if "LocalBusiness" in text:
                data = json.loads(text)
                print("Name: " + data.get("name", ""))
                print("Telephone: " + data.get("telephone", ""))
                print("Email: " + data.get("email", ""))
                break
    

    OUTPUT:

    Navigate: https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/melbourne/metropolis-events/
    Name: Metropolis Events
    Telephone: 03 8537 7300
    Email: [email protected]
    Navigate: https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/melbourne/cotham-dining/
    Name: Cotham Dining
    Telephone: 0411 931 818
    Email: [email protected]
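If you want to test the parsing without hitting the site, both steps of the answer can be pulled out into pure functions. This is a sketch, not the answer's exact code: it swaps BeautifulSoup for the standard library's html.parser so it needs no extra installs, and instead of reading only the top-level object it searches the parsed JSON (including a possible "@graph" list) for an object whose "@type" is "LocalBusiness". The names listing_urls, local_business, ScriptCollector and find_business are hypothetical, and so is the assumption that listing slugs use only letters, digits and hyphens.

```python
import json
import re
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional

# Listing URLs look like BASE + "<region>/<slug>/" in the answer's output
BASE = "https://thenationalweddingdirectory.com.au/suppliers/wedding-venues/"


def listing_urls(ajax_text: str, region: str) -> List[str]:
    """Return the unique listing URLs for one region, in first-seen order."""
    # Same trick as the answer: drop the escaping backslashes ("\/")
    unescaped = ajax_text.replace("\\", "")
    # Assumption: slugs contain only letters, digits and hyphens
    pattern = re.escape(BASE + region + "/") + r"[a-zA-Z0-9-]+/"
    return list(dict.fromkeys(re.findall(pattern, unescaped)))


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element."""

    def __init__(self) -> None:
        super().__init__()
        self.scripts: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._inside = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.scripts[-1] += data


def find_business(node: Any) -> Optional[Dict[str, Any]]:
    """Depth-first search for a dict whose @type is (or lists) LocalBusiness."""
    if isinstance(node, dict):
        kind = node.get("@type")
        if kind == "LocalBusiness" or (isinstance(kind, list) and "LocalBusiness" in kind):
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_business(child)
        if found is not None:
            return found
    return None


def local_business(html: str) -> Optional[Dict[str, Any]]:
    """Return the LocalBusiness object from a listing page's JSON, if any."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        if "LocalBusiness" not in text:
            continue
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue  # e.g. ordinary JavaScript that merely mentions the word
        found = find_business(data)
        if found is not None:
            return found
    return None
```

In the answer's loop you would call listing_urls(response.text, 'melbourne') on the ajax response and local_business(response.text) on each listing page, then read "name", "telephone" and "email" from the returned dict with .get().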