How can you extract information from links which are stored in a list?


I want to go through this list and get certain information (name, address, number, and mail of each company) from the pages behind these links:

['https://allianz-entwicklung-klima.de/kompensationspartner/aera-group/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/atmosfair-ggmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/bischoff-ditze-energy-gmbh-co-kg/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/climate-extender-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/climatepartner-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/die-klimamanufaktur-gmbh/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/die-ofenmacher-e-v/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/first-climate/',
 'https://allianz-entwicklung-klima.de/kompensationspartner/fokus-zukunft-gmbh-co-kg/']

All the information should end up in a table. I tried a for loop, but it doesn't work for me: I only get the first link to work, not the others.

I'm grateful for any help.


Solution

  • You could use the Python libraries requests and BeautifulSoup to scrape these pages. I have written a short snippet below. I have not had time to test it, but it should work. You still have to extract the information you need with Beautiful Soup and store it, perhaps in a list of dictionaries like:

    data = [{"name": "", "address": "", "number": "", "mail": ""}]

    import requests
    from bs4 import BeautifulSoup
    
    links = ['https://allianz-entwicklung-klima.de/kompensationspartner/aera-group/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/atmosfair-ggmbh/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/bischoff-ditze-energy-gmbh-co-kg/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/climate-extender-gmbh/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/climatepartner-gmbh/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/die-klimamanufaktur-gmbh/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/die-ofenmacher-e-v/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/first-climate/',
            'https://allianz-entwicklung-klima.de/kompensationspartner/fokus-zukunft-gmbh-co-kg/']
    
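    for link in links:
        # Download each page in turn and parse its HTML.
        page = requests.get(link)
        soup = BeautifulSoup(page.content, "html.parser")
        # Extract name, address, number and mail from `soup` here,
        # then append them to your list as one dictionary per page.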

    To learn how to extract data with Beautiful Soup, I suggest reading this tutorial: Beautiful Soup: Build a Web Scraper With Python
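When a loop seems to handle only the first link, a common cause is that the result is returned or overwritten inside the loop instead of being appended to a list. I can't tell whether that is what happened in your code, so treat the following as a minimal sketch of the pattern rather than a fix for your exact script. The names `collect_rows`, `write_table`, `fetch_html`, `parse_company` and the extra `url` column are my own assumptions, not part of any library. `fetch_html` and `parse_company` are passed in as plain functions, so the loop does not depend on how the real pages are structured.

```python
import csv

# Column names taken from the list-of-dictionaries idea above.
FIELDS = ["name", "address", "number", "mail"]


def collect_rows(links, fetch_html, parse_company):
    """Visit every link and return one dictionary per company page.

    fetch_html(url) -> str and parse_company(html) -> dict are
    hypothetical helpers supplied by the caller.
    """
    rows = []  # created once, before the loop
    for link in links:
        html = fetch_html(link)
        # Keep the source URL next to the extracted fields.
        row = {"url": link, **parse_company(html)}
        rows.append(row)  # append inside the loop so no page is lost
    return rows  # return only after the loop has finished


def write_table(rows, path):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["url"] + FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

With the snippet above, `fetch_html` could simply return `requests.get(link).text`, and `parse_company` would hold the Beautiful Soup lookups for the four fields and return them as a dictionary with the keys in `FIELDS`. Calling `write_table(collect_rows(links, fetch_html, parse_company), "partners.csv")` would then give you the table as a CSV file (the file name is just an example).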