I am working on a very simple parser that walks through a member list from A to Z. The member list is here:
see: https://vvonet.vvo.at/vvonet_mitgliederverzeichnisneu
Note: for each member, the "kontaktinformationen" link has to be opened and the data there scraped into a pandas DataFrame.
I think this can be done in Python with requests and BeautifulSoup, either printing the result to the screen or storing it in a DataFrame.
The script should fetch the member list page, extract the links to the individual member pages, visit each member's "kontaktinformationen" page, and extract the contact information there. The contact information is then stored in a DataFrame, which can be printed to the screen or saved to a CSV file.
Here is my attempt:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request to the member list page
url = "https://vvonet.vvo.at/vvonet_mitgliederverzeichnisneu"
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, "html.parser")
    # Find all member links
    member_links = soup.find_all("a", class_="font1")
    # Initialize a list to store the data
    member_data = []
    # Iterate over the member links
    for member_link in member_links:
        # Build the URL of the "kontaktinformationen" page
        member_url = "https://vvonet.vvo.at" + member_link["href"] + "/kontaktinformationen"
        # Send a GET request to the member's "kontaktinformationen" page
        member_response = requests.get(member_url)
        # Check if the request was successful
        if member_response.status_code == 200:
            # Parse the HTML content of the page
            member_soup = BeautifulSoup(member_response.content, "html.parser")
            # Find the contact information section
            contact_info_div = member_soup.find("div", class_="contact")
            # Check if the contact information section exists
            if contact_info_div:
                # Extract the contact information
                contact_info_text = contact_info_div.get_text(separator="\n", strip=True)
                member_data.append(contact_info_text)
            else:
                member_data.append("Contact information not found")
        else:
            member_data.append(f"Failed to retrieve contact information for {member_link.text.strip()}")
    # Create a DataFrame
    df = pd.DataFrame(member_data, columns=["Contact Information"])
    # Display the DataFrame
    print(df)
    # Alternatively, save the DataFrame to a CSV file
    # df.to_csv("member_contact_information.csv", index=False)
else:
    print("Failed to retrieve the member list page.")
But at the moment I get an empty DataFrame:
Empty DataFrame
Columns: [Contact Information]
Index: []
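An empty result like this usually means the loop body never ran, so a useful first check is whether the selector matches anything in the HTML that requests actually receives. Below is a minimal sketch of that check using only the standard library. The helper name count_anchors_with_class and both HTML snippets are hypothetical stand-ins, not content taken from the real site; in practice you would pass response.text instead of the samples.

```python
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Counts <a> tags that carry a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.count += 1


def count_anchors_with_class(html, wanted_class):
    """Return how many <a class="..."> tags in html include wanted_class."""
    parser = AnchorCounter(wanted_class)
    parser.feed(html)
    return parser.count


# Hypothetical sample: a page whose list is filled in later by a script
sample_static_html = (
    "<html><body><div id='list'></div>"
    "<script src='app.js'></script></body></html>"
)
# Hypothetical sample: the same page as a browser would show it after loading
sample_rendered_html = (
    "<html><body><a class='font1' href='/m/1'>A</a>"
    "<a class='font1 big' href='/m/2'>B</a><a href='/x'>C</a></body></html>"
)

print(count_anchors_with_class(sample_static_html, "font1"))    # 0
print(count_anchors_with_class(sample_rendered_html, "font1"))  # 2
```

If the count for the fetched page is 0, member_links is empty and member_data never receives an entry, which would produce exactly the empty DataFrame shown above.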
The data you see on the page is loaded from a different URL (in XML format):
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://vvonet.vvo.at/vvo/vvonet_website.nsf/allMitglieder?ReadViewEntries="

# The "xml" parser requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url).content, "xml")

data = []
# Each <viewentry> is one member
for e in soup.select("viewentry"):
    t = {}
    # Each <entrydata> is one column; its "name" attribute is the column name
    for d in e.select("entrydata"):
        t[d["name"]] = d.get_text(strip=True, separator=" ")
    data.append(t)

df = pd.DataFrame(data)
print(df)
Prints:
docTitle globUNID docBundesland docFachbereich docFax docInternet docMail docOrt docStrasse docTelefon docUnternehmen
0 Acredia Versicherung AG x095DA0F5F9E395B0C1258A4A0037FD66 Wien Schadenversicherer http://www.acredia.at office@acredia.at 1010 Wien Himmelpfortgasse 29 +43/(0)5 01 02-0 Acredia Versicherung AG
1 AIG Europe S.A. - Direktion für Österreich xFB78910F9D0D7FC7C1258A4A0037FCED Wien +43/1/533 25 00-80 http://www.aig.co.at info.oesterreich@aig.com 1010 Wien Herrengasse 1 - 3 +43/1/533 25 00 AIG Europe S.A.\nDirektion für Österreich
2 Allianz Care x7B60F7881DF6129AC1258A4A0037FD61 Außerordentliches Mitglied http://www.allianz-care.com IRL-Dublin 12 15 Joyce Way, Park West Business +44/7825/510 814 Allianz Care
3 Allianz Commercial xA362FDC72B76421DC1258A4A0037FD43 Wien +43/(0)59009-402 14 http://www.commercial.allianz.com stefanie.thiem@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)59009-88700 Allianz Commercial
4 Allianz Elementar Lebensversicherungs-Aktiengesellschaft xD41EC8AF73F7ED93C1258A4A0037FCE5 Wien +43/(0)5 9009-70700 http://www.allianz.at feedback@allianz.at, schaden@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)5 9009-0 Allianz Elementar Lebensversicherungs-Aktiengesellschaft
5 Allianz Elementar Versicherungs-Aktiengesellschaft x95C137D749F4EB97C1258A4A0037FCE6 Wien Kfz-Versicherer Krankenversicherer Schadenversicherer Unfallversicherer +43/(0)5 9009-70000 http://www.allianz.at feedback@allianz.at, schaden@allianz.at 1100 Wien Wiedner Gürtel 9-13 +43/(0)5 9009-0 Allianz Elementar Versicherungs-Aktiengesellschaft
6 APK Versicherung AG x7C2E8BD3E6C46C0BC1258A4A0037FD34 Wien Lebensversicherer +43/(0)50 275-3709 http://www.apk-versicherung.at versicherung@apk.at 1030 Wien Thomas-Klestil-Platz 13 +43/(0)50 275-3700 APK Versicherung AG
7 ARAG SE - Direktion für Österreich xFA932E61C5D638E3C1258A4A0037FD41 Wien +43/1/531 02-1923 http://www.arag.at info@arag.at 1041 Wien Favoritenstraße 36, Postfach 182 +43/1/531 02-0 ARAG SE \nDirektion für Österreich
8 Atradius Kreditversicherung - Zweigniederlassung der Atradius Crédito y Caución S.A. de Seguros y Reaseguros xAA2CB3183BE03937C1258A4A0037FD4B Wien http://www.atradius.at versicherung.kredit@atradius.com 1220 Wien Vienna DC Tower 1, Donau-City-Straße 7 +43/1/813 0313 Atradius Kreditversicherung\nZweigniederlassung der Atradius Crédito y Caución S.A. de Seguros y Reaseguros
9 Atzbacher Versicherung V.a.G. xF489570D29A92438C1258A4A0037FD68 Oberösterreich Sachversicherungsverein +43/7673/75488-20 http://www.atzbacher-versicherung.at info@atzbacher-versicherung.at 4690 Oberndorf bei Schwanenstadt Atzbacher Straße 23 +43/7673/75488-0 Atzbacher Versicherung V.a.G.
10 AWP P&C S.A., Niederlassung für Österreich x3173A1D948F040D7C1258A4A0037FCEA Wien Unfallversicherer http://www.allianz-partners.com service.at@allianz.com 1130 Wien Hietzinger Kai 101-105 +43/1/525 03-6945 (Service Center) AWP P&C S.A., Niederlassung für Österreich (Allianz Partners)
...
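
Since the original goal was a CSV file, here is a hedged sketch of the same parsing idea that can be tried offline and ends in CSV text. It uses only the standard library (xml.etree.ElementTree and csv) instead of BeautifulSoup and pandas. SAMPLE_XML is a hand-written sample that is only assumed to mirror the viewentry/entrydata shape of the real response, and the function names parse_view_entries and rows_to_csv are made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hand-written sample; assumed (not verified) to resemble the real response
SAMPLE_XML = """<viewentries>
  <viewentry position="1">
    <entrydata name="docTitle"><text>Example Versicherung AG</text></entrydata>
    <entrydata name="docMail"><text>office@example.invalid</text></entrydata>
    <entrydata name="docUnternehmen"><text>Example SE
Direktion</text></entrydata>
  </viewentry>
  <viewentry position="2">
    <entrydata name="docTitle"><text>Second Example</text></entrydata>
    <entrydata name="docMail"><text></text></entrydata>
  </viewentry>
</viewentries>"""


def parse_view_entries(xml_text):
    """Turn each <viewentry> into a dict keyed by the entrydata names."""
    root = ET.fromstring(xml_text)
    rows = []
    for entry in root.iter("viewentry"):
        row = {}
        for cell in entry.iter("entrydata"):
            # Join all nested text, then collapse whitespace and newlines
            text = "".join(cell.itertext())
            row[cell.get("name")] = " ".join(text.split())
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text; missing columns become empty cells."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_view_entries(SAMPLE_XML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Unlike the get_text call above, this sketch collapses embedded newlines (such as the "\n" visible in docUnternehmen) into single spaces, which keeps each member on one CSV line. The list of dicts returned by parse_view_entries can also be passed to pd.DataFrame in the same way as data in the code above.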