I would like to scrape data from a website with Python and BeautifulSoup for my data analysis project.
The data I want to collect from the website:
Eventually I want to transfer the data to SQL to build this sample table.
I'm stuck on the very first step of the web scraping.
import requests
from bs4 import BeautifulSoup
url = 'https://www.berghain.berlin/en/event/77218/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
soup.find('div', class_='running-order-set__info').find('span').contents[0]
result: 'Norman Nodge'
I only managed to get the first artist's name 😢 and don't know how to collect the rest of the information.
Could any kind soul help a poor student trying to do a fun project?
UPDATE: I was able to solve it with @Andrej Kesely's answer.
I adjusted it a bit to load the results into a DataFrame.
import pandas as pd

# One list per column; each <li> in the running order appends one value to each.
dates = []
names = []
stages = []
times = []
artists = []
notes = []
for li in soup.select('main li'):
    # The date, party name and stage come from the nearest preceding elements.
    date = li.find_previous('p')
    dates.append(date.get_text(strip=True, separator=' ').split()[1])
    name = li.find_previous('h1')
    names.append(name.get_text(strip=True))
    stage = li.find_previous('h2')
    stages.append(stage.get_text(strip=True))
    time = li.time.text
    times.append(time)
    artist = li.select_one('.running-order-set__info span')
    # 'NaN' here is a placeholder string, not a real missing value.
    artists.append(artist.contents[0] if artist.contents else 'NaN')
    notes.append(artist.span.text if artist.span else 'NaN')
bh_df = pd.DataFrame(
    {'date': dates,
     'party_name': names,
     'stage': stages,
     'start_time': times,
     'artist_name': artists,
     'note': notes
     })
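For the SQL step mentioned at the top, here is a minimal sketch of one way it could go, using only the standard-library `sqlite3` module. The table name `running_order`, the helper `save_running_order`, and the in-memory database are my own assumptions for illustration; the sample values are copied from the scraped output, and `None` is used instead of the `'NaN'` string so that missing notes end up as SQL `NULL`.

```python
import sqlite3

# Sample values copied from the scraped output; in practice these would be
# the lists filled in the loop above.
dates = ["06.07.2024", "06.07.2024"]
names = ["Klubnacht", "Klubnacht"]
stages = ["Berghain", "Berghain"]
times = ["23:59", "04:30"]
artists = ["Norman Nodge", "Alienata"]
notes = ["Ostgut Ton", None]  # None is stored as SQL NULL


def save_running_order(conn, rows):
    """Create the (hypothetical) running_order table and insert the rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS running_order (
            date TEXT, party_name TEXT, stage TEXT,
            start_time TEXT, artist_name TEXT, note TEXT
        )
        """
    )
    # Parameterised placeholders avoid quoting problems in artist names.
    conn.executemany(
        "INSERT INTO running_order VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
save_running_order(conn, list(zip(dates, names, stages, times, artists, notes)))
stored = conn.execute(
    "SELECT artist_name, note FROM running_order ORDER BY rowid"
).fetchall()
print(stored)
```

If the data is already in a DataFrame, `bh_df.to_sql("running_order", conn, if_exists="replace", index=False)` should do the same job in one call.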
You can try:
import requests
from bs4 import BeautifulSoup
url = "https://www.berghain.berlin/en/event/77218/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for li in soup.select("main li"):
    date = li.find_previous("p")
    name = li.find_previous("h1")
    stage = li.find_previous("h2")
    time = li.time.text
    artist = li.select_one(".running-order-set__info span")
    print("Date ", date.get_text(strip=True, separator=" ").split()[1])
    print("Name ", name.get_text(strip=True))
    print("Stage ", stage.get_text(strip=True))
    print("Time ", time)
    print("Artist", artist.contents[0] if artist.contents else "-")
    print("Note ", artist.span.text if artist.span else "-")
    print("-" * 80)
Prints:
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 23:59
Artist Norman Nodge
Note Ostgut Ton
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 04:30
Artist Alienata
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 08:30
Artist UVB
Note Mord
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 12:30
Artist Matthew Cha
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 16:30
Artist Gaetano Parisio
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 20:30
Artist Justine Perry
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Berghain
Time 00:30
Artist DJ Nobu
Note Bitta
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 23:59
Artist Lauer
Note Live at Robert Johnson / Running Back
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 04:00
Artist Dam Swindle
Note Heist Recordings
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 08:00
Artist Kikelomo
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 12:30
Artist -
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 19:30
Artist Wallace
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Panorama Bar
Time 00:00
Artist Cinthie
Note 803 Crystal Grooves
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Garten
Time 12:00
Artist Suze Ijó
Note -
--------------------------------------------------------------------------------
Date 06.07.2024
Name Klubnacht
Stage Garten
Time 16:00
Artist Hiroko Yamamura
Note -
--------------------------------------------------------------------------------
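A short note on why this works where the attempt in the question returned a single name: `find` stops at the first matching element, whereas `select` (like `find_all`) returns every match, and `find_previous` walks backwards through the document to the nearest preceding tag of the given name. The sketch below shows this offline on a hand-written HTML fragment. The fragment is only an assumption about the page's structure, so the real markup may differ; collecting the rows as dictionaries with `None` for missing values is likewise just one possible choice.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified fragment; not copied from the real page.
html = (
    "<main><h1>Klubnacht</h1><p>Saturday 06.07.2024</p>"
    "<h2>Berghain</h2><ul>"
    '<li><time>23:59</time><div class="running-order-set__info">'
    "<span>Norman Nodge<span>Ostgut Ton</span></span></div></li>"
    '<li><time>04:30</time><div class="running-order-set__info">'
    "<span>Alienata</span></div></li>"
    "</ul><h2>Panorama Bar</h2><ul>"
    '<li><time>12:30</time><div class="running-order-set__info">'
    "<span></span></div></li>"
    "</ul></main>"
)
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first match, which is why one name came back.
first = soup.find("div", class_="running-order-set__info").find("span").contents[0]

rows = []
for li in soup.select("main li"):  # select() returns every matching <li>
    artist = li.select_one(".running-order-set__info span")
    rows.append({
        # Each lookup walks backwards to the nearest preceding element.
        "date": li.find_previous("p").get_text(strip=True, separator=" ").split()[1],
        "stage": li.find_previous("h2").get_text(strip=True),
        "start_time": li.time.text,
        # An empty <span> has no contents, so fall back to None.
        "artist_name": str(artist.contents[0]) if artist.contents else None,
        "note": artist.span.text if artist.span else None,
    })

for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame(rows)`.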