Web scraping with Python BeautifulSoup


I would like to scrape data from the Berghain website with Python and BeautifulSoup for my data analysis project.

The data I want to collect from the website:

  1. Date: 06.07.2024
  2. Stage: Berghain, Panorama Bar, Garten
  3. Timetable
  4. Artist
  5. Labels

Eventually I want to load the data into an SQL database to build this sample table:

sample table

I'm stuck at the very first step of the scraping.

import requests
from bs4 import BeautifulSoup

url = 'https://www.berghain.berlin/en/event/77218/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
soup.find('div', class_='running-order-set__info').find('span').contents[0]

result: 'Norman Nodge'

I only managed to get the first artist's name 😢 and don't know how to collect the rest of the information!

Could some kind soul help a poor student who is trying to do a fun project?

** UPDATE: I was able to solve it with @Andrej Kesely's answer.

I adjusted it a bit so that the results fit into a DataFrame.

import pandas as pd

# One list per DataFrame column; `soup` is the parsed page from the snippet above.
dates = []
names = []
stages = []
times = []
artists = []
notes = []

for li in soup.select('main li'):
    # Each <li> is one timetable entry. The date, party name and stage are
    # taken from the nearest preceding <p>, <h1> and <h2>.
    date = li.find_previous('p')
    dates.append(date.get_text(strip=True, separator=' ').split()[1])

    name = li.find_previous('h1')
    names.append(name.get_text(strip=True))

    stage = li.find_previous('h2')
    stages.append(stage.get_text(strip=True))

    time = li.time.text
    times.append(time)

    # The first child of the span is the artist name; a nested <span>,
    # when present, holds the note (label).
    artist = li.select_one('.running-order-set__info span')
    artists.append(artist.contents[0] if artist.contents else 'NaN')
    notes.append(artist.span.text if artist.span else 'NaN')

bh_df = pd.DataFrame({
    'date': dates,
    'party_name': names,
    'stage': stages,
    'start_time': times,
    'artist_name': artists,
    'note': notes,
})

Result: bh_df
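
For the SQL step mentioned in the question, here is a minimal sketch that assumes SQLite through Python's built-in sqlite3 module. The table name running_order, the in-memory database and the three sample rows (copied from the output printed in the answer) are illustrative assumptions, and missing notes are stored as NULL rather than as the string 'NaN'.

```python
import sqlite3

# Hypothetical sample rows in the same column order as bh_df;
# the values are copied from the answer's printed output.
rows = [
    ("06.07.2024", "Klubnacht", "Berghain", "23:59", "Norman Nodge", "Ostgut Ton"),
    ("06.07.2024", "Klubnacht", "Berghain", "04:30", "Alienata", None),
    ("06.07.2024", "Klubnacht", "Panorama Bar", "23:59", "Lauer",
     "Live at Robert Johnson / Running Back"),
]


def save_rows(conn, rows):
    """Create the (assumed) running_order table if needed and insert the rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS running_order (
            date        TEXT,
            party_name  TEXT,
            stage       TEXT,
            start_time  TEXT,
            artist_name TEXT,
            note        TEXT
        )
        """
    )
    # Placeholders let sqlite3 handle quoting; None becomes NULL.
    conn.executemany(
        "INSERT INTO running_order VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


# ":memory:" keeps the sketch self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
save_rows(conn, rows)

count = conn.execute("SELECT COUNT(*) FROM running_order").fetchone()[0]
per_stage = conn.execute(
    "SELECT stage, COUNT(*) FROM running_order GROUP BY stage ORDER BY stage"
).fetchall()
print(count, per_stage)
```

With the real data, the rows could come from bh_df.itertuples(index=False, name=None), or the whole frame could be written in one call with bh_df.to_sql('running_order', conn, if_exists='append', index=False).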


Solution

  • You can try:

    import requests
    from bs4 import BeautifulSoup
    
    url = "https://www.berghain.berlin/en/event/77218/"
    
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    
    for li in soup.select("main li"):
        date = li.find_previous("p")
        name = li.find_previous("h1")
        stage = li.find_previous("h2")
        time = li.time.text
        artist = li.select_one(".running-order-set__info span")
    
        print("Date  ", date.get_text(strip=True, separator=" ").split()[1])
        print("Name  ", name.get_text(strip=True))
        print("Stage ", stage.get_text(strip=True))
        print("Time  ", time)
        print("Artist", artist.contents[0] if artist.contents else "-")
        print("Note  ", artist.span.text if artist.span else "-")
    
        print("-" * 80)
    

    Prints:

    Date   06.07.2024                           
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   23:59    
    Artist Norman Nodge 
    Note   Ostgut Ton
    --------------------------------------------------------------------------------
    Date   06.07.2024      
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   04:30    
    Artist Alienata    
    Note   -    
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   08:30    
    Artist UVB         
    Note   Mord 
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   12:30    
    Artist Matthew Cha 
    Note   -    
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   16:30    
    Artist Gaetano Parisio
    Note   -    
    --------------------------------------------------------------------------------
    Date   06.07.2024         
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   20:30    
    Artist Justine Perry
    Note   -    
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht                                                                                         
    Stage  Berghain  
    Time   00:30    
    Artist DJ Nobu 
    Note   Bitta
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht                                                                                         
    Stage  Panorama Bar                            
    Time   23:59
    Artist Lauer 
    Note   Live at Robert Johnson / Running Back
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Panorama Bar
    Time   04:00
    Artist Dam Swindle 
    Note   Heist Recordings
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Panorama Bar
    Time   08:00
    Artist Kikelomo
    Note   -
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Panorama Bar
    Time   12:30
    Artist -
    Note   -
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Panorama Bar
    Time   19:30
    Artist Wallace
    Note   -
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Panorama Bar
    Time   00:00
    Artist Cinthie 
    Note   803 Crystal Grooves
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Garten
    Time   12:00
    Artist Suze Ijó
    Note   -
    --------------------------------------------------------------------------------
    Date   06.07.2024
    Name   Klubnacht
    Stage  Garten
    Time   16:00
    Artist Hiroko Yamamura
    Note   -
    --------------------------------------------------------------------------------
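
Why the loop finds every entry while the attempt in the question finds only one: find() stops at the first match, whereas select() returns all matching elements, and find_previous() walks backwards through the document from each <li> to the nearest earlier heading. The sketch below shows this on made-up HTML that only imitates the nesting the loop relies on; the real page's markup is more involved, and the weekday in the <p> is an assumption about how the date line is laid out.

```python
from bs4 import BeautifulSoup

# Made-up HTML (an assumption for illustration, not the real page source).
html = """
<main>
  <p>Saturday 06.07.2024</p>
  <h1>Klubnacht</h1>
  <h2>Berghain</h2>
  <ul>
    <li><time>23:59</time><div class="running-order-set__info"><span>Norman Nodge<span>Ostgut Ton</span></span></div></li>
    <li><time>04:30</time><div class="running-order-set__info"><span>Alienata</span></div></li>
  </ul>
  <h2>Panorama Bar</h2>
  <ul>
    <li><time>23:59</time><div class="running-order-set__info"><span>Lauer</span></div></li>
  </ul>
</main>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element, hence a single artist.
first = str(soup.find("div", class_="running-order-set__info").find("span").contents[0])

rows = []
for li in soup.select("main li"):  # select() returns every matching <li>
    # find_previous() searches backwards from this <li>, so each entry
    # picks up the closest preceding <p> and <h2>.
    date = li.find_previous("p").get_text(strip=True, separator=" ").split()[1]
    stage = li.find_previous("h2").get_text(strip=True)
    artist = li.select_one(".running-order-set__info span")
    # The nested <span>, when present, carries the note (label).
    note = artist.span.get_text(strip=True) if artist.span else None
    rows.append((date, stage, li.time.get_text(strip=True), str(artist.contents[0]), note))

print(first)
for row in rows:
    print(row)
```

Because the stage is looked up per entry, the third row is attributed to Panorama Bar even though nothing inside its <li> mentions the stage.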