python-3.x, web-scraping, google-chrome-devtools, python-requests-html

Website scraping: can't find the URL that returns the data I want in the Network tab of Chrome DevTools


I'm trying to scrape a radio station website to get the current charts (https://www.energy.de/programm/energy-euro-hot-30 and then https://music.apple.com/de/playlist/energy-euro-hot-30/pl.9b672a18307c4cd7ba1ece0106891868). I am using Python and the Requests-HTML module. When I inspect the HTML returned by the request, the elements I want to analyze are missing. However, when I examine the page as displayed in the browser, the desired data is there. I had a similar problem at the beginning of the week, and a user (https://stackoverflow.com/users/10035985/andrej-kesely) helped me. He used the Network tab in Chrome DevTools to find the correct link to the desired data. I have now tried this myself for my current problem, but I am overwhelmed by the flood of connections. Could someone please nudge me in the right direction?

I've tried using the Network tab in Chrome DevTools to find the correct link to the data I need, but I was not successful.


Solution

  • You don't see the data as a separate request in the Network tab because it is embedded in a <script> element in the page's HTML. Here is an example of how you can parse it:

    import json
    
    import requests
    from bs4 import BeautifulSoup
    
    
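    def find_tracks(o):
        """Walk nested dicts/lists and yield the "items" of every track section."""
        if isinstance(o, dict):
            # A dict with itemKind == "trackLockup" holds the track list.
            if o.get("itemKind") == "trackLockup":
                yield o["items"]
                return
            # Otherwise keep searching in the dict's values.
            for v in o.values():
                yield from find_tracks(v)
        elif isinstance(o, list):
            for v in o:
                yield from find_tracks(v)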
    
    
    url = "https://music.apple.com/de/playlist/energy-euro-hot-30/pl.9b672a18307c4cd7ba1ece0106891868"
    
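    # Download the page; the track data is part of the initial HTML.
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")

    # The JSON payload is the text of <script id="serialized-server-data">.
    data = json.loads(soup.select_one("#serialized-server-data").text)

    # Take the first "trackLockup" section found in the JSON.
    tracks = next(find_tracks(data))

    # Uncomment to inspect every field available per track:
    # print(json.dumps(tracks, indent=4))

    for track in tracks:
        print(f'{track["title"]:<55} {track["artistName"]}')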
    

    Prints:

    Overdrive (feat. Norma Jean Martine)                    Ofenbach
    Houdini                                                 Dua Lipa
    Strangers                                               Kenya Grace
    When We Were Young (The Logical Song)                   David Guetta & Kim Petras
    greedy                                                  Tate McRae
    Gimme Love                                              Sia
    Lose Control                                            Teddy Swims
    Cynical                                                 twocolors, Safri Duo & Chris de Sarandy
    Lovin On Me                                             Jack Harlow
    Si No Estás                                             Iñigo Quintero
    Paint The Town Red                                      Doja Cat
    Water                                                   Tyla
    On My Love                                              Zara Larsson & David Guetta
    Is It Love                                              Loreen
    I'll Be There                                           Robin Schulz, Rita Ora & Tiago PZK
    Dreaming                                                Marshmello, P!nk & Sting
    American Town                                           Ed Sheeran
    Is It Over Now? (Taylor's Version) [From The Vault]     Taylor Swift
    Better Me                                               Michael Schulte & R3HAB
    Mwaki                                                   ZERB
    Substitution (feat. Julian Perretta)                    Purple Disco Machine & Kungs
    RUNAWAY                                                 OneRepublic
    Blindside                                               James Arthur
    Dive                                                    Lost Frequencies & Tom Gregory
    Tattoo                                                  Loreen
    LOVE'n'TENDRESSE                                        Eddy de Pretto
    Prada                                                   cassö, RAYE & D-Block Europe
    Never Give Up                                           Puggy
    Used To Be Young                                        Miley Cyrus
    Seasons                                                 Thirty Seconds to Mars
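
If it is unclear which <script> element carries the data, one way to find it is to search every <script> for a value you can see in the browser, such as a track title. The following is only a minimal sketch of that idea: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, the names ScriptCollector and scripts_containing are made up for illustration, and sample_html is a tiny hypothetical stand-in, not the real Apple Music markup.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect (attributes, text) for every <script> element in a document."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # finished (attrs_dict, text) pairs
        self._attrs = None   # attributes of the <script> currently open, if any
        self._chunks = []    # text pieces of the <script> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._attrs = dict(attrs)
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so collect them all.
        if self._attrs is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._attrs is not None:
            self.scripts.append((self._attrs, "".join(self._chunks)))
            self._attrs = None


def scripts_containing(html, needle):
    """Return the attribute dicts of all <script> elements whose text contains needle."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [attrs for attrs, text in collector.scripts if needle in text]


# Hypothetical stand-in for a downloaded page (assumption, not the real markup).
sample_html = """
<html><body>
<div id="app"></div>
<script type="text/javascript">window.analytics = {};</script>
<script type="application/json" id="serialized-server-data">
[{"itemKind": "trackLockup", "items": [{"title": "Houdini", "artistName": "Dua Lipa"}]}]
</script>
</body></html>
"""

# Search for a title that is visible on the rendered page.
matches = scripts_containing(sample_html, "Houdini")
for attrs in matches:
    print(attrs)
```

With a real page you would pass requests.get(url).text instead of sample_html. A match tells you the id or type to select on. If nothing matches, the data is probably loaded by a separate request, and the Network tab is the place to look.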