python, web-scraping, python-requests-html

Remove a specific result from requests_html output


import datetime
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://music.apple.com/us/playlist/top-100-hong-kong/pl.7f35cffa10b54b91aab128ccc547f6ef'
applemusic = session.get(url)

applemusic.html.render(sleep=1, scrolldown=1)

data = applemusic.html.xpath('//*[@id="scrollable-page"]/main/div/div[2]', first=True)
artist_list = data.find('span.svelte-vyyb4r')

for artist in artist_list:
  print(artist)

Hi guys, I am new to Python. I want to write a small function that scrapes information from an Apple Music playlist, but there is one row in the results that I want to remove from the output (marked with asterisks in the output below). How can I do that? I know this may be a simple question, but I would really appreciate any help.

<Element 'span' class=('svelte-vyyb4r',)>
<Element 'span' class=('svelte-vyyb4r',)>
*<Element 'span' class=('songs-list-row__badge', 'songs-list-row__badge--explicit', 'svelte-vyyb4r')>*
<Element 'span' class=('svelte-vyyb4r',)>
<Element 'span' class=('svelte-vyyb4r',)>

I have tried the remove() function, but it does not seem to work:

for artist in artist_list.remove("songs-list-row__badge"):
    print(artist)

Output
ValueError: list.remove(x): x not in list

Solution

  • You can change your CSS selector so that it only matches span elements whose class attribute is exactly svelte-vyyb4r:

    data = applemusic.html.xpath('//*[@id="scrollable-page"]/main/div/div[2]', first=True)
    artist_list = data.find('span[class="svelte-vyyb4r"]')
    

    You can also loop through your list, check whether each element has the class you don't want, and skip it if so:

    for artist in artist_list:
        if "songs-list-row__badge" in artist.attrs["class"]:
            # skip element
            continue
    
        print(artist)
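
As for why the original attempt failed: list.remove(x) searches the list for an item equal to x, and the list holds Element objects rather than strings, hence the ValueError. It also returns None, so it cannot be iterated over in a for loop. The skip logic above can be sketched as a small filter that builds a new list instead. This is only a minimal sketch: FakeElement and drop_badges are hypothetical names, and FakeElement is a stand-in that models nothing but the attrs dictionary, so the example runs without fetching the page.

```python
from dataclasses import dataclass, field


@dataclass
class FakeElement:
    """Hypothetical stand-in for a requests_html element; only `attrs` is modeled."""
    attrs: dict = field(default_factory=dict)


def drop_badges(elements, unwanted="songs-list-row__badge"):
    """Return a new list without the elements that carry the unwanted class."""
    return [el for el in elements if unwanted not in el.attrs.get("class", ())]


# Sample data mirroring the class tuples shown in the question's output.
artist_list = [
    FakeElement({"class": ("svelte-vyyb4r",)}),
    FakeElement({"class": ("songs-list-row__badge",
                           "songs-list-row__badge--explicit",
                           "svelte-vyyb4r")}),
    FakeElement({"class": ("svelte-vyyb4r",)}),
]

filtered = drop_badges(artist_list)
print(len(filtered))  # 2
```

With the real scraper, the same function should work on the list returned by data.find('span.svelte-vyyb4r'), assuming each element exposes its classes through attrs["class"] as in the output above.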