Tags: python, anchor, beautifulsoup, scraper

How to extract the text between some anchor tags?


I need to extract the names of the artists from an HTML page. Here's a snippet of the page:

 </td>
 <td class="playbuttonCell">
   <a class="playbutton preview-track" href="/music/example" data-analytics-redirect="false"  >
      <img class="transparent_png play_icon" width="13" height="13" alt="Play" src="http://cdn.last.fm/flatness/preview/play_indicator.png" style="" />
    </a>
  </td>
  <td class="subjectCell" title="example, played 3 times">
    <div>
      <a href="/music/example-artist"   >Example artist name</a>

I've tried this, but it isn't doing the job:

import urllib
from bs4 import BeautifulSoup

html = urllib.urlopen('http://www.last.fm/user/Jehl/charts?rangetype=overall&subtype=artists').read()
soup = BeautifulSoup(html)
print soup('a')

for link in soup('a'):
    print html

Where am I screwing up?


Solution

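for link in soup.select('td.subjectCell a'):
    print(link.text)

select() takes a CSS selector: td.subjectCell a matches every a element nested inside a td element that has the subjectCell class, so the play-button links in the playbuttonCell cells are skipped. In your loop, print html printed the entire page on every iteration instead of the current link; printing link.text gives just that link's text.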
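Below is a self-contained sketch of the same approach that runs without a network connection. It assumes Python 3 and bs4 4.7 or later (which bundles CSS selector support), and parses a copy of the snippet from the question. The wrapping <table>/<tr> and the closing tags are assumptions added so the fragment is well-formed. The helper names artist_names and all_link_texts are hypothetical, chosen for illustration. The second helper reproduces what the question's soup('a') matches, which shows why that call also picks up the play-button link.

```python
from bs4 import BeautifulSoup

# Copy of the question's snippet. The <table>/<tr> wrapper and the closing
# tags are assumptions added so the fragment parses as complete markup.
SNIPPET = """
<table><tr>
  <td class="playbuttonCell">
    <a class="playbutton preview-track" href="/music/example" data-analytics-redirect="false">
      <img class="transparent_png play_icon" width="13" height="13" alt="Play"
           src="http://cdn.last.fm/flatness/preview/play_indicator.png" />
    </a>
  </td>
  <td class="subjectCell" title="example, played 3 times">
    <div>
      <a href="/music/example-artist">Example artist name</a>
    </div>
  </td>
</tr></table>
"""


def artist_names(markup):
    """Hypothetical helper: text of every <a> inside a td.subjectCell cell."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a.get_text(strip=True) for a in soup.select("td.subjectCell a")]


def all_link_texts(markup):
    """Hypothetical helper: what soup('a') matches, i.e. every <a> element."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a.get_text(strip=True) for a in soup("a")]


if __name__ == "__main__":
    print(artist_names(SNIPPET))    # ['Example artist name']
    print(all_link_texts(SNIPPET))  # ['', 'Example artist name']
```

The play-button link contains only whitespace and an <img>, so its stripped text is an empty string. Passing "html.parser" names Python's built-in parser explicitly; leaving the parser out, as the question does, makes bs4 guess which installed parser to use and emit a warning. To run this against a live page in Python 3, the question's urllib.urlopen would become urllib.request.urlopen, assuming the page still uses this markup.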