Tags: python-3.x, beautifulsoup, href

Get href within a table


Sorry if this has been asked before; I couldn't find an answer on Stack Overflow or through a search engine.

I'm trying to scrape data from a table and need the href values of the links inside it. The HTML is as follows:

<table class="featprop results">
<tr>
**1)**<td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td>
</tr>
<tr><td class="propimg" colspan="2">

    <div class="imgcrop">
    **2)**<a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"><img src="content/images/1/1/641/w296/858.jpg" alt=" Ashford" width="148"/></a>


    <div class="let">&nbsp;</div>
    </div>
</td></tr>

<tr><td class="proprooms">

So far I have used the following:

for table in soup.findAll('table', {'class': 'featprop results'}):
    for tr in table.findAll('tr'):
        for a in tr.findAll('a'):
            print(a)

This returns both anchor tags, marked 1 and 2 in the HTML above. Could anyone help me extract just the href value?


Solution

    for table in soup.findAll('table', {'class': 'featprop results'}):
        for tr in table.findAll('tr'):
            for a in tr.findAll('a'):
                print(a['href'])
    

    out:

    /lettings-search-results?task=View&itemid=136
    /lettings-search-results?task=View&itemid=136
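One caveat worth sketching: `a['href']` raises `KeyError` for an anchor that has no `href` attribute. The example below is a minimal, self-contained sketch rather than the answerer's code. It assumes the `bs4` package is installed, uses markup adapted from the question, and adds a hypothetical second `<a>` without an `href` to show the filter at work. `find_all` is the current spelling of `findAll`.

```python
from bs4 import BeautifulSoup

# Markup adapted from the question. The second <a> deliberately has no href
# (an assumption made for illustration; it is not in the original page).
HTML = """
<table class="featprop results">
<tr><td class="propname" colspan="2"><a href="/lettings-search-results?task=View&amp;itemid=136" rel="nofollow"> West Drayton</a></td></tr>
<tr><td class="propimg" colspan="2"><a name="top">no link here</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

hrefs = []
for table in soup.find_all("table", {"class": "featprop results"}):
    # href=True keeps only anchors that actually carry an href attribute,
    # so a["href"] cannot raise KeyError here.
    for a in table.find_all("a", href=True):
        hrefs.append(a["href"])

print(hrefs)
```

If skipping is not what you want, `a.get('href')` returns `None` for such anchors instead of raising.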
    

    Attributes

    EDIT:

    import re

    links = set()  # a set removes the duplicates
    # Search the whole document; \? matches the literal "?" in the URL.
    for a in soup.findAll('a', href=re.compile(r'^/lettings-search-results\?')):
        links.add(a['href'])
    

    regular expression
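
A note on the pattern itself: in a regular expression, an unescaped `?` makes the preceding character optional rather than matching a literal question mark. The sketch below shows the difference using only the standard `re` module. The second path is hypothetical and exists only to show a false match.

```python
import re

# Unescaped: "?" makes the final "s" optional, so similar paths also match.
loose = re.compile(r'^/lettings-search-results?')
# Escaped: "\?" matches a literal question mark.
strict = re.compile(r'^/lettings-search-results\?')

wanted = '/lettings-search-results?task=View&itemid=136'
other = '/lettings-search-results-archive'  # hypothetical path, for illustration

loose_hits = (bool(loose.search(wanted)), bool(loose.search(other)))
strict_hits = (bool(strict.search(wanted)), bool(strict.search(other)))

print(loose_hits)   # both paths match the loose pattern
print(strict_hits)  # only the path with a literal "?" matches the strict one
```

For the page in the question both patterns would probably give the same result, but the escaped form states the intent more precisely.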