
Scrape Href python


I'm trying to scrape city names from a website. Below is the relevant code I've written so far, which stores the text of each table cell in a variable. However, I also need to collect all the city names in a list, and I can't get that part to work. Here is the HTML:

<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>

And this is my code:

names = row.find_all('th')

column_1 = col[0].string.strip()
Ifo380.append(column_1)
column_2 = col[3].string.strip()
Ifo180.append(column_2)
column_3 = col[6].string.strip()
Mdo.append(column_3)
column_4 = col[9].string.strip()
Mgo.append(column_4)

for port in names:
    name = item.contents.find_all("a").string

Can anyone help?


Solution

  • You can use a list comprehension:

    >>> html = '<a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a>'
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
    >>> citynames = [link.text for link in soup.find_all('a')]
    >>> citynames
    ['Amsterdam']
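
    Since the title asks about the href as well, here is a minimal sketch that collects both the city name and the link target while looping over rows the way the question does. The table layout is an assumption (each `<a>` sitting inside a `<th>` of a `<tr>`, as `row.find_all('th')` suggests), and the second row ("Rotterdam" with a placeholder ElementID) is made up purely for illustration:

```python
from bs4 import BeautifulSoup

# Assumed markup: the first link is copied from the question; the second row
# is hypothetical and only there so the result has more than one entry.
html = """
<table>
  <tr><th><a id="ctl00_ContentPlaceHolder1_rptrContinents_ctl00_rptrRows_ctl00_lnkBunker" href="PortDetails.aspx?ElementID=ffd65ee0-93ea-4195-b1ba-a69c8b1908c5">Amsterdam</a></th></tr>
  <tr><th><a href="PortDetails.aspx?ElementID=example-id">Rotterdam</a></th></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

citynames = []
hrefs = []
for row in soup.find_all("tr"):
    for cell in row.find_all("th"):
        link = cell.find("a")  # first <a> in the cell, or None
        if link is None:
            continue  # skip header cells that hold no link
        citynames.append(link.get_text(strip=True))
        hrefs.append(link.get("href"))

print(citynames)
print(hrefs)
```

    Note that `find_all` returns a list of tags, so `.string` or `.text` has to be read from each tag individually rather than from the result of `find_all` itself, which is likely why the loop in the question fails.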