python-2.7 web-scraping beautifulsoup urllib2

How can I assign web scraping outputs to an array using Python?


I would like to run the code below and collect the text of every title and href attribute. The code runs and prints all of the data I need, but I would like to store the output in an array. When I try to assign it to a variable, I only get the last matching attribute in the HTML.

from bs4 import BeautifulSoup
import urllib

r = urllib.urlopen('http://www.genome.jp/kegg-bin/show_pathway?map=hsa05215&show_description=show').read()
soup = BeautifulSoup((r), "lxml")
for area in soup.find_all('area', href=True):
    print area['href']
for area in soup.find_all('area', title=True):
    print area['title']

If it helps, I'm doing this because I will create a list with the data later. I'm just beginning to learn, so extra explanations are much appreciated.


Solution

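  • The loops only print each value, and assigning to a variable inside a loop overwrites it on every pass, which is why only the last match is left. To keep every value, build the lists with list comprehensions: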

    links = [area['href'] for area in soup.find_all('area', href=True)]
    titles = [area['title'] for area in soup.find_all('area', title=True)]
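
Since extra explanation was requested, here is a minimal sketch of why plain assignment keeps only the last value and how a list comprehension relates to an ordinary loop. To keep it runnable without a network connection, it uses plain dictionaries as stand-ins for the `area` tags; the sample values and the names `areas`, `last_href`, `links_loop` and `pairs` are made up for illustration and are not from the original page.

```python
# Stand-in data: plain dicts play the role of the <area> tags returned by
# soup.find_all('area'); the values are invented for illustration.
areas = [
    {'href': '/entry/a', 'title': 'Gene A'},
    {'href': '/entry/b'},                      # no title attribute
    {'href': '/entry/c', 'title': 'Gene C'},
]

# Plain assignment inside a loop overwrites the variable on every pass,
# so only the last match survives.
for area in areas:
    if 'href' in area:
        last_href = area['href']

# Appending to a list keeps every match.
links_loop = []
for area in areas:
    if 'href' in area:
        links_loop.append(area['href'])

# The list comprehension is the same loop written as one expression.
links = [area['href'] for area in areas if 'href' in area]

# Filtering on both attributes at once keeps each href next to its own title.
pairs = [(area['href'], area['title'])
         for area in areas if 'href' in area and 'title' in area]
```

With the real soup object, the same pairing should be possible by passing both filters in one call, for example `soup.find_all('area', href=True, title=True)`, and indexing each tag exactly as above. Keeping the two attributes together this way avoids the two separate lists drifting out of step when a tag has one attribute but not the other.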