I want to extract the text of every title and href attribute from the page below. The code runs and prints all of the data I need, but when I try to assign the output to an array instead of printing it, I only get the last matching attribute value in the HTML.
from bs4 import BeautifulSoup
import urllib
r = urllib.urlopen('http://www.genome.jp/kegg-bin/show_pathway?map=hsa05215&show_description=show').read()
soup = BeautifulSoup(r, "lxml")
for area in soup.find_all('area', href=True):
    print area['href']
for area in soup.find_all('area', title=True):
    print area['title']
If it helps, I'm doing this because I will create a list with the data later. I'm just beginning to learn, so extra explanations are much appreciated.
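If you assign to a variable inside the loop, each pass overwrites the previous value, so only the last one is left when the loop finishes. Collect the values in lists instead, for example with list comprehensions: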
links = [area['href'] for area in soup.find_all('area', href=True)]
titles = [area['title'] for area in soup.find_all('area', title=True)]
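Since you asked for extra explanation, here is a minimal sketch of why assigning inside a loop keeps only the last value, and how a comprehension relates to an ordinary loop with append. It is an illustration, not your actual page: the `areas` list of plain dicts is a made-up stand-in for what `soup.find_all('area')` returns, chosen because dicts support the same `area['href']` lookup as BeautifulSoup tags, and the names `last_href` and `pairs` are hypothetical.

```python
# Stand-in data (made up): plain dicts mimic the area['href'] and
# area['title'] lookups that BeautifulSoup tags support.
areas = [
    {'href': '/link/one', 'title': 'First'},
    {'href': '/link/two', 'title': 'Second'},
    {'href': '/link/three'},  # this one has no title attribute
]

# Assigning inside the loop rebinds the name on every pass,
# so only the final value is left after the loop ends.
for area in areas:
    last_href = area['href']

# Appending to a list keeps every value.
links = []
for area in areas:
    links.append(area['href'])

# The same idea as a list comprehension, with a filter. With real tags,
# find_all('area', title=True) already does this filtering for you.
titles = [area['title'] for area in areas if 'title' in area]

# Keep each href together with its title (None when the title is missing).
pairs = [(area['href'], area.get('title')) for area in areas]

print(last_href)
print(links)
print(titles)
print(pairs)
```

The `pairs` variant may be useful if you later need to know which title belongs to which link, because two separate lists can get out of step when some areas lack a title.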