I'm writing a plugin for XBMC in Python. I have a list of strings in the format:
<a href="/www.link.to/something">name of link</a>
I get them using BeautifulStoneSoup (the relevant part of the code):
soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
programs = soup('ul')
i = 0
for prog in programs:
    i = i+1
    if i==(5+getLetterValue(name)):
        j = 0
        while j < len(prog('li')):
            li = prog('li')[j]
            link = li('a')[0]
getLetterValue is a function that returns an index which indicates where this specific 'ul' tag is placed (according to the desired letter).
Now I want to split link into the URL and the link text. I tried using re.compile:
match=re.compile('<a href="(.+?)">(.+?)</a>').findall(link.string)
but all I get is match=[]
What have I done wrong?
Note: I know I shouldn't parse HTML with regular expressions, but I'm not sure this "rule" applies to such small strings. Also, for some reason this is almost standard practice in XBMC plugin writing, and I assume there is a reason for that.
Why not let BeautifulSoup give you the href attribute and the element contents?
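Here is a minimal sketch of that idea. It assumes the newer bs4 package and its built-in "html.parser" backend rather than the BeautifulStoneSoup class from the question, and the sample markup is made up to mirror the format you posted. It also shows why your regex probably came back empty: link.string is only the text inside the tag ("name of link"), not the markup, so there is no <a href=...> left for the pattern to match.

```python
import re

from bs4 import BeautifulSoup  # assumption: bs4 is installed; the question uses the older BeautifulSoup 3

# Made-up sample mirroring the format from the question.
html = '<ul><li><a href="/www.link.to/something">name of link</a></li></ul>'
soup = BeautifulSoup(html, "html.parser")

pairs = []
for li in soup.find_all("li"):
    a = li.find("a")
    if a is None:
        continue  # skip list items that contain no link
    # The tag already knows its attributes and its text; no regex needed.
    pairs.append((a["href"], a.get_text()))

print(pairs)

# Why the regex found nothing: .string holds only the inner text.
pattern = re.compile('<a href="(.+?)">(.+?)</a>')
first_link = soup.find("a")
on_text = pattern.findall(first_link.string)   # searches "name of link"
on_markup = pattern.findall(str(first_link))   # searches the serialized tag
print(on_text, on_markup)
```

The same attribute lookup should work on the tag objects you already have (link['href'] and link.string), so the regex step can simply be dropped. If you do want to keep the regex, run it against str(link) instead of link.string.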