I have a few simple BeautifulSoup questions (1-3 go together and 4-6 go together). Suppose I have HTML with the following structure:
<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>
property
?"tall"
and "wide"
?property
?"tall"
?name
and property
How can I then extract "tall"?
What I can easily do is extract all instances of meta:
soup1.find_all("meta")
But after that, I have to access each element of the resulting list before I can get things like property and name. I would rather skip this step and directly get all instances of property and name, if possible.
Finally, suppose I fetch a page from a website using requests.get, and the site only loads more content when you click a button at the bottom. How can I get that extra content as well?
I'm not an expert at using BeautifulSoup, but I gave it a try and here's what I came up with, which is hopefully enough to get you started. Just be aware that there might be more elegant solutions.
Boilerplate:
from bs4 import BeautifulSoup
import re
a = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""
soup = BeautifulSoup(a, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
Questions:
I.
# the regex '.*' matches any value, so this keeps every <meta> that has a property attribute
p = soup.find_all('meta', attrs={"property": re.compile('.*')})
>> [<meta property="tall"/>, <meta content="spiral" property="wide"/>]
II.
ex = [tag['property'] for tag in p]
>> ['tall', 'wide']
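A possible shortcut for I and II is a CSS attribute selector, which picks out only the tags that carry a given attribute so the values can be read in one pass. This is just a sketch, assuming a bs4 version where select() is available and the same sample HTML as above; the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""
soup = BeautifulSoup(html, "html.parser")

# 'meta[property]' matches only <meta> tags that have a property attribute
property_values = [tag["property"] for tag in soup.select("meta[property]")]

# Same idea for the name attribute
name_values = [tag["name"] for tag in soup.select("meta[name]")]

print(property_values)
print(name_values)
```

This still builds a list of tags internally, but the filtering and the attribute lookup happen in a single expression.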
III. I'm not sure what you mean, maybe it's covered already?
IV.
# collect the tags whose name is "tall", then append those whose property is "tall"
alltall = soup.find_all('meta', attrs={'name': 'tall'})
alltall += soup.find_all('meta', attrs={'property': 'tall'})
>> [<meta name="tall"/>, <meta property="tall"/>]
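If a single call is preferred for IV, find_all also accepts a function as a filter. The sketch below matches any meta tag whose name or property equals "tall"; the helper name has_tall is hypothetical. Note that it returns tags in document order, so the ordering differs from the concatenated list above.

```python
from bs4 import BeautifulSoup

html = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""
soup = BeautifulSoup(html, "html.parser")


def has_tall(tag):
    # Hypothetical helper: True for <meta> tags whose name or property is "tall".
    # tag.get() returns None when the attribute is missing, so no KeyError is raised.
    return tag.name == "meta" and "tall" in (tag.get("name"), tag.get("property"))


alltall = soup.find_all(has_tall)
print(alltall)
```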
V./VI. I spent some time searching but did not find an elegant way to do it this way around. Maybe I'm overlooking something.