I am trying to automate downloading Imgur files, and I am using BeautifulSoup to get the image link. However, I am lost as to why this doesn't work, since according to my research it should:
soup = BeautifulSoup("http://imgur.com/ha0WYYQ")
imageUrl = soup.select('.image a')[0]['href']
The select call above just returns an empty list, so indexing it with [0] raises an error. I tried to modify it, but to no avail. Any and all input is appreciated.
<div class="post-image">
<a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
<img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume" itemprop="contentURL">
</a>
</div>
This is the markup for the image. The class name "post-image" is a single token and cannot be split, so the selector has to be .post-image a rather than .image a:
imageUrl = soup.select('.post-image a')[0]['href']
A shortcut for selecting a single tag:
imageUrl = soup.select_one('.post-image a')['href']
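As a minimal sketch of the selector in use, the snippet below parses the markup shown above and pulls out the link. Two details are assumptions added for illustration: the None check (select_one returns None when nothing matches, so subscripting it directly would fail), and the use of urljoin to turn the protocol-relative href into a full URL against a page_url variable.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Markup copied from the snippet above.
html = """
<div class="post-image">
<a href="//i.imgur.com/ha0WYYQ.jpg" class="zoom">
<img src="//i.imgur.com/ha0WYYQ.jpg" alt="Frank in his bb8 costume" itemprop="contentURL">
</a>
</div>
"""
page_url = "http://imgur.com/ha0WYYQ"  # the page the markup came from

soup = BeautifulSoup(html, "html.parser")
link = soup.select_one(".post-image a")  # None when nothing matches
if link is None:
    raise ValueError("no link found inside .post-image")

# The href is protocol-relative ("//i.imgur.com/..."), so resolve it
# against the page URL to get a complete address.
image_url = urljoin(page_url, link["href"])
print(image_url)  # http://i.imgur.com/ha0WYYQ.jpg
```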
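To parse a document, pass it into the BeautifulSoup constructor. You can pass in a string or an open filehandle. Note that the constructor does not download anything: a string such as "http://imgur.com/ha0WYYQ" is treated as markup to parse, not as an address to fetch, which is why the selector finds nothing.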
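from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not have to guess one.
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")
soup = BeautifulSoup("<html>data</html>", "html.parser")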
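Because the constructor only parses what it is given, the page has to be downloaded first and its HTML passed in. The sketch below is one possible way to do that with the standard library; fetch_soup is a hypothetical helper, the User-Agent header is an assumption, and it presumes the server still returns the markup shown earlier in plain HTML.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_soup(url):
    """Download `url` and return the parsed document (hypothetical helper)."""
    # Some sites reject urllib's default User-Agent; this header is an assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html = response.read()  # bytes; BeautifulSoup works out the encoding
    return BeautifulSoup(html, "html.parser")


# For contrast: a URL passed straight to the constructor is parsed as
# plain text, so no tags exist and the selector matches nothing.
# (bs4 may warn that the input looks like a URL rather than markup.)
url_as_markup = BeautifulSoup("http://imgur.com/ha0WYYQ", "html.parser")
matches = url_as_markup.select(".post-image a")
print(len(matches))
```

With the helper in place, the original lookup would become soup = fetch_soup("http://imgur.com/ha0WYYQ") followed by soup.select_one('.post-image a').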