I am trying to extract the links and their titles from an anime website. However, I am only able to extract the whole tag; I just want the href and the title.
Here's the code I am using:
import requests
from bs4 import BeautifulSoup

r = requests.get('http://animeonline.vip/info/phi-brain-kami-puzzle-3')
soup = BeautifulSoup(r.content, "html.parser")
for link in soup.find_all('div', class_='list_episode'):
    href = link.get('href')
    print(href)
And here's the website's HTML:
<a href="http://animeonline.vip/phi-brain-kami-puzzle-3-episode-25" title="Phi Brain: Kami no Puzzle 3 episode 25">
Phi Brain: Kami no Puzzle 3 episode 25 <span> 26-03-2014</span>
</a>
And this is the output:
C:\Python34\python.exe C:/Users/M.Murad/PycharmProjects/untitled/Webcrawler.py
None
Process finished with exit code 0
All I want is every link and title inside that class (the episodes and their links).
Thanks.
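What is happening is that your link variable holds the <div> element with class "list_episode", not an anchor. A <div> has no "href" attribute, so link.get('href') returns None.
That div contains many anchors (<a>), and each anchor holds the link in "href" and the title in "title".
Just modify the code a little so that it looks at the anchors inside the div, and you will have what you want.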
import requests
from bs4 import BeautifulSoup

r = requests.get('http://animeonline.vip/info/phi-brain-kami-puzzle-3')
soup = BeautifulSoup(r.content, "html.parser")
for link in soup.find_all('div', class_='list_episode'):
    # Collect (href, title) for every anchor inside this div
    href_and_title = [(a.get("href"), a.get("title")) for a in link.find_all("a")]
    print(href_and_title)
The output will be a list of the form [(href, title), (href, title), ..., (href, title)], one list per matching div.
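If you would rather print one episode per line than one list per div, the same idea can be written as a nested loop. This is only a minimal sketch that runs offline: the SAMPLE_HTML string stands in for the real page (only the first anchor is taken from the question, the second is made up), and extract_episodes is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in for the real page. Only the first anchor comes from the question;
# the second one is invented so that the loop has more than one item.
SAMPLE_HTML = """
<div class="list_episode">
  <a href="http://animeonline.vip/phi-brain-kami-puzzle-3-episode-25" title="Phi Brain: Kami no Puzzle 3 episode 25">
    Phi Brain: Kami no Puzzle 3 episode 25 <span> 26-03-2014</span>
  </a>
  <a href="http://animeonline.vip/phi-brain-kami-puzzle-3-episode-24" title="Phi Brain: Kami no Puzzle 3 episode 24">
    Phi Brain: Kami no Puzzle 3 episode 24
  </a>
</div>
"""


def extract_episodes(html):
    """Return (href, title) tuples for every anchor inside div.list_episode."""
    soup = BeautifulSoup(html, "html.parser")
    episodes = []
    for div in soup.find_all("div", class_="list_episode"):
        # href=True skips anchors that have no href attribute at all
        for a in div.find_all("a", href=True):
            episodes.append((a["href"], a.get("title")))
    return episodes


for href, title in extract_episodes(SAMPLE_HTML):
    print(title, "->", href)
```

To run it against the live site, pass r.content from the requests call to extract_episodes instead of SAMPLE_HTML.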
Edit (explanation):
When you call
soup.find_all('div', class_='list_episode')
it gives you every <div> on the page with class "list_episode". Each of those divs contains many anchors (<a>) with different "href" and "title" values, so to get them we loop over the anchors with find_all("a") and read each attribute with ".get()":
href_and_title = [(a.get("href"), a.get("title")) for a in link.find_all("a")]
I hope it's clearer this time.
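To see why the original code printed None, it may help to compare .get() on the div with .get() on an anchor inside it. The snippet below is a small offline sketch; the one-line HTML string and the "/ep-1" link are made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up minimal markup with the same shape as the real page
html = '<div class="list_episode"><a href="/ep-1" title="Episode 1">Episode 1</a></div>'
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div", class_="list_episode")
anchor = div.find("a")

div_href = div.get("href")          # the div has no href attribute, so .get() gives None
anchor_href = anchor.get("href")    # the anchor carries the link
anchor_title = anchor.get("title")  # and the title

print(div_href)
print(anchor_href, anchor_title)
```

The first print shows None because .get() returns None for an attribute the tag does not have, which is exactly what happened in the question.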