Tags: python, beautifulsoup, screen-scraping, jupyter-notebook

Python scraping website links to a list


I'm trying to scrape the links from http://www.betexplorer.com/soccer/england/premier-league-2016-2017/results/ and add them to an empty list.

Here is my code:

from bs4 import BeautifulSoup
import requests

l = []

r = requests.get("http://www.betexplorer.com/soccer/england/premier-league-2016-2017/results/")
c=r.content
soup=BeautifulSoup(c,"html.parser")
for link in soup.find_all("a",{"class":"in-match"}):
    href=link.get('href')
    l.append(href)
    print(l[0])

This is the output I get when I try to print the first link:

/soccer/england/premier-league-2016-2017/arsenal-everton/SGPa5fvr/
/soccer/england/premier-league-2016-2017/arsenal-everton/SGPa5fvr/
/soccer/england/premier-league-2016-2017/arsenal-everton/SGPa5fvr/
/soccer/england/premier-league-2016-2017/arsenal-everton/SGPa5fvr/
.................

The problem is that when I print one specific link, it is printed many times, but it should appear only once.


Solution

  • This is a simple logic error: the print statement is inside the loop, so l[0] is printed once per iteration, and every time it shows the same first link. Move the print out of the loop so that it runs once, after the list has been filled.

    Fixed version:

    for link in soup.find_all("a", {"class": "in-match"}):
        href = link.get('href')
        l.append(href)
    print(l[0])
    

    After the loop finishes, the list l contains every matching link, and l[0] is printed only once.
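
The same idea can be tried offline with a small self-contained sketch. The SAMPLE_HTML snippet and the collect_match_links helper are hypothetical, made up for illustration; the real page's markup may differ, and only the first href is taken from the output shown in the question. The sketch builds the whole list first and prints the first link once, outside any loop.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's HTML so the sketch runs without a
# network request; the real page's markup may look different.
SAMPLE_HTML = """
<table>
  <tr><td><a class="in-match" href="/soccer/england/premier-league-2016-2017/arsenal-everton/SGPa5fvr/">Arsenal - Everton</a></td></tr>
  <tr><td><a class="in-match" href="/soccer/example/second-match/">Second match</a></td></tr>
  <tr><td><a class="other" href="/not-a-match/">Something else</a></td></tr>
</table>
"""


def collect_match_links(html):
    """Return the href of every <a class="in-match"> element in html."""
    soup = BeautifulSoup(html, "html.parser")
    # Build the full list in one pass; nothing is printed while collecting.
    return [a.get("href") for a in soup.find_all("a", class_="in-match")]


links = collect_match_links(SAMPLE_HTML)

# Printed once, after the list is complete.
print(links[0])
```

To run it against the live site, the text of requests.get(...) from the question could be passed to collect_match_links in place of SAMPLE_HTML.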