Tags: python, loops, web-scraping, beautifulsoup

Scrape multiple pages with loops in Python


I successfully scraped the first page of the website, but when I tried to scrape multiple pages, the code ran but the result was completely wrong.

Code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
for num in range(1,15):
    res = requests.get('http://www.abcde.com/Part?Page={num}&s=9&type=%8172653').text
    soup = BeautifulSoup(res,"lxml")
    for item in soup.select(".article-title"):
        print(urljoin('http://www.abcde.com',item['href']))

Only one number changes in each page's URL, for example:

http://www.abcde.com/Part?Page=1&s=9&type=%8172653
http://www.abcde.com/Part?Page=2&s=9&type=%8172653

There are 14 pages in total.

My code ran, but it just printed the first page's URLs 14 times. I expected it to print the different URLs from each page as the loop advanced.


Solution

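  • As Jon Clements pointed out, the URL string is never formatted, so the literal text {num} is sent on every request and each iteration fetches the same URL. Substitute the loop variable with format as below: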

    res = requests.get('http://www.abcde.com/Part?Page={}&s=9&type=%8172653'.format(num)).text
    

    You can find more about Python format strings at pyformat.info.
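
To see the difference without hitting the network, here is a minimal sketch that only builds the page URLs. The URL template is the one from the question; the names URL_TEMPLATE and build_page_urls are hypothetical helpers introduced for illustration, and the default range assumes the 14 pages mentioned above.

```python
# URL template from the question, with an empty {} placeholder for the page number.
URL_TEMPLATE = 'http://www.abcde.com/Part?Page={}&s=9&type=%8172653'


def build_page_urls(first=1, last=14):
    """Return one URL per page number, from first to last inclusive (hypothetical helper)."""
    return [URL_TEMPLATE.format(num) for num in range(first, last + 1)]


# The string from the question: without .format(), '{num}' is never replaced,
# so the same text is requested on every pass through the loop.
unformatted = 'http://www.abcde.com/Part?Page={num}&s=9&type=%8172653'

urls = build_page_urls()
for url in urls[:2]:
    print(url)
```

Each URL in urls can then be passed to requests.get inside the loop, as in the original code. On Python 3.6 or later, an f-string such as f'http://www.abcde.com/Part?Page={num}&s=9&type=%8172653' should give the same result as calling format.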