python, beautifulsoup, export-to-csv

Why is my code looping through only the first webpage using BeautifulSoup?


I am just experimenting with BeautifulSoup and testing it on different websites after recently learning about it. I am currently trying to iterate through multiple pages instead of just the first one. I can already write the information I grab from any single page to a file, but I would love to automate it.

This is my current code, which is meant to go up to page five. Currently it only reads the first webpage and writes the same information to my CSV file five times. In my nested for loop I have some print statements to check on the console that it is working before I even look in the file.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import unicodecsv as csv

f = open("on_sale_games.csv", "w", encoding='utf-8')
headers = "Game name, Original price, Final price, Percent off\n"
f.write(headers)

for i in range(5):
    my_url = 'https://store.steampowered.com/specials#p={}&tab=TopSellers'.format(i+1)

    uClient = uReq(my_url)  # open up the url and download the page.
    page_html = uClient.read()  # reading the html page and storing the info into page_html.
    uClient.close()  # closing the page.

    page_soup = soup(page_html, 'html.parser')  # html parsing

    containers = page_soup.findAll("a", {"class": "tab_item"})

    for container in containers:
        name_stuff = container.findAll("div", {"class": "tab_item_name"})
        name = name_stuff[0].text
        print("Game name:", name)

        original_price = container.findAll("div", {"class": "discount_original_price"})
        original = original_price[0].text
        print("Original price:", original)

        discounted_price = container.findAll("div", {"class": "discount_final_price"})
        final = discounted_price[0].text
        print("Discounted price:", final)

        discount_pct = container.findAll("div", {"class": "discount_pct"})
        pct = discount_pct[0].text
        print("Percent off:", pct)

        f.write(name.replace(':', '').replace("™", " ") + ',' + original + ',' + final + ',' + pct + '\n')

f.close()

Solution

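  • The part of the URL after the # (the fragment) is never sent to the server, so every iteration of your loop downloads the same /specials page; the browser switches pages with JavaScript instead. Checking the requests made by the browser, I noticed a background request that fetches the data and returns JSON, so you can work from there: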

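    import json
    from urllib.request import urlopen as uReq
    from bs4 import BeautifulSoup as soup

    for i in range(5):
        # start is an item offset, not a page number: 0, 15, 30, ...
        my_url = 'https://store.steampowered.com/contenthub/querypaginated/specials/NewReleases/render/?query=&start={}'.format(i*15)
        uClient = uReq(my_url)
        page_html = uClient.read()
        uClient.close()
        # The response is JSON; the rendered HTML sits under "results_html".
        data = json.loads(page_html)["results_html"]
        page_soup = soup(data, 'html.parser')
        # Rest of the code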
    

    It works like an API that returns 15 items per page, so start takes the values 0, 15, 30 and so on.
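
To make the difference between the two URLs concrete, here is a minimal sketch using only the standard library. The helper names page_urls and request_target are made up for illustration, and the page size of 15 is taken from the note above rather than from any Steam documentation, so treat it as an assumption. The first helper builds the offset-based URLs; the second shows which part of a URL would actually reach the server.

```python
from urllib.parse import urlsplit

# Endpoint copied from the answer above. PAGE_SIZE = 15 is an assumption
# based on the "0, 15, 30" pattern described there.
BASE_URL = ("https://store.steampowered.com/contenthub/querypaginated/"
            "specials/NewReleases/render/?query=&start={}")
PAGE_SIZE = 15


def page_urls(n_pages, page_size=PAGE_SIZE):
    """Return one URL per page, with start offsets 0, page_size, 2*page_size, ..."""
    return [BASE_URL.format(page * page_size) for page in range(n_pages)]


def request_target(url):
    """Return the path plus query of a URL, leaving out the fragment.

    The fragment (everything after '#') stays on the client side,
    so it cannot change what the server returns.
    """
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


if __name__ == "__main__":
    # The URLs from the question differ only in their fragment,
    # so they all ask the server for the same resource.
    for page in (1, 2):
        url = "https://store.steampowered.com/specials#p={}&tab=TopSellers".format(page)
        print(request_target(url))

    # The offset-based URLs differ in their query string instead.
    for url in page_urls(3):
        print(request_target(url))
```

Because the original URLs all reduce to /specials, the loop in the question fetches identical HTML each time, while the start parameter is part of the query string and does reach the server.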