python python-3.x beautifulsoup python-requests steam

Getting the same response from different HTTP Requests from the Steam Website


This is my first time using StackOverflow actively, so please excuse any mistakes. I am writing a Python 3 script that is supposed to scrape the Steam Community Marketplace for icons, names, and prices. Extracting and formatting the data works as intended. The website uses pagination, so I have to make multiple GET requests to cover all 169 pages. My approach was to use a for loop and insert the loop variable into the URL, since I noticed that the current page number is included in it.

My problem is that when I run the script and print the arrays that should contain the data, about 90% of the data is exactly the same (for example, the content of page 2 is added to the array 7 times).

I am unsure how to fix this and get the correct data from each request.

I hope this description is clear enough. Thanks in advance for any help.

Here is the source code:

import requests
from bs4 import BeautifulSoup
import time
import json

def main():

    name_arr = []
    img_arr = []
    price_arr = []



    for i in range(1,11): # later change to 169 pages
        
        url = f"https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Exterior%5B%5D=tag_WearCategory2&category_730_Quality%5B%5D=tag_normal&category_730_Quality%5B%5D=tag_unusual&appid=730#p{i}_popular_desc"
        print(url)
        r = requests.get(url)

        print("----------------------------------- on : " + str(i) + " right now")
        print(r.status_code)

        soup = BeautifulSoup(r.content, "html.parser")

        images = soup.find_all("img", class_="market_listing_item_img")
        names = soup.find_all("span", class_="market_listing_item_name")
        prices = soup.find_all("span", class_="sale_price")



        def extract_text(list, list_arr):
            for x in list:
                name_only = x.text.replace("(Field-Tested)", "").strip()
                list_arr.append(name_only)

        def extract_src(list, list_arr):
            for x in list:
                list_arr.append(x["src"])

        extract_text(names, name_arr)
        extract_text(prices,price_arr)
        extract_src(images, img_arr)

        time.sleep(60)



    print(name_arr)
    print(price_arr)
    print(img_arr)

    # Write the array to file as JSON
    with open('output.json', 'w') as f:
        json.dump(name_arr, f)


    # amount = float(dollars.replace("$", "").strip()) 

if __name__ == "__main__":
    main()

Here is the terminal output. Notice how the same names appear multiple times:

❯ python3 webscrape.py

['P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | 
Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange']

Solution

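  • The data you see on the page is loaded by JavaScript from a different URL. The #p{i}_popular_desc part of your URL is a fragment, which is never sent to the server, so every request in your loop asks the server for the same page. You can query that other URL directly with the requests module: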

    from time import sleep
    import requests
    from bs4 import BeautifulSoup
    
    api_url = 'https://steamcommunity.com/market/search/render/'
    
    params = {
        "query": "",
        "start": 0,
        "count": 10,
        "search_descriptions": "0",
        "sort_column": "popular",
        "sort_dir": "desc",
        "appid": "730",
        "category_730_ItemSet[]": "any",
        "category_730_ProPlayer[]": "any",
        "category_730_StickerCapsule[]": "any",
        "category_730_TournamentTeam[]": "any",
        "category_730_Weapon[]": "any",
        "category_730_Exterior[]": "tag_WearCategory2",
        "category_730_Quality[]": ["tag_normal", "tag_unusual"],
    }
    
    
    with requests.Session() as s:
        s.get('https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Exterior%5B%5D=tag_WearCategory2&category_730_Quality%5B%5D=tag_normal&category_730_Quality%5B%5D=tag_unusual&appid=730')
    
        for start in range(0, 100, 10):  # <-- increase number of pages here
            params['start'] = start  # offset of the first item to return
            data = s.get(api_url, params=params).json()
            soup = BeautifulSoup(data['results_html'], 'html.parser')
    
            for item in soup.select('.market_listing_row_link'):
                name = item.select_one('.market_listing_item_name').text.strip()
                qty = item.select_one('.market_listing_num_listings_qty').text.strip()
                price = item.select_one('[data-price]').text.strip()
                print('{:<50} {:<5} {}'.format(name, qty, price))
    
            sleep(10)
    

    Prints:

    Sawed-Off | Highwayman (Field-Tested)              132   $0.92 USD
    Galil AR | Shattered (Field-Tested)                98    $5.81 USD
    AUG | Torque (Field-Tested)                        120   $7.91 USD
    SG 553 | Tornado (Field-Tested)                    91    $8.19 USD
    Dual Berettas | Briar (Field-Tested)               103   $2.01 USD
    SG 553 | Wave Spray (Field-Tested)                 101   $5.56 USD
    Five-SeveN | Kami (Field-Tested)                   136   $1.47 USD
    FAMAS | Contrast Spray (Field-Tested)              158   $2.10 USD
    MAG-7 | Chainmail (Field-Tested)                   18    $16.70 USD
    Sawed-Off | Serenity (Field-Tested)                67    $1.49 USD
    P250 | Whiteout (Field-Tested)                     46    $18.72 USD
    MP7 | Olive Plaid (Field-Tested)                   95    $1.38 USD
    CZ75-Auto | Army Sheen (Field-Tested)              82    $0.98 USD
    G3SG1 | Arctic Camo (Field-Tested)                 32    $4.31 USD
    M4A4 | Asiimov (Field-Tested)                      57    $237.28 USD
    P90 | Fallout Warning (Field-Tested)               66    $7.03 USD
    Tec-9 | Remote Control (Field-Tested)              65    $3.64 USD
    SSG 08 | Tropical Storm (Field-Tested)             89    $6.00 USD
    USP-S | Target Acquired (Field-Tested)             19    $210.02 USD
    M4A4 | Radiation Hazard (Field-Tested)             111   $26.00 USD
    SSG 08 | Lichen Dashed (Field-Tested)              136   $1.39 USD
    M4A1-S | Dark Water (Field-Tested)                 127   $79.48 USD
    Nova | Walnut (Field-Tested)                       126   $1.21 USD
    M4A4 | Zirka (Field-Tested)                        146   $33.01 USD
    P250 | Vino Primo (Field-Tested)                   99    $4.98 USD
    MP7 | Skulls (Field-Tested)                        130   $16.23 USD
    M249 | Shipping Forecast (Field-Tested)            42    $15.48 USD
    Five-SeveN | Nightshade (Field-Tested)             95    $1.27 USD
    G3SG1 | Safari Mesh (Field-Tested)                 111   $1.17 USD
    Negev | CaliCamo (Field-Tested)                    42    $5.84 USD
    AWP | Hyper Beast (Field-Tested)                   142   $42.55 USD
    UMP-45 | Crime Scene (Field-Tested)                20    $67.32 USD
    ★ Moto Gloves | 3rd Commando Company (Field-Tested) 46    $117.73 USD
    Desert Eagle | Code Red (Field-Tested)             122   $34.54 USD
    Tec-9 | Tornado (Field-Tested)                     79    $1.27 USD
    Sawed-Off | Highwayman (Field-Tested)              132   $0.92 USD
    P90 | Baroque Red (Field-Tested)                   15    $29.97 USD
    UMP-45 | Caramel (Field-Tested)                    125   $8.41 USD
    G3SG1 | Murky (Field-Tested)                       90    $0.49 USD
    P2000 | Woodsman (Field-Tested)                    27    $7.18 USD
    
    ...and so on.
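
To see why the original loop kept asking for the same page, here is a minimal sketch that uses only the standard library. It assumes a shortened search URL (BASE) for readability, and the helper names url_sent_to_server and page_to_start are hypothetical, not part of requests or of any Steam API. The page-to-offset mapping assumes 10 results per page, matching the count value used above.

```python
from urllib.parse import urlsplit, urlunsplit

# Shortened version of the search URL, for illustration only (assumption).
BASE = "https://steamcommunity.com/market/search?q=&appid=730"


def url_sent_to_server(url: str) -> str:
    """Return the URL without its fragment; the fragment is never sent over HTTP."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


def page_to_start(page: int, count: int = 10) -> int:
    """Map a 1-based page number to the zero-based 'start' offset (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * count


if __name__ == "__main__":
    # The fragment differs per page, but the part the server sees does not.
    for i in range(1, 4):
        print(url_sent_to_server(f"{BASE}#p{i}_popular_desc"))

    # Offsets to pass as 'start' for pages 1 to 3.
    print([page_to_start(p) for p in range(1, 4)])
```

With this mapping, covering all 169 pages would mean looping start over range(0, 1690, 10), assuming the page size stays at 10.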