This is my first time using StackOverflow actively, so please excuse any mistakes. I am currently writing a Python 3 script that is supposed to scrape the Steam Community Marketplace for icons, names and prices. The extraction and formatting of the data work as intended. The website uses pagination, so I have to make multiple GET requests to cover all 169 pages. My approach was to use a for loop and insert the loop variable into the URL, since I noticed that the current page number is included in it.
My problem is that when I run the script and print the arrays that should contain the data, about 90% of the data is duplicated (e.g. the content of page 2 is added to the array 7 times).
I am unsure how to fix this and get the correct data from each request.
I hope this description is clear enough. Thanks in advance for any help.
Here is the source code:
import requests
from bs4 import BeautifulSoup
import time
import json


def main():
    name_arr = []
    img_arr = []
    price_arr = []

    for i in range(1, 11):  # later change to 169 pages
        url = f"https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Exterior%5B%5D=tag_WearCategory2&category_730_Quality%5B%5D=tag_normal&category_730_Quality%5B%5D=tag_unusual&appid=730#p{i}_popular_desc"
        print(url)
        r = requests.get(url)
        print("----------------------------------- on : " + str(i) + "right now")
        print(r.status_code)
        soup = BeautifulSoup(r.content, "html.parser")
        images = soup.find_all("img", class_="market_listing_item_img")
        names = soup.find_all("span", class_="market_listing_item_name")
        prices = soup.find_all("span", class_="sale_price")

        def extract_text(list, list_arr):
            for x in list:
                name_only = x.text.replace("(Field-Tested)", "").strip()
                list_arr.append(name_only)

        def extract_src(list, list_arr):
            for x in list:
                list_arr.append(x["src"])

        extract_text(names, name_arr)
        extract_text(prices, price_arr)
        extract_src(images, img_arr)
        time.sleep(60)

    print(name_arr)
    print(price_arr)
    print(img_arr)

    with open('output.json', 'w') as f:
        # Write the array to file as JSON
        json.dump(name_arr, f)

    # amount = float(dollars.replace("$", "").strip())


if __name__ == "__main__":
    main()
Here is the terminal output. Notice how the same names appear multiple times:
❯ python3 webscrape.py
['P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'Sawed-Off | Highwayman', 'Galil AR | Shattered', 'AUG | Torque', 'SG 553 | 
Tornado', 'Dual Berettas | Briar', 'SG 553 | Wave Spray', 'Five-SeveN | Kami', 'FAMAS | Contrast Spray', 'MAG-7 | Chainmail', 'Sawed-Off | Serenity', 'P90 | Blind Spot', 'SCAR-20 | Cardiac', 'Five-SeveN | Contractor', 'PP-Bizon | Forest Leaves', 'XM1014 | Urban Perforated', 'Sawed-Off | Irradiated Alert', 'SG 553 | Tornado', 'P250 | Mehndi', 'FAMAS | Commemoration', 'XM1014 | Blaze Orange']
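One thing worth checking first is what the loop variable actually changes. In the script above it is placed after the `#`, which makes it part of the URL fragment, and a fragment is handled on the client side and is not sent to the server. The minimal sketch below uses only the standard library to show this; the shortened `BASE` URL and the helper `request_target` are illustrative assumptions, not part of the original script.

```python
from urllib.parse import urlsplit

# Shortened, illustrative version of the search URL (assumption for this sketch).
BASE = "https://steamcommunity.com/market/search?q=&appid=730"


def request_target(url: str) -> str:
    """Return the path and query, i.e. the part that goes into the HTTP request."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Build the URLs the same way the loop does, with the page number after '#'.
urls = [f"{BASE}#p{i}_popular_desc" for i in range(1, 4)]

fragments = [urlsplit(u).fragment for u in urls]
targets = {request_target(u) for u in urls}

print(fragments)  # the page number only ever appears here
print(targets)    # a single entry: every iteration requests the same resource
```

Since the path and query are identical for every value of `i`, each iteration asks the server for the same resource, so the page number has to be passed some other way.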
The data you see on the page is loaded by JavaScript from a different URL. You can request that URL directly with the requests module:
from time import sleep

import requests
from bs4 import BeautifulSoup

api_url = 'https://steamcommunity.com/market/search/render/'

params = {
    "query": "",
    "start": 0,
    "count": 10,
    "search_descriptions": "0",
    "sort_column": "popular",
    "sort_dir": "desc",
    "appid": "730",
    "category_730_ItemSet[]": "any",
    "category_730_ProPlayer[]": "any",
    "category_730_StickerCapsule[]": "any",
    "category_730_TournamentTeam[]": "any",
    "category_730_Weapon[]": "any",
    "category_730_Exterior[]": "tag_WearCategory2",
    "category_730_Quality[]": ["tag_normal", "tag_unusual"],
}

with requests.session() as s:
    # Load the regular search page first so the session picks up any cookies
    s.get('https://steamcommunity.com/market/search?q=&category_730_ItemSet%5B%5D=any&category_730_ProPlayer%5B%5D=any&category_730_StickerCapsule%5B%5D=any&category_730_TournamentTeam%5B%5D=any&category_730_Weapon%5B%5D=any&category_730_Exterior%5B%5D=tag_WearCategory2&category_730_Quality%5B%5D=tag_normal&category_730_Quality%5B%5D=tag_unusual&appid=730')

    # "start" is the offset of the first item, so it advances by "count" per page
    for params['start'] in range(0, 100, 10):  # <-- increase number of pages here
        data = s.get(api_url, params=params).json()
        soup = BeautifulSoup(data['results_html'], 'html.parser')

        for item in soup.select('.market_listing_row_link'):
            name = item.select_one('.market_listing_item_name').text.strip()
            qty = item.select_one('.market_listing_num_listings_qty').text.strip()
            price = item.select_one('[data-price]').text.strip()
            print('{:<50} {:<5} {}'.format(name, qty, price))

        sleep(10)
Prints:
Sawed-Off | Highwayman (Field-Tested)              132   $0.92 USD
Galil AR | Shattered (Field-Tested)                98    $5.81 USD
AUG | Torque (Field-Tested)                        120   $7.91 USD
SG 553 | Tornado (Field-Tested)                    91    $8.19 USD
Dual Berettas | Briar (Field-Tested)               103   $2.01 USD
SG 553 | Wave Spray (Field-Tested)                 101   $5.56 USD
Five-SeveN | Kami (Field-Tested)                   136   $1.47 USD
FAMAS | Contrast Spray (Field-Tested)              158   $2.10 USD
MAG-7 | Chainmail (Field-Tested)                   18    $16.70 USD
Sawed-Off | Serenity (Field-Tested)                67    $1.49 USD
P250 | Whiteout (Field-Tested)                     46    $18.72 USD
MP7 | Olive Plaid (Field-Tested)                   95    $1.38 USD
CZ75-Auto | Army Sheen (Field-Tested)              82    $0.98 USD
G3SG1 | Arctic Camo (Field-Tested)                 32    $4.31 USD
M4A4 | Asiimov (Field-Tested)                      57    $237.28 USD
P90 | Fallout Warning (Field-Tested)               66    $7.03 USD
Tec-9 | Remote Control (Field-Tested)              65    $3.64 USD
SSG 08 | Tropical Storm (Field-Tested)             89    $6.00 USD
USP-S | Target Acquired (Field-Tested)             19    $210.02 USD
M4A4 | Radiation Hazard (Field-Tested)             111   $26.00 USD
SSG 08 | Lichen Dashed (Field-Tested)              136   $1.39 USD
M4A1-S | Dark Water (Field-Tested)                 127   $79.48 USD
Nova | Walnut (Field-Tested)                       126   $1.21 USD
M4A4 | Zirka (Field-Tested)                        146   $33.01 USD
P250 | Vino Primo (Field-Tested)                   99    $4.98 USD
MP7 | Skulls (Field-Tested)                        130   $16.23 USD
M249 | Shipping Forecast (Field-Tested)            42    $15.48 USD
Five-SeveN | Nightshade (Field-Tested)             95    $1.27 USD
G3SG1 | Safari Mesh (Field-Tested)                 111   $1.17 USD
Negev | CaliCamo (Field-Tested)                    42    $5.84 USD
AWP | Hyper Beast (Field-Tested)                   142   $42.55 USD
UMP-45 | Crime Scene (Field-Tested)                20    $67.32 USD
★ Moto Gloves | 3rd Commando Company (Field-Tested) 46    $117.73 USD
Desert Eagle | Code Red (Field-Tested)             122   $34.54 USD
Tec-9 | Tornado (Field-Tested)                     79    $1.27 USD
Sawed-Off | Highwayman (Field-Tested)              132   $0.92 USD
P90 | Baroque Red (Field-Tested)                   15    $29.97 USD
UMP-45 | Caramel (Field-Tested)                    125   $8.41 USD
G3SG1 | Murky (Field-Tested)                       90    $0.49 USD
P2000 | Woodsman (Field-Tested)                    27    $7.18 USD
...and so on.
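If you want to store the results as JSON with numeric prices, as the original script intended with `output.json` and the commented-out `amount` line, something along these lines could work. This is only a sketch: the helpers `parse_price` and `clean_name` and the record keys are hypothetical names, the two sample rows are copied from the output above, and it assumes prices always look like `$0.92 USD` (dot as decimal separator, optional comma as thousands separator).

```python
import json
import re


def parse_price(text: str) -> float:
    """Convert a price string such as '$0.92 USD' to a float (assumed format)."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


def clean_name(name: str) -> str:
    """Drop the '(Field-Tested)' suffix, as the original script does."""
    return name.replace("(Field-Tested)", "").strip()


# Sample (name, qty, price) rows taken from the printed output above.
rows = [
    ("Sawed-Off | Highwayman (Field-Tested)", "132", "$0.92 USD"),
    ("M4A4 | Asiimov (Field-Tested)", "57", "$237.28 USD"),
]

records = [
    {"name": clean_name(name), "listings": int(qty), "price_usd": parse_price(price)}
    for name, qty, price in rows
]

print(json.dumps(records, indent=2))
```

In the scraping loop you would append one such record per item instead of printing it, and call `json.dump(records, f)` once after the loop has finished.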