
Scraping an AJAX e-commerce site using Python


I have a problem scraping an e-commerce site using BeautifulSoup. I did some Googling, but I still can't solve it.

Please refer to the pictures:

1. Chrome F12 (Inspect Element): [image]

2. Result: [image]

Here is the site that I tried to scrape: "https://shopee.com.my/search?keyword=h370m"

Problem:

  1. When I open Inspect Element in Google Chrome (F12), I can see the HTML tags for the product's name, price, etc. But when I run my Python program, the output does not contain the same code and tags. After some Googling, I found out that this website uses AJAX requests to load the data.

  2. Can anyone help me with the best method to get this product data by scraping an AJAX site? I would like to display the data in table form.

My code:

import requests
from bs4 import BeautifulSoup
source = requests.get('https://shopee.com.my/search?keyword=h370m')
soup = BeautifulSoup(source.text, 'html.parser')
print(soup)

Solution

  • Welcome to Stack Overflow! You can inspect where the AJAX request is sent (for example, in the Network tab of Chrome's developer tools) and replicate it.

    In this case the request goes to the API URL shown in the code below. You can then use requests to perform a similar request. Note, however, that this API endpoint requires a valid User-Agent header. You can use a package like fake-useragent or simply hardcode a User-Agent string.

    import requests

    # Option 1: generate a User-Agent string with the fake-useragent package
    # from fake_useragent import UserAgent
    # user_agent = UserAgent().chrome

    # Option 2: hardcode a User-Agent string
    user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1468.0 Safari/537.36'

    # API endpoint that the search page calls via AJAX
    url = 'https://shopee.com.my/api/v2/search_items/?by=relevancy&keyword=h370m&limit=50&newest=0&order=desc&page_type=search'
    resp = requests.get(url, headers={
        'User-Agent': user_agent
    })
    data = resp.json()
    # 'items' holds the list of products returned by the search
    products = data.get('items')
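
Since the question asks for the data in table form, here is a minimal sketch of how a list like products could be printed as a plain-text table. The helper format_table is hypothetical, and the field names "name" and "price" are assumptions; check the keys in the actual JSON response before relying on them. The sample data is made up so the snippet runs without a network request.

```python
def format_table(products, columns=("name", "price")):
    """Render a list of dicts as a plain-text table (missing keys become empty cells)."""
    # Field names in `columns` are assumptions; adjust them to the real response keys.
    rows = [[str(p.get(col, "")) for col in columns] for p in (products or [])]
    # Each column is as wide as its longest cell or its header, whichever is longer.
    widths = [
        max([len(col)] + [len(row[i]) for row in rows])
        for i, col in enumerate(columns)
    ]
    header = " | ".join(col.ljust(w) for col, w in zip(columns, widths))
    separator = "-+-".join("-" * w for w in widths)
    body = [
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        for row in rows
    ]
    return "\n".join([header, separator, *body])


# Hypothetical sample data standing in for data.get('items')
sample = [
    {"name": "Board A", "price": 399},
    {"name": "Board B (mATX)", "price": 459},
]
print(format_table(sample))
```

With the real response you would call format_table(products), passing whichever columns you want to show. If pandas is already installed, pandas.DataFrame(products) is another way to get a tabular view.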