python, html, web-scraping

Scraped data doesn't match the exact URL requested


I'm trying to scrape the monster infobox tables from the rswiki.

Some monsters have multiple levels, for example:

https://oldschool.runescape.wiki/w/Dwarf

You can switch between the levels by clicking the tabs at the top of the infobox: "Level 7", "Level 10", ...

Clicking a level tab changes the URL to match that level.

So when I request the URL https://oldschool.runescape.wiki/w/Dwarf#Level_10, I only get the data for the first level (i.e. https://oldschool.runescape.wiki/w/Dwarf#Level_7), and I can't scrape the other levels.
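For context, the `#Level_10` part of these URLs is a fragment identifier. The fragment is interpreted on the client side (here, presumably by the page's JavaScript) and is not part of the HTTP request sent to the server, so every one of these URLs fetches the same HTML. A small sketch using the standard library's `urllib.parse.urldefrag` shows which part identifies the resource on the server and which part stays in the browser:

```python
from urllib.parse import urldefrag

url = "https://oldschool.runescape.wiki/w/Dwarf#Level_20"

# Split off the fragment: only the first part identifies the page on the server,
# the fragment is left for the browser to interpret.
requested, fragment = urldefrag(url)
print(requested)  # https://oldschool.runescape.wiki/w/Dwarf
print(fragment)   # Level_20
```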

import requests
from bs4 import BeautifulSoup

# Request the Dwarf page with the level-20 anchor
url = 'https://oldschool.runescape.wiki/w/Dwarf#Level_20'
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.content, 'html.parser')
# Find the monster infobox table (exact match on its full class string)
soup_minfobox = soup.find_all('table', class_="infobox infobox-switch no-parenthesis-style infobox-monster")

# Print the text content of the first matching infobox
print(soup_minfobox[0].text)

Output: Level 7Level 10Level 11Level 20DwarfReleased6 April 2001 (Update)MembersNoCombat level7Size1x1 ...

Excuse the makeshift code, but as the output shows, the data is for level 7, even though the URL points to level 20.


Solution

  • If you trigger the click events manually from the browser's console, you'll see the infobox change:

    $("span[data-switch-anchor='#Level_7']").click();
    $("span[data-switch-anchor='#Level_10']").click();
    $("span[data-switch-anchor='#Level_11']").click();
    $("span[data-switch-anchor='#Level_20']").click();
    

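    So you can target the level tabs with the selectors above. Note that BeautifulSoup only parses HTML: it cannot run JavaScript or fire click events, so the clicks have to happen in a real browser. The following topic discusses invoking click events when scraping with BeautifulSoup: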

    invoking onclick event with beautifulsoup python
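A minimal sketch of the browser-driven approach, assuming Selenium 4 with a local Chrome (Selenium 4.6+ can locate a matching driver itself) and bs4 installed: it opens the page once, clicks each `span[data-switch-anchor='#Level_N']` tab (the selectors from the console snippets above) and parses the rendered infobox. The helper names `level_tab_selector` and `scrape_levels` are hypothetical. The sketch also assumes the tab click updates the DOM immediately; if it doesn't, add an explicit wait before reading `page_source`.

```python
from typing import Dict, List


def level_tab_selector(level: int) -> str:
    """CSS selector for the tab that switches the infobox to `level` (hypothetical helper)."""
    return f"span[data-switch-anchor='#Level_{level}']"


def scrape_levels(url: str, levels: List[int]) -> Dict[int, str]:
    """Return the rendered infobox text for each level, keyed by level number."""
    # Imported lazily so the selector helper above works without Selenium/bs4 installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        driver.get(url)
        results = {}
        for level in levels:
            # Fire the same click the console snippets trigger.
            driver.find_element(By.CSS_SELECTOR, level_tab_selector(level)).click()
            # page_source reflects the DOM after the click, unlike the raw HTTP response.
            soup = BeautifulSoup(driver.page_source, "html.parser")
            table = soup.select_one("table.infobox-monster")
            results[level] = table.get_text(" ", strip=True) if table else ""
        return results
    finally:
        driver.quit()
```

For example, `scrape_levels("https://oldschool.runescape.wiki/w/Dwarf", [7, 10, 11, 20])` should return one infobox text per level. Reusing a single browser session and clicking through the tabs avoids reloading the page for each level.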