Tags: python, html, web-scraping, beautifulsoup, missing-data

Missing data using Beautiful Soup


I'm trying to get the university names, scores and country names from this website: https://roundranking.com/ranking/world-university-rankings.html#world-2021

I can find the table that holds the data by its class, but the rows inside the table's <tbody> are missing when I look for them with Beautiful Soup.

Here is the original HTML, as the browser shows it:

<table class="big-table table-sortable uci" style="padding: 0px;">
<thead class="tableFloatingHeaderOriginal">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead><thead class="tableFloatingHeader" style="display: none; opacity: 0;">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead>
<tbody>
<tr class="az-row-100"><td class="td1">1</td><td class="td2"><a href="/universities/harvard-university.html?sort=O&amp;year=2021&amp;subject=SO">Harvard University</a></td><td class="td3">100.000</td><td class="td4">USA</td><td class="td6"><img src="../images_rur/Flag/Flag_USA.png" alt=""></td><td class="td7">Diamond League</td>
...
</tbody>
</table>

And here is the HTML that the soup shows:

<table class="big-table table-sortable uci" style="padding: 0px;">
<thead class="tableFloatingHeaderOriginal">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead><thead class="tableFloatingHeader" style="display: none; opacity: 0;">
<tr><th class="td1">Rank</th><th class="td2" style="background-color: rgb(198, 235, 178);">University</th><th class="td3">Score</th><th class="td4">Country</th><th class="td6">Flag</th><th class="td7">League</th></tr>
</thead>
</table>

My Python code for getting the data:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))
driver.get('https://roundranking.com/ranking/world-university-rankings.html#world-2021')

source = driver.page_source
driver.quit()

soup = BeautifulSoup(source, 'html.parser')
#soup = BeautifulSoup(source, 'html5lib')
#soup = BeautifulSoup(source, 'lxml')

table = soup.find('table', {'class': 'big-table table-sortable uci'})
print(table)

I've tried html5lib, lxml and html.parser, but none of them work: when I print the table, it does not contain the <tbody> part, which holds the data I need.
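One way to check whether the parser is at fault is to feed a trimmed copy of the sample row from the original HTML straight into BeautifulSoup. The minimal sketch below assumes the rows keep the td2/td3/td4 classes shown above; the helper name `parse_rows` is hypothetical. If it extracts the row fine, the parser is working and the rows were most likely just absent from the page source when it was captured.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the browser's HTML (one data row kept).
SAMPLE_HTML = """
<table class="big-table table-sortable uci">
<thead><tr><th class="td1">Rank</th><th class="td2">University</th><th class="td3">Score</th><th class="td4">Country</th></tr></thead>
<tbody>
<tr class="az-row-100"><td class="td1">1</td><td class="td2"><a href="/universities/harvard-university.html">Harvard University</a></td><td class="td3">100.000</td><td class="td4">USA</td></tr>
</tbody>
</table>
"""


def parse_rows(html):
    """Return (university, score, country) tuples from the table's <tbody>.

    Hypothetical helper: the td2/td3/td4 class names come from the HTML above.
    """
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector matching a table that carries all three classes
    table = soup.select_one("table.big-table.table-sortable.uci")
    if table is None or table.tbody is None:
        return []  # the body was not rendered into this HTML
    rows = []
    for tr in table.tbody.find_all("tr"):
        name = tr.find("td", class_="td2").get_text(strip=True)
        score = float(tr.find("td", class_="td3").get_text(strip=True))
        country = tr.find("td", class_="td4").get_text(strip=True)
        rows.append((name, score, country))
    return rows


print(parse_rows(SAMPLE_HTML))  # [('Harvard University', 100.0, 'USA')]
```

Passing it the table from the soup output above (header only, no <tbody>) returns an empty list, which matches what the question describes.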


Solution

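  • The table rows are filled in by JavaScript from a separate request, so they are not in the HTML you end up parsing. You can find that request in the browser's developer tools (Network tab) and call it directly. Here is an example: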

    import requests

    url = "https://roundranking.com/final/ranking-json18r.php"
    # Form fields sent by the page: year, sort, subject and country filter
    payload = {"t": "2021", "s": "O", "sa": "SO", "sc": "All Countries"}
    response = requests.post(url, data=payload)
    response.raise_for_status()
    for university in response.json():
        print(university['rank'], university['univ'], university['score'], university['economy'], university['league'])
    

    OUTPUT:

    1 Harvard University 100.0 USA Diamond League
    2 California Institute of Technology (Caltech) 98.137 USA Diamond League
    3 Imperial College London 97.706 UK Diamond League
    4 Stanford University 97.604 USA Diamond League
    5 Yale University 97.506 USA Diamond League
    6 Massachusetts Institute of Technology (MIT) 97.364 USA Diamond League
    7 ETH Zurich (Swiss Federal Institute of Technology) 96.187 Switzerland Diamond League
    8 Columbia University 95.393 USA Diamond League
    9 University of Cambridge 95.258 UK Diamond League
    10 University of Oxford 94.989 UK Diamond League
    11 University of Chicago 94.712 USA Diamond League
    12 Karolinska Institute 94.642 Sweden Diamond League
    13 Johns Hopkins University 94.299 USA Diamond League
    14 University College London 94.172 UK Diamond League
    15 Northwestern University 94.117 USA Diamond League
    16 Princeton University 93.993 USA Diamond League
    17 Ecole Polytechnique Federale de Lausanne 93.75 Switzerland Diamond League
    18 University of Pennsylvania 93.525 USA Diamond League
    19 Cornell University 92.271 USA Diamond League
    20 Washington University in St. Louis 91.325 USA Diamond League
    21 Carnegie Mellon University 90.608 USA Diamond League
    22 Scuola Normale Superiore di Pisa 90.345 Italy Diamond League
    23 Case Western Reserve University 90.314 USA Diamond League
    24 University of Michigan 89.447 USA Diamond League
    25 Boston University 89.443 USA Diamond League
    26 Brown University 89.043 USA Diamond League
    27 Technical University of Denmark 88.842 Denmark Diamond League
    ...
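If you need the three fields from the question rather than printed lines, a small sketch like the following turns the JSON records into rows and writes them as CSV. It assumes each record is a dict with the `univ`, `score` and `economy` keys used in the code above; the helper names and CSV layout are illustrative, and the sample records are copied from the output above rather than fetched live. `urllib.parse.parse_qsl` also shows how a form body copied from the Network tab (such as `t=2021&s=O&sa=SO&sc=All+Countries`) maps to the dict that `requests` form-encodes.

```python
import csv
import io
from urllib.parse import parse_qsl


def payload_from_query(query):
    """Turn a form body copied from the browser's Network tab into a dict."""
    # parse_qsl decodes '+' to a space, so 'All+Countries' -> 'All Countries'
    return dict(parse_qsl(query))


def to_rows(records):
    """Keep only (university, score, country) from each JSON record."""
    return [(r["univ"], float(r["score"]), r["economy"]) for r in records]


def rows_to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["university", "score", "country"])
    writer.writerows(rows)
    return buf.getvalue()


# Sample records shaped like the output above (not fetched live).
records = [
    {"univ": "Harvard University", "score": 100.0, "economy": "USA"},
    {"univ": "Imperial College London", "score": 97.706, "economy": "UK"},
]

print(payload_from_query("t=2021&s=O&sa=SO&sc=All+Countries"))
print(rows_to_csv(to_rows(records)))
```

Converting `score` with `float()` keeps the sketch working whether the endpoint returns the score as a number or as a string.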