OK, this question may seem odd, but when I scrape Worldometer for COVID data, it seems to skip China. On the site, China sits between Mongolia and Cameroon, but it is not in the dict printed in cmd. Can anyone tell me why this is happening?
link to website
import requests
from itertools import islice
from bs4 import BeautifulSoup as bs
url = "https://www.worldometers.info/coronavirus/"
r = requests.get(url)
htmlcontent = r.content
soup = bs(htmlcontent, "html.parser")
country = soup.find_all("a",class_="mt_a")[:120]
names = ["sno",'Country' , 'Totalcases', 'NewCases', 'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered', 'ActiveCases', 'Serious', 'TotCases/1M pop', 'Deaths/1M pop', 'TotalTests', 'Tests/1M pop']
tbody = soup.find_all("tbody")[0]
country_info = [a.string if a.string is not None else "" for i in tbody.find_all("tr")[8:] for a in i.find_all("td")[:14] ]
covid_info = {x: {y:z for y, z in zip(names, country_info[ind*len(names):])} for ind, x in enumerate([i.string for i in country])}
print({ k:v for (k,v) in zip([i.string for i in country],[covid_info[i.string]["Tests/1M pop"] for i in country])})
Edit: I changed the limit to 220 in the country slice, and now it prints China last, with the rest in the same order. That solves my problem, but I would like to know why China comes last while the others are in order.
China is last because, without JavaScript running in a browser (as is the case with requests), you get the rows in the order of the source HTML, where China is indeed last.
In the browser, however, the instruction to sort on Total Cases descending is enabled and applied, so China moves position.
So, if you want your results in the same order as the browser, sort by that column in descending order, and renumber the first column if you include it.
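A minimal sketch of that sort-and-renumber step, assuming each row has already been parsed into a dict keyed by the column names from the question ("sno", "Country", "Totalcases"). The helper names `parse_count` and `sort_by_total_cases` are hypothetical, and the sample figures are made up for illustration, not real data:

```python
def parse_count(text):
    """Convert a cell such as '1,234' to an int; blank or non-numeric cells count as 0."""
    cleaned = text.replace(",", "").replace("+", "").strip()
    return int(cleaned) if cleaned.isdigit() else 0


def sort_by_total_cases(rows):
    """Return copies of the rows sorted by Totalcases descending, with sno renumbered from 1."""
    ordered = sorted(rows, key=lambda row: parse_count(row["Totalcases"]), reverse=True)
    return [{**row, "sno": str(position)} for position, row in enumerate(ordered, start=1)]


# Hypothetical rows in source-HTML order, with China last (numbers are illustrative only).
sample_rows = [
    {"sno": "1", "Country": "Mongolia", "Totalcases": "300"},
    {"sno": "2", "Country": "Cameroon", "Totalcases": "100"},
    {"sno": "", "Country": "China", "Totalcases": "200"},
]

ordered_rows = sort_by_total_cases(sample_rows)
print([(row["sno"], row["Country"]) for row in ordered_rows])
```

Sorting on the parsed integer rather than the raw string matters here, because the cells contain thousands separators and would otherwise sort lexicographically.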
As to why China is at the bottom of the source HTML, you'd need to ask the developers. It might be because it was originally a benchmark/comparator.