I need to build a DataFrame in Python from the ranking of the top 500 companies in Latin America.
I tried web scraping, but print(tabla) outputs [] or None:
from bs4 import BeautifulSoup
import requests
url = 'https://www.americaeconomia.com/negocios-industrias/estas-son-las-500-mayores-empresas-de-america-latina-2021'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
tabla = soup.find('table', {"id":"awesomeTable"})
print(tabla)
Always look in your soup first; therein lies the truth. The HTML that requests receives can differ slightly or drastically from what the browser's developer tools show.
You won't find the table in your soup because it sits inside an iframe, which is a separate document that requests does not fetch along with the page.
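One way to confirm this is to list the iframe sources found in the soup. The following is a minimal sketch, not a definitive implementation: the helper name `iframe_sources` and the sample markup are assumptions made for illustration, and the real page may nest or lazy-load its iframe differently.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_sources(html, base_url):
    """Return the absolute src URL of every <iframe> in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # src=True skips iframes that have no src attribute at all
    return [urljoin(base_url, frame['src']) for frame in soup.find_all('iframe', src=True)]


# Hypothetical markup standing in for the article page
sample_html = '''
<div class="article">
  <p>Ranking</p>
  <iframe src="https://rk.americaeconomia.com/display/embed/500-latam/2021"></iframe>
</div>
'''

sources = iframe_sources(sample_html, 'https://www.americaeconomia.com/')
print(sources)
```

On a live page you would pass `page.text` and the page URL instead of the sample string; `urljoin` is there so that relative `src` values also resolve to a full URL.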
Use the url of the iframe source to perform your request:
https://rk.americaeconomia.com/display/embed/500-latam/2021
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Send a browser-like user agent with the request
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
r = requests.get('https://rk.americaeconomia.com/display/embed/500-latam/2021', headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# Collect the cell texts of every data row
data = []
for row in soup.select('#awesomeTable tbody tr.dataRow'):
    data.append(list(row.stripped_strings))

# Column names come from the first row of the table
df = pd.DataFrame(data, columns=list(soup.select_one('#awesomeTable tr').stripped_strings))
print(df)
RK 2021 | EMPRESA | PAÍS |
---|---|---|
1 | PETROBRAS | BRA |
2 | JBS | BRA |
3 | AMÉRICA MÓVIL | MX |
4 | PEMEX | MX |
5 | VALE | BRA |
... | ... | ... |
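To see what the row extraction yields before it reaches pandas, here is a small sketch on hypothetical markup. The sample table is an assumption that mimics the structure the selectors expect, not the real page, and `records` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the selectors expect
sample_html = '''
<table id="awesomeTable">
  <thead><tr><th>RK 2021</th><th>EMPRESA</th><th>PAÍS</th></tr></thead>
  <tbody>
    <tr class="dataRow"><td> 1 </td><td><b>PETROBRAS</b></td><td>BRA</td></tr>
    <tr class="dataRow"><td> 2 </td><td><b>JBS</b></td><td>BRA</td></tr>
  </tbody>
</table>
'''

soup = BeautifulSoup(sample_html, 'html.parser')

# The first <tr> in the table is the header row
header = list(soup.select_one('#awesomeTable tr').stripped_strings)

# stripped_strings yields each non-empty text node with whitespace trimmed
rows = [list(tr.stripped_strings) for tr in soup.select('#awesomeTable tbody tr.dataRow')]

# Pair every cell with its column name
records = [dict(zip(header, row)) for row in rows]

print(header)
print(rows)
```

Note that `stripped_strings` skips empty cells entirely, so a row with a blank cell comes out shorter than the header. If that happens, collecting `td.get_text(strip=True)` per cell keeps the columns aligned.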