I am trying to create a DataFrame by scraping a webpage that has several sections, but when I try to set the column names I get this error:
"Length mismatch: Expected axis has 5 elements, new values have 8 elements"
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = 'https://money.cnn.com/magazines/fortune/fortune500_archive/full/1955/1.html'
webcontent = urlopen(url)
html_page = webcontent.read()
soup = BeautifulSoup(html_page, "lxml")
table = soup.select("table")[0]
rows = table.select('tr')
table_data = []
for row in rows:
    td_tag = row.select('td')
    row_values = [value.string for value in td_tag]
    table_data.append(row_values)
data = pd.DataFrame(table_data[1:])
cols = [header.string for header in table.select('th')]
data.columns = cols  # this line raises the length mismatch error
data.head()
Your help would be really appreciated!
I think the page content may have changed.
There are two issues:
1. In table_data, the first row you want, ['1', 'General Motors', '9,823.5', '806.0'], is the 7th item (index 6), so the slice has to start there.
2. You only need the last 4 items of [header.string for header in table.select('th')], which returns:
['Rank', 'Company', None, None, 'Rank', 'Company', 'Rank', 'Company']
data = pd.DataFrame(table_data[6:])  # skip the 6 non-data rows at the top
cols = [header.string for header in table.select('th')][-4:]  # keep the last 4 headers
data.columns = cols
data.head()
   Rank           Company     Rank  Company
0     1    General Motors  9,823.5    806.0
1     2       Exxon Mobil  5,661.4    584.8
2     3        U.S. Steel  3,250.4    195.4
3     4  General Electric  2,959.1    212.6
4     5            Esmark  2,510.8     19.1
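The hard-coded 6 and -4 will break again if the page layout shifts, so it may be safer to select rows by their shape instead. Below is a minimal sketch of that idea in plain Python. The helper name align_table is hypothetical, and the table_data and headers lists are made-up stand-ins shaped like the scraped ones, not the real page content. It assumes that the data rows are the most common row length and that their headers come last in the header list.

```python
from collections import Counter


def align_table(table_data, headers, width=None):
    """Return (rows, cols) where every row and the header list share one length.

    Hypothetical helper: keeps only rows with `width` cells and the last
    `width` header entries.
    """
    # Drop empty rows, e.g. header rows that contain <th> but no <td>
    non_empty = [row for row in table_data if row]
    if width is None:
        # Assumption: the real data rows are the most common row shape
        width = Counter(len(row) for row in non_empty).most_common(1)[0][0]
    rows = [row for row in non_empty if len(row) == width]
    # Assumption: the headers of the data table are the last ones on the page
    cols = headers[-width:]
    return rows, cols


# Made-up stand-in data mimicking the shape of the scraped lists
table_data = [
    [],                                    # header row: no <td> cells
    ["nav", None, "nav", None, "nav"],     # a 5-cell layout row
    ["1", "General Motors", "9,823.5", "806.0"],
    ["2", "Exxon Mobil", "5,661.4", "584.8"],
    ["3", "U.S. Steel", "3,250.4", "195.4"],
]
headers = ["Rank", "Company", None, None, "Rank", "Company", "Rank", "Company"]

rows, cols = align_table(table_data, headers)
print(len(rows[0]), len(cols))
```

With matching lengths, pd.DataFrame(rows, columns=cols) should then build the frame without the mismatch. Since the last four headers repeat 'Rank' and 'Company', you may prefer to pass your own list of four unique column names instead of cols.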