I'm trying to scrape data from a few pages using pandas with some simple code.
import pandas as pd
import requests
import numpy as np
dfs = []
http = "https://www.milieudatabase.nl/viewNMD/view_materiaal_new.php?numCode="
for r in range(293,296):
    url = f'{http}{r:02d}'
    r = requests.get(url)
    df_list = pd.read_html(r.text) # this parses all the tables in webpages to a list
    dfs.append(df_list)
NMD = pd.concat([pd.DataFrame(dfs)])
print(NMD)
NMD.to_csv('NMD50.csv', index=False)
df.head()
When I use df.head(), the dataframe displays the way I would like. However, when I try to write it to CSV, all the data ends up on 3 rows instead of being spread across multiple rows. I think the issue is with how I use df_list.
Can anyone help?
You have a couple of errors. First, let me show the working code:
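import pandas as pd
import requests

dfs = []
http = "https://www.milieudatabase.nl/viewNMD/view_materiaal_new.php?numCode="
for r in range(293,296):
    url = f'{http}{r:02d}'
    r = requests.get(url)
    df_list = pd.read_html(r.text)  # list of every table found on the page
    dfs.append(df_list[0])  # keep the first table (a DataFrame), not the whole list
NMD = pd.concat(dfs)  # stack the per-page DataFrames row-wise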
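
To see why the original version ends up with three rows, here is a small offline sketch. The tables in it are made-up stand-ins for what pd.read_html returns for each page (the column names and values are hypothetical, not taken from the site), and I am assuming each page yields a list with one table in it:

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_html output: one list per page,
# each list holding a single DataFrame (table).
page_tables = [
    [pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})],
    [pd.DataFrame({"name": ["c", "d"], "value": [3, 4]})],
    [pd.DataFrame({"name": ["e", "f"], "value": [5, 6]})],
]

# Original approach: wrap the list of lists in pd.DataFrame.
# Each page becomes one row whose single cell holds an entire table.
wrong = pd.DataFrame(page_tables)
print(len(wrong))  # 3 rows, one per page

# Working approach: take the first table of each page and concatenate them.
right = pd.concat([tables[0] for tables in page_tables])
print(len(right))  # 6 rows, two per page
```

In the first case each cell contains a whole DataFrame, so to_csv writes its text representation into one field per page. In the second case the rows of the individual tables are stacked, which is the layout you were expecting in the CSV.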