I am trying to read a text file present in this url into a pandas dataframe. https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/TU_Stundenwerte_Beschreibung_Stationen.txt
It has uneven spacing between columns. I have tried sep='\s+', delim_whitespace=True but none of these are working. Please suggest a way to read this text file into pandas dataframe.
The read_fwf function in pandas can read a file with a table of fixed-width formatted lines into a DataFrame.
The header line confuses the auto-width calculations so best to skip the header lines and explicitly add the column names so in this case the argument skiprows=2
is added.
import pandas as pd
url ='https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/TU_Stundenwerte_Beschreibung_Stationen.txt'
df = pd.read_fwf(url, encoding="ansi", skiprows=2,
names=['Stations_id', 'von_datum', 'bis_datum', 'Stationshoehe',
'geoBreite', 'geoLaenge', 'Stationsname', 'Bundesland'])
print(df)
Output:
Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge Stationsname Bundesland
0 3 19500401 20110331 202 50.7827 6.0941 Aachen Nordrhein-Westfalen
1 44 20070401 20220920 44 52.9336 8.2370 Großenkneten Niedersachsen
2 52 19760101 19880101 46 53.6623 10.1990 Ahrensburg-Wulfsdorf Schleswig-Holstein
3 71 20091201 20191231 759 48.2156 8.9784 Albstadt-Badkap Baden-Württemberg
4 73 20070401 20220920 340 48.6159 13.0506 Aldersbach-Kriestorf Bayern
.. ... ... ... ... ... ... ... ...
663 19171 20200901 20220920 13 54.0038 9.8553 Hasenkrug-Hardebek Schleswig-Holstein
664 19172 20200901 20220920 48 54.0246 9.3880 Wacken Schleswig-Holstein
[665 rows x 8 columns]
If want to load the file locally and open it then just change the url to the local file name.
df = pd.read_fwf('TU_Stundenwerte_Beschreibung_Stationen.txt', encoding="ansi", skiprows=2,
names=['Stations_id', 'von_datum', 'bis_datum', 'Stationshoehe',
'geoBreite', 'geoLaenge', 'Stationsname', 'Bundesland'])