I have the following weather data from NOAA weathers stations.
stn wban date temp count_temp dewp count_dewp slp count_slp stp count_stp visib
486990 99999 1/1/2020 82.6 24 73.9 24 9999.9 0 999.9 0 6.2
486980 99999 1/1/2020 82.4 24 74.4 24 9999.9 0 999.9 0 6.3
486990 99999 2/1/2020 82 24 74.7 24 9999.9 0 999.9 0 6.1
486980 99999 2/1/2020 82.2 24 75.1 24 9999.9 0 999.9 0 6.1
486990 99999 3/1/2020 82.1 24 75.2 24 9999.9 0 999.9 0 6.2
... ... ... ... ... ... ... ... ... ... ... ...
486990 99999 29/12/2020 79.3 24 73.5 24 9999.9 0 999.9 0 5.9
486980 99999 30/12/2020 81.8 24 74.6 24 9999.9 0 999.9 0 6.1
486990 99999 30/12/2020 81.8 24 72.7 24 9999.9 0 999.9 0 6.2
486980 99999 31/12/2020 81.1 24 75 24 9999.9 0 999.9 0 6.2
486990 99999 31/12/2020 80.4 24 72.9 24 9999.9 0 999.9 0 6.1
I want the dataframe to be separated by the column 'wban'
I want it to look like that,
date_486990 temp_486990 count_temp_486990 dewp_486990 count_dewp_486990 slp_486990 count_slp_486990 stp_486990 count_stp_486990 visib_486990 date_486980 temp_486980 count_temp_486980 dewp_486980 count_dewp_486980 slp_486980 count_slp_486980 stp_486980 count_stp_486980 visib_486980
1/1/2020 82.6 24 73.9 24 9999.9 0 999.9 0 6.2 1/1/2020 82.4 24 74.4 24 9999.9 0 999.9 0 6.3
2/1/2020 82 24 74.7 24 9999.9 0 999.9 0 6.1 2/1/2020 82.2 24 75.1 24 9999.9 0 999.9 0 6.1
..............................
30/12/2020 81.8 24 72.7 24 9999.9 0 999.9 0 6.2 30/12/2020 81.8 24 74.6 24 9999.9 0 999.9 0 6.1
31/12/2020 80.4 24 72.9 24 9999.9 0 999.9 0 6.1 31/12/2020 81.1 24 75 24 9999.9 0 999.9 0 6.2
Here is a picture for easier reference
I know it a bit hard to see, but basically I want each unique value in the wban column to have its own set of columns on the right, but to share the same date with the rest of the data.
How do i do that, do I have to unpivot or melt or crosstab or something else ? I am not sure what function to use or where to even start with this problem.
Any Ideas ?
I know the long way to do it,
First I filter the dataframe by stn
.
Then I run a for loop and merge each frame on the date.
frame = df[df.stn.unique()]
for i in df.stn.unique():
temp = df[df.stn==i]
frame.merge(temp, how='left', on='date')
Is there a shorter more efficient pandas function to do this?
Something is ambiguous, you describe wban
the value to use as pivot, but in the provided dataset, the column with "486990" "486980" is stn
. For the rest, I'll consider that you want to pivot by stn
. Please correct you question if this is not the case.
You can use pandas.DataFrame.pivot
, specify all columns but stn
and wban
as values. Then rework the MultiIndex
as a single index by concatenating the names. Finally, use dropna
to get rid of the rows with incomplete data (optional)
df2 = df.pivot(index='date',
columns=['stn'],
values=df.columns.drop(['stn', 'wban'])
)
df2.columns = ['%s_%s' % cols for cols in df2.columns]
df2.dropna()
output:
date_486980 date_486990 temp_486980 temp_486990 count_temp_486980 count_temp_486990 dewp_486980 dewp_486990 count_dewp_486980 count_dewp_486990 slp_486980 slp_486990 count_slp_486980 count_slp_486990 stp_486980 stp_486990 count_stp_486980 count_stp_486990 visib_486980 visib_486990
date
1/1/2020 1/1/2020 1/1/2020 82.4 82.6 24 24 74.4 73.9 24 24 9999.9 9999.9 0 0 999.9 999.9 0 0 6.3 6.2
2/1/2020 2/1/2020 2/1/2020 82.2 82 24 24 75.1 74.7 24 24 9999.9 9999.9 0 0 999.9 999.9 0 0 6.1 6.1
30/12/2020 30/12/2020 30/12/2020 81.8 81.8 24 24 74.6 72.7 24 24 9999.9 9999.9 0 0 999.9 999.9 0 0 6.1 6.2
31/12/2020 31/12/2020 31/12/2020 81.1 80.4 24 24 75 72.9 24 24 9999.9 9999.9 0 0 999.9 999.9 0 0 6.2 6.1