Search code examples
pythonpandasdataframepivotunpivot

how to create new columns in pandas based on a column's values within a dataframe (pivot or unpivot ? )


I have the following weather data from NOAA weathers stations.

        stn wban    date    temp    count_temp  dewp    count_dewp  slp count_slp   stp count_stp   visib
486990  99999   1/1/2020    82.6    24  73.9    24  9999.9  0   999.9   0   6.2
486980  99999   1/1/2020    82.4    24  74.4    24  9999.9  0   999.9   0   6.3
486990  99999   2/1/2020    82  24  74.7    24  9999.9  0   999.9   0   6.1
486980  99999   2/1/2020    82.2    24  75.1    24  9999.9  0   999.9   0   6.1
486990  99999   3/1/2020    82.1    24  75.2    24  9999.9  0   999.9   0   6.2
... ... ... ... ... ... ... ... ... ... ... ...
486990  99999   29/12/2020  79.3    24  73.5    24  9999.9  0   999.9   0   5.9
486980  99999   30/12/2020  81.8    24  74.6    24  9999.9  0   999.9   0   6.1
486990  99999   30/12/2020  81.8    24  72.7    24  9999.9  0   999.9   0   6.2
486980  99999   31/12/2020  81.1    24  75  24  9999.9  0   999.9   0   6.2
486990  99999   31/12/2020  80.4    24  72.9    24  9999.9  0   999.9   0   6.1


I want the dataframe to be separated by the column 'wban'

I want it to look like that,



date_486990 temp_486990 count_temp_486990   dewp_486990 count_dewp_486990   slp_486990  count_slp_486990    stp_486990  count_stp_486990    visib_486990    date_486980 temp_486980 count_temp_486980   dewp_486980 count_dewp_486980   slp_486980  count_slp_486980    stp_486980  count_stp_486980    visib_486980
1/1/2020    82.6    24  73.9    24  9999.9  0   999.9   0   6.2 1/1/2020    82.4    24  74.4    24  9999.9  0   999.9   0   6.3
2/1/2020    82  24  74.7    24  9999.9  0   999.9   0   6.1 2/1/2020    82.2    24  75.1    24  9999.9  0   999.9   0   6.1
..............................
30/12/2020  81.8    24  72.7    24  9999.9  0   999.9   0   6.2 30/12/2020  81.8    24  74.6    24  9999.9  0   999.9   0   6.1
31/12/2020  80.4    24  72.9    24  9999.9  0   999.9   0   6.1 31/12/2020  81.1    24  75  24  9999.9  0   999.9   0   6.2



Here is a picture for easier reference enter image description here

I know it a bit hard to see, but basically I want each unique value in the wban column to have its own set of columns on the right, but to share the same date with the rest of the data.

How do i do that, do I have to unpivot or melt or crosstab or something else ? I am not sure what function to use or where to even start with this problem.

Any Ideas ?

I know the long way to do it,

First I filter the dataframe by stn.

Then I run a for loop and merge each frame on the date.

frame = df[df.stn.unique()]

for i in df.stn.unique():
    temp = df[df.stn==i]
    frame.merge(temp, how='left', on='date')

Is there a shorter more efficient pandas function to do this?


Solution

  • Something is ambiguous, you describe wban the value to use as pivot, but in the provided dataset, the column with "486990" "486980" is stn. For the rest, I'll consider that you want to pivot by stn. Please correct you question if this is not the case.

    You can use pandas.DataFrame.pivot, specify all columns but stn and wban as values. Then rework the MultiIndex as a single index by concatenating the names. Finally, use dropna to get rid of the rows with incomplete data (optional)

    df2 = df.pivot(index='date',
                   columns=['stn'],
                   values=df.columns.drop(['stn', 'wban'])
                  )
    df2.columns = ['%s_%s' % cols for cols in df2.columns]
    df2.dropna()
    

    output:

               date_486980 date_486990 temp_486980 temp_486990 count_temp_486980 count_temp_486990 dewp_486980 dewp_486990 count_dewp_486980 count_dewp_486990 slp_486980 slp_486990 count_slp_486980 count_slp_486990 stp_486980 stp_486990 count_stp_486980 count_stp_486990 visib_486980 visib_486990
    date                                                                                                                                                                                                                                                                                                
    1/1/2020      1/1/2020    1/1/2020        82.4        82.6                24                24        74.4        73.9                24                24     9999.9     9999.9                0                0      999.9      999.9                0                0          6.3          6.2
    2/1/2020      2/1/2020    2/1/2020        82.2          82                24                24        75.1        74.7                24                24     9999.9     9999.9                0                0      999.9      999.9                0                0          6.1          6.1
    30/12/2020  30/12/2020  30/12/2020        81.8        81.8                24                24        74.6        72.7                24                24     9999.9     9999.9                0                0      999.9      999.9                0                0          6.1          6.2
    31/12/2020  31/12/2020  31/12/2020        81.1        80.4                24                24          75        72.9                24                24     9999.9     9999.9                0                0      999.9      999.9                0                0          6.2          6.1