I edited and reposted based on some advice I got about posting questions
I need some help combining 3 dataframes in a specific way.
So I start with 3 datasets
This is the first:
df_WL = pd.read_csv('Regional_Scale_GWL_data.csv')
df_WL
Date SiteNo WL
0 8/20/1992 6203301 58
1 2/16/1993 6203301 57
2 2/23/1994 6203301 57
3 11/17/1994 6203301 58
4 11/16/1995 6203301 57
... ... ... ...
784760 12/6/2017 334000000000000 258.22
784761 12/6/2017 334000000000000 258.22
784762 3/15/2018 334000000000000 258.43
784763 5/30/2018 334000000000000 258.34
784764 NaN 334000000000000 NaN
[784765 rows x 3 columns]
From that I create this dataframe:
df_WL['Date'] = pd.to_datetime(df_WL['Date'], errors='coerce')
df_WL['WL'] = pd.to_numeric(df_WL['WL'], errors='coerce')
df_WL['SiteNo'] = df_WL['SiteNo'].astype(str)
df_WL = df_WL.dropna(subset=['Date'])
df_WL = df_WL.dropna(subset=['WL'])
df_WL = df_WL.pivot_table(index='Date', columns=["SiteNo"], values=['WL']) \
.reorder_levels([1, 0], axis=1) \
.sort_index(axis=1)
df_WL
SiteNo 1021201 1023902 ... SA-0174 SA-0231 SM-0049
WL WL ... WL WL WL
Date ...
1970-01-01 NaN NaN ... NaN NaN NaN
1970-01-03 NaN NaN ... NaN NaN NaN
1970-01-05 NaN NaN ... NaN NaN NaN
1970-01-06 NaN NaN ... NaN NaN NaN
1970-01-07 3692.0 NaN ... NaN NaN NaN
... ... ... ... ... ... ...
2021-02-18 NaN NaN ... NaN NaN NaN
2021-02-19 NaN NaN ... NaN NaN NaN
2021-02-22 NaN NaN ... NaN NaN NaN
2021-02-23 NaN NaN ... NaN NaN NaN
2021-02-24 NaN NaN ... NaN 7209.0 NaN
[17353 rows x 863 columns]
The other two data sets that I have are:
df_precip = pd.read_csv('Regional_Scale_Precip.csv')
df_precip = df_precip.set_index('Date')
df_precip
294957000000000 294722000000000 ... 6129203 6414105
Date
1981-01-01 0.000 0.000 ... 0.000 0.000
1981-01-02 0.000 0.000 ... 0.000 0.000
1981-01-03 0.000 0.000 ... 0.000 0.000
1981-01-04 0.000 0.000 ... 0.000 0.000
2017-05-27 0.000 0.000 ... 0.000 0.000
... ... ... ... ... ...
2017-05-22 13.529 15.883 ... 19.788 45.493
2017-05-23 16.181 28.589 ... 36.448 8.722
2017-05-24 13.189 16.917 ... 15.643 14.794
2017-05-25 0.000 0.000 ... 0.000 0.000
2017-05-26 0.000 0.000 ... 0.000 0.000
[13295 rows x 1331 columns]
and
df_temp = pd.read_csv('Regional_Scale_Temp.csv')
df_temp = df_temp.set_index('Date')
df_temp
6131901 6129203 ... 6414105 6155707
Date
1981-01-01 8.965 8.733 ... 9.117 9.118
1981-01-02 6.654 6.614 ... 7.834 7.195
1981-01-03 4.794 4.796 ... 4.826 4.880
1981-01-04 7.582 7.752 ... 8.380 8.018
2009-08-25 22.438 22.129 ... 23.607 22.702
... ... ... ... ... ...
2009-08-20 27.354 27.177 ... 28.498 28.055
2009-08-21 26.706 26.397 ... 28.671 27.479
2009-08-22 25.126 24.778 ... 26.644 25.600
2009-08-23 22.001 21.835 ... 23.803 22.543
2009-08-24 21.626 21.422 ... 23.257 22.160
[10463 rows x 1331 columns]
my goal is to create a dataframe that looks like this (I made this in excel with arbitrary values to illustrate my goal):
I've attempted to unstack the two last dataframes and combine them with the first one to create a one with 'Date', 'SiteNo', 'WL', 'Temp', and 'Precip' columns. Then pivot them to get what I am aiming for but that was a huge mess.
Any help would be appreciated. Thanks!
'Date'
, 'Site'
, 'Values'
, and 'Type'
column.pandas.concat
to combine all of the DataFramespandas.DataFrame.pivot
to achieve the desired formimport pandas as pd
# load the data
wl = pd.read_csv('wl.csv', parse_dates=['Date'])
pre = pd.read_csv('precip.csv', index_col='Date', parse_dates=['Date'])
temp = pd.read_csv('temp.csv', index_col='Date', parse_dates=['Date'])
# wl is already in a long form so clean the column names
wl.rename({'SiteNo': 'Site', 'WL': 'Values'}, axis=1, inplace=True)
# stack the other two dataframes into a long form
pre = pre.stack().reset_index(name='Values').rename({'level_1': 'Site'}, axis=1)
temp = temp.stack().reset_index(name='Values').rename({'level_1': 'Site'}, axis=1)
# add a Type column
wl['Type'] = 'WL'
pre['Type'] = 'Precip'
temp['Type'] = 'Temp'
# sample of wl
Date Site Values Type
0 1992-08-20 6203301 58.0 WL
1 1993-02-16 6203301 57.0 WL
2 1994-02-23 6203301 57.0 WL
3 1994-11-17 6203301 58.0 WL
4 1995-11-16 6203301 57.0 WL
'Site'
columns don't have all 'Type'
columns in the pivoted DataFrame# combine the DataFrames
df = pd.concat([wl, pre, temp])
# drop duplicate rows - there shouldn't be any, but the sample data did
df.drop_duplicates(inplace=True)
# sort the values - not strictly necessary
df = df.sort_values(['Date', 'Site', 'Type']).reset_index(drop=True)
# dropna
df.dropna(subset=['Date'], inplace=True)
# pivot
dfp = df.pivot(index='Date', columns=['Site', 'Type'], values='Values')
# display(dfp.head())
Site 294722000000000 294957000000000 6129203 6131901 6155707 6414105 6203301 334000000000000
Type Precip Precip Precip Temp Temp Temp Precip Temp WL WL
Date
1981-01-01 0.0 0.0 0.0 8.733 8.965 9.118 0.0 9.117 NaN NaN
1981-01-02 0.0 0.0 0.0 6.614 6.654 7.195 0.0 7.834 NaN NaN
1981-01-03 0.0 0.0 0.0 4.796 4.794 4.880 0.0 4.826 NaN NaN
1981-01-04 0.0 0.0 0.0 7.752 7.582 8.018 0.0 8.380 NaN NaN
1992-08-20 NaN NaN NaN NaN NaN NaN NaN NaN 58.0 NaN