So, I have this data,
data = {
'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
'2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
'2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
'2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
'2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
'2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
'2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
'2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
'2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
'2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
'2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
}
I'm trying to find the monthly average values for ndre and ndvi.
So far have tried the following code.
import pandas as pd
df = pd.DataFrame(data)
df.set_index('farm_Id', inplace=True)
# Extract the month from column names
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d').strftime('%Y-%m-%d')
# Group by month and calculate the mean
monthly_average = df.groupby(df.columns, axis=1).mean()
# Print the result
monthly_average
But, there is a format error for date (Error: time data "2018-12-16_NDRE" at position 0 doesn't match format specified). No matter how I try unable to fix the error and achive result. Please suggest the right method.
You could use pd.melt
to get your dataframe in vertical schema, then use str.split
to separate the types (in your case NDVI and NDRE) and the date.
then you filter by type (since you want data related only to NDRE and NDVI) and you do a usual groupby and get the mean value.
Approach1:: here is a sample code using your sample dataset:
import pandas as pd
data = {
'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
'2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
'2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
'2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
'2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
'2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
'2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
'2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
'2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
'2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
'2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
}
df = pd.DataFrame(data)
results = (
df
.melt(id_vars='farm_Id')
.assign(
typee = lambda df_: df_['variable'].str.split('_').str[1],
variable = lambda df_: df_['variable'].str.split('_').str[0]
)
.astype({'typee': 'category', 'variable': 'datetime64[ns]'})
.rename(columns={'variable': 'date'})
.query('typee.isin(["NDRE", "NDVI"])')
.groupby(['date', 'typee'])
.value.mean()
.to_frame()
)
results.head(10)
Approach2:: Following your proposed approach in the comments, you could complete your solution as follows (a pd.rename
would resolve the columns naming after obtaining the results you need):
df = pd.DataFrame(data)
df.set_index('farm_Id', inplace=True)
multi_columns = pd.MultiIndex.from_tuples([(col.split('_')[0], col.split('_')[1]) for col in df.columns])
df.columns = multi_columns
result = (
df
.unstack(level=0)
.reset_index()
.groupby(['level_0', 'level_1'])
[0].mean()
.to_frame()
)
result
NOTE: level_0 and level_1 are the date and type after extraction.
This will get you the following output in both cases:
Date | Type | Value |
---|---|---|
2018-12-16 | NDRE | 0.4148 |
NDVI | 0.5748 | |
2018-12-21 | NDRE | 0.3708 |
NDVI | 0.5350 | |
2018-12-26 | NDRE | 0.2534 |
NDVI | 0.3556 | |
2018-12-31 | NDRE | 0.3664 |
NDVI | 0.5208 | |
2019-01-05 | NDRE | 0.3664 |
NDVI | 0.5180 |
I hope this helps!