To find monthly average ndre and ndvi values

So, I have this data,

data = {
    'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
    '2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
    '2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
    '2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
    '2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
    '2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
    '2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
    '2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
    '2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
    '2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
    '2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
}

I'm trying to find the monthly average values for ndre and ndvi.

So far have tried the following code.

import pandas as pd

df = pd.DataFrame(data)
df.set_index('farm_Id', inplace=True)
# Extract the month from column names
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d').strftime('%Y-%m-%d')
# Group by month and calculate the mean
monthly_average = df.groupby(df.columns, axis=1).mean()
# Print the result
monthly_average

But, there is a format error for date (Error: time data "2018-12-16_NDRE" at position 0 doesn't match format specified). No matter how I try unable to fix the error and achive result. Please suggest the right method.

Solution

You could use pd.melt to get your dataframe in vertical schema, then use str.split to separate the types (in your case NDVI and NDRE) and the date.
then you filter by type (since you want data related only to NDRE and NDVI) and you do a usual groupby and get the mean value.
Approach1:: here is a sample code using your sample dataset:

import pandas as pd
data = {
    'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
    '2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
    '2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
    '2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
    '2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
    '2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
    '2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
    '2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
    '2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
    '2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
    '2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
}

df = pd.DataFrame(data)

results = (
    df
    .melt(id_vars='farm_Id')
    .assign(
        typee = lambda df_: df_['variable'].str.split('_').str[1],
        variable = lambda df_: df_['variable'].str.split('_').str[0]
    )
    .astype({'typee': 'category', 'variable': 'datetime64[ns]'})
    .rename(columns={'variable': 'date'})
    .query('typee.isin(["NDRE", "NDVI"])')
    .groupby(['date', 'typee'])
    .value.mean()
    .to_frame()
)
results.head(10)

Approach2:: Following your proposed approach in the comments, you could complete your solution as follows (a pd.rename would resolve the columns naming after obtaining the results you need):

df = pd.DataFrame(data) 
df.set_index('farm_Id', inplace=True) 
multi_columns = pd.MultiIndex.from_tuples([(col.split('_')[0], col.split('_')[1]) for col in df.columns])
df.columns = multi_columns
result = (
    df
    .unstack(level=0)
    .reset_index()
    .groupby(['level_0', 'level_1'])
    [0].mean()
    .to_frame()
)
result

NOTE: level_0 and level_1 are the date and type after extraction.

This will get you the following output in both cases:

Date	Type	Value
2018-12-16	NDRE	0.4148
	NDVI	0.5748
2018-12-21	NDRE	0.3708
	NDVI	0.5350
2018-12-26	NDRE	0.2534
	NDVI	0.3556
2018-12-31	NDRE	0.3664
	NDVI	0.5208
2019-01-05	NDRE	0.3664
	NDVI	0.5180

I hope this helps!