Search code examples
pythonpandasdataframedatemean

To find monthly average ndre and ndvi values


So, I have this data,

data = {
    'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
    '2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
    '2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
    '2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
    '2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
    '2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
    '2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
    '2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
    '2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
    '2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
    '2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
}

I'm trying to find the monthly average values for ndre and ndvi.

So far have tried the following code.

import pandas as pd

df = pd.DataFrame(data)
df.set_index('farm_Id', inplace=True)
# Extract the month from column names
df.columns = pd.to_datetime(df.columns, format='%Y-%m-%d').strftime('%Y-%m-%d')
# Group by month and calculate the mean
monthly_average = df.groupby(df.columns, axis=1).mean()
# Print the result
monthly_average

But, there is a format error for date (Error: time data "2018-12-16_NDRE" at position 0 doesn't match format specified). No matter how I try unable to fix the error and achive result. Please suggest the right method.


Solution

  • You could use pd.melt to get your dataframe in vertical schema, then use str.split to separate the types (in your case NDVI and NDRE) and the date.
    then you filter by type (since you want data related only to NDRE and NDVI) and you do a usual groupby and get the mean value.
    Approach1:: here is a sample code using your sample dataset:

    import pandas as pd
    data = {
        'farm_Id': ['farm_258', 'farm_344', 'farm_345', 'farm_346', 'farm_348'],
        '2018-12-16_NDRE': [0.406, 0.380, 0.449, 0.432, 0.407],
        '2018-12-16_NDVI': [0.581, 0.552, 0.619, 0.573, 0.549],
        '2018-12-21_NDRE': [0.417, 0.270, 0.387, 0.403, 0.377],
        '2018-12-21_NDVI': [0.605, 0.416, 0.567, 0.555, 0.532],
        '2018-12-26_NDRE': [0.338, 0.191, 0.267, 0.252, 0.219],
        '2018-12-26_NDVI': [0.480, 0.285, 0.381, 0.324, 0.308],
        '2018-12-31_NDRE': [0.411, 0.202, 0.411, 0.463, 0.345],
        '2018-12-31_NDVI': [0.592, 0.294, 0.588, 0.622, 0.508],
        '2019-01-05_NDRE': [0.378, 0.207, 0.431, 0.495, 0.321],
        '2019-01-05_NDVI': [0.576, 0.293, 0.621, 0.668, 0.432],
    }
    
    df = pd.DataFrame(data)
    
    results = (
        df
        .melt(id_vars='farm_Id')
        .assign(
            typee = lambda df_: df_['variable'].str.split('_').str[1],
            variable = lambda df_: df_['variable'].str.split('_').str[0]
        )
        .astype({'typee': 'category', 'variable': 'datetime64[ns]'})
        .rename(columns={'variable': 'date'})
        .query('typee.isin(["NDRE", "NDVI"])')
        .groupby(['date', 'typee'])
        .value.mean()
        .to_frame()
    )
    results.head(10)
    

    Approach2:: Following your proposed approach in the comments, you could complete your solution as follows (a pd.rename would resolve the columns naming after obtaining the results you need):

    df = pd.DataFrame(data) 
    df.set_index('farm_Id', inplace=True) 
    multi_columns = pd.MultiIndex.from_tuples([(col.split('_')[0], col.split('_')[1]) for col in df.columns])
    df.columns = multi_columns
    result = (
        df
        .unstack(level=0)
        .reset_index()
        .groupby(['level_0', 'level_1'])
        [0].mean()
        .to_frame()
    )
    result
    

    NOTE: level_0 and level_1 are the date and type after extraction.

    This will get you the following output in both cases:

    Date Type Value
    2018-12-16 NDRE 0.4148
    NDVI 0.5748
    2018-12-21 NDRE 0.3708
    NDVI 0.5350
    2018-12-26 NDRE 0.2534
    NDVI 0.3556
    2018-12-31 NDRE 0.3664
    NDVI 0.5208
    2019-01-05 NDRE 0.3664
    NDVI 0.5180

    I hope this helps!