python, pandas, datetime, data-cleaning

Data cleaning - ifelse statement


I'm trying to clean up the dataframe below so that there is only one value for each date. The data changes daily, depending on the current date. For example, if today were 01/01/2020, the data would look like this:

            Value   Type
01/01/2020  38.3    Forecast
01/01/2020  31.85   Actual
01/01/2020  6.45    Delta
02/01/2010  31.08   Actual
03/01/2020  29      Forecast
04/01/2020  23.4    Forecast
05/01/2020  24.5    Forecast
06/01/2020  19.4    Forecast
07/01/2020  21.1    Forecast
08/01/2020  22.3    Forecast
09/01/2020  25.6    Forecast

I need to clean this dataframe so that if a date has an 'Actual' value, that value is used, and if not, the 'Forecast' value is used.

I have been using the code below to pull out the forecast rows, but then I lose the accuracy of the 'Actual' value where it is available, and this also excludes D+1, which has no forecast value.

sel = ['Forecast'] 
df = df.loc[df['Type'].isin(sel)]

The end result would look something like this:


            Value   Type
01/01/2020  31.85   Actual
02/01/2010  31.08   Actual
03/01/2020  29      Forecast
04/01/2020  23.4    Forecast
05/01/2020  24.5    Forecast
06/01/2020  19.4    Forecast
07/01/2020  21.1    Forecast
08/01/2020  22.3    Forecast
09/01/2020  25.6    Forecast

Any help much appreciated!


Solution

  • Perhaps this is not the best way, as I'm not very familiar with pandas groupby transform/aggregate, but this works for me.

    # Assumes the dates are in a column named "Date", not in the index.
    a = df.groupby("Date").apply(lambda x: x[x["Type"] == "Actual"] if (x["Type"] == "Actual").any() else x[x["Type"] == "Forecast"]).reset_index(drop=True)
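If you'd rather avoid `apply`, the same "Actual if present, otherwise Forecast" rule can probably be written as a boolean mask. This is only a sketch under a few assumptions: the dates sit in a column called "Date" (in the question they look like the index), the sample frame below is a shortened stand-in for the question's data, and the 2010 dates in the question are treated as typos for 2020.

```python
import pandas as pd

# Shortened stand-in for the question's data. The "Date" column name is an
# assumption, and the dates are left as plain day-first strings.
df = pd.DataFrame({
    "Date": ["01/01/2020", "01/01/2020", "01/01/2020",
             "02/01/2020", "03/01/2020", "04/01/2020"],
    "Value": [38.3, 31.85, 6.45, 31.08, 29.0, 23.4],
    "Type": ["Forecast", "Actual", "Delta", "Actual", "Forecast", "Forecast"],
})

# Keep only the two types of interest (this drops the "Delta" rows).
kept = df[df["Type"].isin(["Actual", "Forecast"])]

# For every row, flag whether its date has at least one "Actual" row.
is_actual = kept["Type"].eq("Actual")
has_actual = is_actual.groupby(kept["Date"]).transform("any")

# Keep "Actual" rows, plus "Forecast" rows for dates that have no "Actual".
result = kept[is_actual | ~has_actual].reset_index(drop=True)
print(result)
```

Because the mask is built from the "Type" column directly, it doesn't depend on the row order or on how the type names happen to sort. If the dates are the index rather than a column, grouping on `kept.index` instead of `kept["Date"]` should work the same way.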