I have a large pandas DataFrame with hourly data for the years 1991-2021, and I need to drop every row whose year is not 2021 (the current year). The DataFrame has a "year" column with values ranging from 1991 to 2021. I am using the line of code below, but it does not seem to do anything to df1. Is there a better way to delete all rows where year is not 2021?
trimmed_df1 = df1.drop(df1[df1.year != '2021'].index)
My data is a 4532472 x 10 DataFrame in this format:
df1.columns.values
Out[20]:
array(['plant_name', 'business_name', 'business_code',
'maint_region_name', 'power_kwh', 'wind_speed_ms', 'mos_time',
'dataset', 'month', 'year'], dtype=object)
This should do the job:
>>> trimmed_df1 = df1.query('year == 2021').reset_index()
Resetting the index is optional. If you do reset it, pass drop=True so the old index is not kept as an extra column.
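One likely reason the original line appeared to do nothing: drop returns a new DataFrame and leaves df1 untouched, and if "year" is stored as integers, no value equals the string '2021'. The sketch below uses a small made-up frame in place of the real df1 (the sample values are assumptions, not the asker's data) and keeps only the 2021 rows with a boolean mask.

```python
import pandas as pd

# Toy stand-in for df1 (assumed values); the real frame has 10 columns.
df1 = pd.DataFrame({
    "plant_name": ["a", "b", "c", "d"],
    "power_kwh": [1.0, 2.0, 3.0, 4.0],
    "year": [1991, 2020, 2021, 2021],
})

# Check how the years are stored before writing the comparison.
print(df1["year"].dtype)

# Boolean mask: keep only the 2021 rows.
# Compare against "2021" instead if the column holds strings.
trimmed_df1 = df1[df1["year"] == 2021].reset_index(drop=True)

# df1 itself is unchanged; only trimmed_df1 is filtered.
print(len(df1), len(trimmed_df1))
```

Selecting the rows to keep is usually simpler than dropping the complement by index, and it avoids building an intermediate index of several million labels.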