I have a dataset of values covering the date range from January 12th to August 3rd, 2018:
The dimensionality of the df_luminosidad DataFrame is:
df_luminosidad.shape
(9752, 2)
The rows are recorded at a half-hour frequency. The first row begins on 2018-01-12:
df_luminosidad.iloc[0]
Fecha: 2018-01-12 00:17:28
Luz (lux) 1
Name: 0, dtype: object
And the last row ends on 2018-08-03:
df_luminosidad.tail(1)
Fecha: Luz (lux)
9751 2018-08-03 23:44:59 1
Each day therefore spans many rows, as shown above.
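To experiment without the original file, a frame with the same layout can be generated. This is a hypothetical stand-in: the column names match the question, but the timestamps are a regular 30-minute grid and the lux values are random, so the row count will not match the real 9752 rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_luminosidad: same column names as the question,
# made-up timestamps (regular 30-minute grid) and made-up lux values.
rng = np.random.default_rng(0)
fechas = pd.date_range("2018-01-12 00:17:28", "2018-08-03 23:44:59", freq="30min")
df_luminosidad = pd.DataFrame({
    "Fecha:": fechas,
    "Luz (lux)": rng.integers(0, 20000, size=len(fechas)),
})

# Count how many half-hourly rows fall on each calendar day.
rows_per_day = df_luminosidad.groupby(df_luminosidad["Fecha:"].dt.date).size()
print(rows_per_day.head())
```

With this grid every full day has 48 rows, which is exactly the "many rows per day" situation described above.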
I want to create a new dataframe with only one row per day in the Fecha: column (no repeated dates), holding the average of all the existing Luz (lux) values for that date.
This would look like this:
| Fecha: | Luz (lux): daily average |
|---|---|
| 2018-01-12 | 9183.479167 |
| 2018-01-13 | 7431.8125 |
| 2018-01-14 | 11073.1875 |
| 2018-01-15 | 11456.25 |
| ... | ... |
| 2018-08-03 | 11331.229167 |
I am using the pd.date_range() function to create a dataframe from the values in the previous df_luminosidad dataframe at a specified frequency, as described by @piRSquared in this answer.
From January 12 to August 3 there are 203 days (29 weeks), so, counting both endpoints, I chose 204 as the periods argument, and I am using the calendar-day frequency D as the freq argument:
df = pd.DataFrame(dict(
Date=pd.date_range('2018-01-12', periods=204, freq='D'),
Value=df_luminosidad['Luz (lux)'].mean()
))
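Instead of counting the days by hand, pd.date_range can also be given both endpoints; it then includes the start and the end, which yields the same 204 days. A quick check, assuming pandas:

```python
import pandas as pd

# With start and end given, date_range includes both endpoints.
dias = pd.date_range("2018-01-12", "2018-08-03", freq="D")
print(len(dias))                   # 204 calendar days, endpoints included
print((dias[-1] - dias[0]).days)   # 203 days between the endpoints
```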
Up to this point the approach works, but I am not sure how to compute the mean of the Luz (lux) values for each day generated by pd.date_range, because at the moment I just get the same average value for every day from 2018-01-12 until 2018-08-03, as follows:
| Date | Value |
|---|---|
| 2018-01-12 | 11228.888331 |
| 2018-01-13 | 11228.888331 |
| 2018-01-14 | 11228.888331 |
| 2018-01-15 | 11228.888331 |
| ... | ... |
| 2018-08-03 | 11228.888331 |
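The identical values are expected from that code: df_luminosidad['Luz (lux)'].mean() returns a single number for the whole column, and the DataFrame constructor broadcasts a scalar to every row. A minimal reproduction with made-up values:

```python
import pandas as pd

# Toy column standing in for df_luminosidad['Luz (lux)'] (values are made up).
luz = pd.Series([10.0, 20.0, 30.0, 40.0])
media_global = luz.mean()  # one float for the whole column: 25.0

df = pd.DataFrame(dict(
    Date=pd.date_range("2018-01-12", periods=3, freq="D"),
    Value=media_global,    # the scalar is repeated on every row
))
print(df)
```

Every row gets 25.0, because nothing in the construction links a given date to the readings taken on that date.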
I have already generated a separate dataframe for each value of the Fecha: column and computed each one's average, but this forces me to read each day's file individually.
How can I, without handling each day separately, group all the values of each day into a single average value and put the results in one dataframe?
I believe you need resample with mean, or aggregation with mean using Grouper:
df_luminosidad['Fecha:'] = pd.to_datetime(df_luminosidad['Fecha:'])
df = df_luminosidad.resample('D', on='Fecha:')['Luz (lux)'].mean().reset_index()
Or:
df = (df_luminosidad.groupby(pd.Grouper(key='Fecha:', freq='D'))['Luz (lux)']
.mean()
.reset_index())
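Both variants can be checked on a small hypothetical sample in the question's layout (timestamps stored as strings, made-up lux values). The sample deliberately has no readings on 2018-01-13 and 2018-01-14, to show how days without data come out:

```python
import pandas as pd

# Hypothetical sample: two readings on the 12th, one on the 15th, none in between.
df_luminosidad = pd.DataFrame({
    "Fecha:": ["2018-01-12 00:17:28", "2018-01-12 12:47:28", "2018-01-15 06:17:28"],
    "Luz (lux)": [100, 300, 50],
})
df_luminosidad["Fecha:"] = pd.to_datetime(df_luminosidad["Fecha:"])

por_resample = df_luminosidad.resample("D", on="Fecha:")["Luz (lux)"].mean().reset_index()
por_grouper = (df_luminosidad.groupby(pd.Grouper(key="Fecha:", freq="D"))["Luz (lux)"]
               .mean()
               .reset_index())

# One row per calendar day from the first to the last reading;
# days without readings get NaN.
print(por_resample)
```

The 12th averages to 200.0 and the 15th to 50.0, while the two empty days appear as NaN in both results.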
Other solutions, using a DatetimeIndex:
df_luminosidad['Fecha:'] = pd.to_datetime(df_luminosidad['Fecha:'])
df_luminosidad = df_luminosidad.set_index('Fecha:')
# resample works directly on the DatetimeIndex
df = df_luminosidad.resample('D')['Luz (lux)'].mean().reset_index()
# or: a Grouper without key also groups on the DatetimeIndex
df = df_luminosidad.groupby(pd.Grouper(freq='D'))['Luz (lux)'].mean().reset_index()
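If only the days that actually have readings are wanted, a sketch under the same hypothetical sample data: drop the NaN rows after resample, or (a different technique) group on each timestamp normalized to midnight, which never creates rows for empty days.

```python
import pandas as pd

# Hypothetical sample with no readings on 2018-01-13 and 2018-01-14.
df_luminosidad = pd.DataFrame({
    "Fecha:": ["2018-01-12 00:17:28", "2018-01-12 12:47:28", "2018-01-15 06:17:28"],
    "Luz (lux)": [100, 300, 50],
})
df_luminosidad["Fecha:"] = pd.to_datetime(df_luminosidad["Fecha:"])
df_luminosidad = df_luminosidad.set_index("Fecha:")

# resample keeps empty days as NaN; dropna() keeps only days with readings.
diario = df_luminosidad.resample("D")["Luz (lux)"].mean().dropna().reset_index()

# Alternative: group on the midnight-normalized timestamp of each row.
por_dia = (df_luminosidad.groupby(df_luminosidad.index.normalize())["Luz (lux)"]
           .mean()
           .rename_axis("Fecha:")
           .reset_index())
print(por_dia)
```

Both give 200.0 for 2018-01-12 and 50.0 for 2018-01-15; which one to prefer depends on whether the gap days should be visible in the result.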