I am trying to split imported csv files (timeseries) and manipulated them, using pandas MultiIndex Slicer command .xs()
. The following df
replicates the structure of my imported csv file.
import pandas as pd
df = pd.DataFrame(
{'Sensor ID': [14,1,3,14,3],
'Building ID': [109,109,109,109,109],
'Date/Time': ["26/10/2016 14:31:14","26/10/2016 14:31:16", "26/10/2016 14:32:17", "26/10/2016 14:35:14", "26/10/2016 14:35:38"],
'Reading': [20.95, 20.62, 22.45, 20.65, 22.83],
})
df.set_index(['Sensor ID','Date/Time'], inplace=True)
df.sort_index(inplace=True)
print(df)
SensorList = [1, 3, 14]
for s in SensorList:
df1 = df.xs(s, level='Sensor ID')
I have tested the code on a small excerpt of csv data and it works fine. However, when implementing with the entire csv file, I receive the error: ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements
.
Printing df.info()
returns the following:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 65981 entries, (1, 2016-10-26 14:35:15) to (19, 2016-11-07 11:27:14)
Data columns (total 2 columns):
Building ID 65981 non-null int64
Reading 65981 non-null float64
dtypes: float64(1), int64(1)
memory usage: 1.5+ MB
None
Any tip on what may be causing the error?
EDIT
I inadvertently truncated my code, thus leaving it pointless in its current form. The original code resamples values into 15-minutes and 1-hour intervals.
with:
units = ['D1','D3','D6','D10']
unit_output_path = './' + unit + '/'
the loop does:
for s in SensorList:
## Slice multi-index to isolate all readings for sensor s
df1 = df_mi.xs(s, level='Sensor ID')
df1.drop('Building ID', axis=1, inplace=True)
## Resample by 15min and 1hr intervals and exports individual csv files
df1_15min = df1.resample('15Min').mean().round(1)
df1_hr = df1.resample('60Min').mean().round(1)
Traceback:
File "D:\AN6478\AN6478_POE_ABo.py", line 52, in <module>
df1 = df_mi.xs(s, level='Sensor ID')
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1736, in xs
setattr(result, result._get_axis_name(axis), new_ax)
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2685, in __setattr__
return object.__setattr__(self, name, value)
File "pandas\src\properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas\lib.c:44748)
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 428, in _set_axis
self._data.set_axis(axis, labels)
File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\internals.py", line 2635, in set_axis
(old_len, new_len))
ValueError: Length mismatch: Expected axis has 19562 elements, new values have 16874 elements
I can't tell you why exactly df1 = df_mi.xs(s, level='Sensor ID')
raises the ValueError here. Where does df_mi
come from?
Here is an alternative using groupby
which accomplishes what you want on your given dummy data frame without relying on multiIndex and xs
. :
# reset index to have DatetimeIndex, otherwise resample won't work
df = df.reset_index(0)
df.index = pd.to_datetime(df.index)
# create data frame for each sensor, keep relevant "Reading" column
grouped = df.groupby("Sensor ID")["Reading"]
# iterate each sensor data frame
for sensor, sub_df in grouped:
quarterly = sub_df.resample('15Min').mean().round(1)
hourly = sub_df.resample('60Min').mean().round(1)
# implement your to_csv saving here
Note, you could also use the groupby
on the multiIndex with df.groupby(level="Sensor ID")
, however since you want to resample later on, it is easier to drop Sensor ID from the multiIndex which simplifies it overall.