Trying to check data frequency with Pandas Series of datetime64 objects


I have some time series data that can be sampled at 1 Hz, 10 Hz, or 100 Hz. The file I load in happens to be 1 Hz:

In [6]: data = pd.read_csv("ftp.csv")

In [7]: data.Time
Out[7]: 
0             NaN
1     11:30:08 AM
2     11:30:09 AM
3     11:30:10 AM
4     11:30:11 AM
5     11:30:12 AM
6     11:30:13 AM

I convert it to datetime with:

In [8]: time = pd.to_datetime(data.Time)

In [9]: time
Out[9]: 
0                    NaT
1    2015-03-03 11:30:08
2    2015-03-03 11:30:09
3    2015-03-03 11:30:10
4    2015-03-03 11:30:11
5    2015-03-03 11:30:12

From here, how can I verify what the sampling frequency is? Do I have to do this manually, or can I use a built-in pandas method?


Solution

  • One method: after converting to datetime64, if the sampling rate is constant, you can call diff() to get the difference between consecutive rows. Every difference should then be identical, so you can compare them all against a np.timedelta64 value (the first difference is always NaT, so it is skipped with [1:]). For your sample data this would be:

    In [277]:
    
    all(df.datetime.diff()[1:] == np.timedelta64(1, 's'))
    Out[277]:
    True
    
    In [278]:
    
    df.datetime.diff()
    Out[278]:
    0
    1        NaT
    2   00:00:01
    3   00:00:01
    4   00:00:01
    5   00:00:01
    6   00:00:01
    Name: datetime, dtype: timedelta64[ns]
    
    In [279]:
    
    df.datetime.diff()[1:] == np.timedelta64(1, 's')
    Out[279]:
    0
    2    True
    3    True
    4    True
    5    True
    6    True
    Name: datetime, dtype: bool
    

    To check whether the frequency is 10 Hz or 100 Hz, just change the value passed to np.timedelta64: use np.timedelta64(100, 'ms') for 10 Hz and np.timedelta64(10, 'ms') for 100 Hz.

    The np.timedelta64 units can be found here: http://docs.scipy.org/doc/numpy/reference/arrays.datetime.html#datetime-and-timedelta-arithmetic
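
If you would rather not test each candidate rate in turn, the same diff() idea can report the rate directly. The following is only a minimal sketch: the helper name sampling_rate_hz and the sample timestamps are made up for illustration, and it assumes missing timestamps (like the leading NaT in the question) should simply be dropped before differencing.

```python
import pandas as pd


def sampling_rate_hz(times: pd.Series) -> float:
    """Return the sampling rate in Hz, or raise if the spacing is not uniform.

    Hypothetical helper, not a pandas built-in.
    """
    # Drop missing timestamps (such as a leading NaT), then take the
    # difference between consecutive rows; the first diff is NaT, so drop it.
    deltas = times.dropna().diff().dropna()
    if deltas.empty:
        raise ValueError("need at least two valid timestamps")

    step = deltas.iloc[0]
    if step <= pd.Timedelta(0):
        raise ValueError("timestamps must be strictly increasing")
    if not (deltas == step).all():
        raise ValueError("timestamps are not evenly spaced")

    # Dividing one second by the step gives samples per second.
    return pd.Timedelta(seconds=1) / step


# Made-up sample data: one missing value, then one sample per second.
time = pd.to_datetime(
    pd.Series([None, "2015-03-03 11:30:08", "2015-03-03 11:30:09", "2015-03-03 11:30:10"])
)
rate = sampling_rate_hz(time)
```

For data like the sample in the question, rate comes out as 1.0; a step of 100 ms would give 10.0. Note that because missing rows are dropped first, a NaT in the middle of the series shows up as a double-length gap and makes the helper raise rather than silently passing.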