python, algorithm, rss, frequency, time-frequency

Existing library/algorithm for episodic frequency detection and prediction in a time series?


I'm working with podcast RSS feeds in Python. Are there any existing libraries or algorithms that can detect and predict a periodic release schedule from a series of timestamps?

For example, if five items in an RSS feed had the following timestamps:

Fri, 20 Nov 2020 02:16:14 +0000
Fri, 13 Nov 2020 17:51:58 +0000
Fri, 6 Nov 2020 03:08:04 +0000
Fri, 30 Oct 2020 19:09:29 +0000
Fri, 23 Oct 2020 01:23:10 +0000

is there an algorithm to determine "Weekly on Fridays"? Or if they were:

Tue, 24 Nov 2020 10:00:00 -0000
Fri, 20 Nov 2020 09:00:00 -0000
Tue, 17 Nov 2020 10:00:00 -0000
Fri, 13 Nov 2020 10:00:00 -0000
Tue, 10 Nov 2020 10:00:00 -0000

to determine "Twice a week, next episode Friday the 27th"? I believe Pocket Casts has a feature like this, but it remains proprietary.


Solution

  • For simple, evenly spaced schedules you can use pd.infer_freq like this:

    import pandas as pd

    # Dates only: the times of day are dropped so the spacing is exactly regular.
    date_range = [
        "Fri, 20 Nov 2020",
        "Fri, 13 Nov 2020",
        "Fri, 6 Nov 2020",
        "Fri, 30 Oct 2020",
        "Fri, 23 Oct 2020"]

    date_range_2 = [
        "Tue, 24 Nov 2020",
        "Fri, 20 Nov 2020",
        "Tue, 17 Nov 2020",
        "Fri, 13 Nov 2020",
        "Tue, 10 Nov 2020"]

    def get_frequency(dates):
        # infer_freq needs datetime-like values, so parse the strings first
        return pd.infer_freq(pd.DatetimeIndex(dates))

    print(f"First Time Series: {get_frequency(date_range)}")
    print(f"Second Time Series: {get_frequency(date_range_2)}")

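    The first series is detected as weekly on Fridays (the count is negative because the dates are listed newest first). The second alternates between 3- and 4-day gaps, so it has no single frequency and infer_freq returns None:

    First Time Series: -1W-FRI
    Second Time Series: None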
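
  • For schedules like the second one, where pd.infer_freq gives up, a simple weekday count may be enough. The following is only a rough sketch using the standard library, not an established algorithm: the helper names (parse_pub_date, release_weekdays, next_release) and the min_share threshold of 0.3 are assumptions made up for illustration. It treats any weekday that accounts for at least min_share of the episodes as a regular release day and predicts the next such day after the latest episode.

```python
from collections import Counter
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_pub_date(text):
    """Parse an RFC 2822 date (the RSS pubDate format) into a UTC datetime."""
    dt = parsedate_to_datetime(text)
    if dt.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for "-0000";
        # assumption: treat such values as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def release_weekdays(pub_dates, min_share=0.3):
    """Return weekdays (Mon=0) that hold at least min_share of the episodes."""
    counts = Counter(parse_pub_date(p).weekday() for p in pub_dates)
    total = sum(counts.values())
    return sorted(day for day, n in counts.items() if n / total >= min_share)


def next_release(pub_dates, min_share=0.3):
    """Predict the date of the next episode, or None if no pattern is found."""
    days = release_weekdays(pub_dates, min_share)
    if not days:
        return None
    latest = max(parse_pub_date(p) for p in pub_dates)
    # Walk forward day by day until a regular release weekday is reached.
    for offset in range(1, 8):
        candidate = latest + timedelta(days=offset)
        if candidate.weekday() in days:
            return candidate.date()
    return None


feed = [
    "Tue, 24 Nov 2020 10:00:00 -0000",
    "Fri, 20 Nov 2020 09:00:00 -0000",
    "Tue, 17 Nov 2020 10:00:00 -0000",
    "Fri, 13 Nov 2020 10:00:00 -0000",
    "Tue, 10 Nov 2020 10:00:00 -0000",
]

days = release_weekdays(feed)
print([DAY_NAMES[d] for d in days], next_release(feed))
# ['Tue', 'Fri'] 2020-11-27
```

    This only covers weekly patterns. Weekdays are computed in UTC, so an episode published near midnight could land on a neighbouring day, and biweekly or monthly schedules would need the gaps between episodes to be examined as well.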