Search code examples
pythonfftcontinuous-fourier

How to find the most common frequeny in Time series


I have a TimeSeries shown below. We can clearly see that there is a cyclical behavior in the data, where rise and fall happens every 12 months and so on, irrespective of if trend is rising/falling. I am struggling to find how to extract this frequency of 12 from the data using scipy.fft python library. Mostly every time repetition happens around 12 months, but it can be around 11 months as well or 13 months, but 12 is the most common frequency. I am using using Fourier transformation, but can't figure out how to extract this most common frequency (12) using some python library?

Thanks

enter image description here

Here is the data:

11130.578853063385,
6509.723592808,
5693.928982796129,
3415.099921325464,
-9299.291673374173,
-3388.284173885658,
-5577.9316298032745,
-3509.583232678111,
2285.99065857961,
3844.3061166856014,
-7383.526882168155,
-4622.792125020905,
2813.586128745183,
-1501.9405075290495,
8911.954971658348,
7800.444458471794,
-1190.7952377053866,
4768.791467877524,
2173.593871988719,
-2420.04786197912,
-2304.842777539399,
-3562.1788525161837,
-8766.208378658988,
-7655.936603573945,
-5890.9298543619125,
-9628.764012284291,
12124.740839506767,
12391.257220312522,
7512.253051850619,
12921.032383220418,
10063.270097922614,
-1350.2599176773563,
-6887.434936788105,
-11116.26528794868,
-10196.871058658888,
-10874.172006461778,
-15014.190086779208,
-17837.744786550902,
15235.434771455053,
17183.25815161994,
16835.95193044929,
21063.986176551374,
17987.99577807288,
-270.6290142721815,
-11239.957979217992,
-18724.854251002133,
-11752.820614075652,
-14332.597031388648,
-24609.22398631297,
-26155.98046224267,
18192.356438131075,
22165.14150786262,
26758.419290443217,
29284.65841543005,
25762.928479462924,
865.3393849464444,
-15121.264730579132,
-26306.45361387036,
-13494.286360139175,
-18089.58324839494,
-34738.184049794625,
-34718.87495981627,
21145.112760626133,
27322.030709198487,
37252.78168890166,
37846.98231395838,
33206.62103950547,
2092.870600583023,
-18537.521405900694,
-33955.48182565038,
-15445.551953312479,
-22284.152196532646,
-45880.94206326016,
-44229.92788481257,
24988.038646046363,
32958.71017047145,
49117.93320642349,
47304.760779654374,
40776.01828187993,
3403.573579093782,
-22402.79273128273,
-42361.96378730598,
-17190.060741565456,
-27378.2527904574,
-59155.49212555031,
-60122.10588005664,
26272.133100405994,
44887.192435244986,
69002.74742137044,
59037.928523261784,
42122.51604012313,
6075.663868325184,
-20631.710295791454,
-48088.66531781877,
-23396.29341809641,
-40847.479839729145,
-68317.87342502769,
-73679.4424942532,
28302.69374713241,
57321.16868946109,
83820.10748232565,
68399.66173487401,
44989.374076533895,
8830.088516704302,
-18149.500187183363,
-52028.5021898363,
-31013.963236266634,
-53956.5249205745,
-77250.59604604884,
-86642.45203443282,
30541.62328593645,
69812.47143595785,
98233.7834300242,
77385.915451272,
48189.69475295938,
11504.22579592029,
-15251.799652343976,
-55879.292898282,
-38956.992207762654,
-67210.9936142441,
-86636.69916492153,
-99845.12467446178,
32751.253099701484,
82656.01928819218,
113259.2399845611,
86532.20966362985,
51019.20889397171,
14289.09297146163,
-11777.371574335935,
-59627.30976102835,
-47170.18721199697,
-81027.36407627042,
-96178.09587995053,
-113526.93736260894,
34817.23859755824,
95927.57143777516,
128782.84687524068,
95920.65382048927,
53226.62965224956,
17272.000877533148,
-7716.869736424804,
-63110.06727848651,
-55696.68126167806,
-95538.60898488457,
-105325.08525283691,
-127600.17956244369,
36734.97589442811,
109601.51109750797,
144205.71977383518,
105517.48123365057,
54793.814706888734,
20380.77940730315,
-3119.1108027357986,
-66153.73274186133,
-64702.85998743505,
-110650.72884973585

Solution

  • Original time series wasn't provided, so I made some up. EDIT: newly-supplied data provided at the end of the post

    Take the magnitude of the DFT using numpy.fft.fft or, for real data as here, numpy.fft.rfft, and calculate its amplitude. (If your data isn't equally spaced in time you can interpolate first.)

    You can find peaks in the frequency trace, but note that there can be a lot of energy in low-frequency components, so you may need to exclude these.

    Note that the frequency interval is 2.pi/tmax in rad/time_unit, or 1/tmax in cycles/time_unit, so this limits the accuracy you will get.

    import numpy as np
    import matplotlib.pyplot as plt
    
    T = 150
    N = 10000                                          # For accurate frequency, N and number of cycles must be large
    t = np.linspace( 0.0, T, N, endpoint=False )
    signal = 75000 * ( t / 150 ) ** 1.5 * ( 1 + 0.8 * np.sin( 2 * np.pi * t / 12 ) + 0.4 * np.sin( 2 * np.pi * t / 4 ) )
    
    dw = 2 * np.pi / T                                 # Frequency interval in rad/time_unit
    df = 1 / T                                         # Frequency interval in cycles/time_unit
    dt = T / N
    
    FT = np.fft.rfft( signal )                         # Use rfft for a real signal (note: only about half the components)
    A = np.abs( FT ) / N                               # Normalised magnitude of Fourier components
    w = np.linspace( 0.0, N * dw, N, endpoint=False )  # Frequency values in rad/time_unit
    
    low = 4                                            # Number of low-frequency components to exclude
    m = low + np.argmax( A[low:] )                     # Index of dominant frequency (excluded low frequencies)
    omega = m * dw                                     # Dominant frequency in rad/s
    f = omega / ( 2 * np.pi )                          # Dominant frequency in Hz
    #f = np.fft.rfftfreq( N, dt )[m]                   # Alternative, direct from numpy.fft
    
    print( 'Index:', m )
    print( 'Frequency interval in rad/time_unit    (dw):', dw    )
    print( 'Dominant frequency in rad/time_unit    (w) :', omega )
    print( 'Frequency interval in cycles/time_unit (df):', df    )
    print( 'Dominant frequency in cycles/time_unit (f) :', f     )
    
    plt.title( "Time Domain" )
    plt.plot( t, signal, 'b-' )
    plt.xlabel( 't' )
    plt.show()
    
    plt.title( "Frequency Domain" )
    plt.plot( w[:len(A)] / ( 2 * np.pi ), A, 'r-' )
    plt.xlim( 0.0, 0.6 )
    plt.xlabel( 'f' )
    plt.show()
    

    Output:

    enter image description here

    enter image description here

    Index: 12
    Frequency interval in rad/time_unit    (dw): 0.041887902047863905
    Dominant frequency in rad/time_unit    (w) : 0.5026548245743668
    Frequency interval in cycles/time_unit (df): 0.006666666666666667
    Dominant frequency in cycles/time_unit (f) : 0.07999999999999999
    

    A time period of 12 corresponds to an f-frequency of 1/12=0.083. Accuracy is limited by df.

    EDIT: Supplied Data

    Assuming the data is put, as-is, in a file "signal.dat", change the first part of the code to

    import numpy as np
    import matplotlib.pyplot as plt
    
    signal = np.loadtxt( 'signal.dat', delimiter=',', usecols=0, unpack=True )
    N = len(signal)
    T = N
    t = np.linspace( 0.0, T, N, endpoint=False )
    dw = 2 * np.pi / T                                 # Frequency interval in rad/time_unit
    df = 1 / T                                         # Frequency interval in cycles/time_unit
    dt = T / N                                         # Time interval (will be 1 month)
    

    Then dt = 1 (month), whilst N and T are (numerically) the number of months in the record.

    The output gives f=0.083 as the maximum frequency, corresponding to a period of 1/f=12 months.

    Index: 13
    Frequency interval in rad/time_unit    (dw): 0.04027682889217683
    Dominant frequency in rad/time_unit    (w) : 0.5235987755982988
    Frequency interval in cycles/time_unit (df): 0.00641025641025641
    Dominant frequency in cycles/time_unit (f) : 0.08333333333333333
    

    enter image description here

    enter image description here