Tags: python, numpy, scipy, time-series, cross-correlation

How can I calculate the time lag between two similar time series?


I'm trying to compute and visualize the time lag between two time series: the humidity progression outside a room and inside it.

Each data point was recorded hourly. Plotting the two series together, I can clearly see a shift between them. [Plot of the two series; sorry for hiding the axes.]

Here is part of my time series data, packed into two arrays:



inside_humidity = [
 11.77961297, 11.59755268, 12.28761522, 11.88797553, 11.78122077, 11.5694668,
 11.70421932, 11.78122077, 11.74272005, 11.78122077, 11.69438733, 11.54126933,
 11.28460592, 11.05624965, 10.9611012,  11.07527934, 11.25417308, 11.56040908,
 11.6657186,  11.51171572, 11.49246536, 11.78594142, 11.22968373, 11.26840678,
 11.26840678, 11.29447992, 11.25553344, 11.19711371, 11.17764047, 11.11922075,
 11.04132778, 10.86996123, 10.67410607, 10.63493504, 10.74922916, 10.74922916,
 10.6294765,  10.61011497, 10.59075345, 10.80373021, 11.07479154, 11.15223764,
 11.19711371, 11.17764047, 11.15816723, 11.22250051, 11.22250051, 11.202915,
 11.18332948, 11.16374396, 11.14415845, 11.12457293, 11.10498742, 11.14926578,
 11.16896413, 11.16896413, 11.14926578, 10.8307902,  10.51742195, 10.28187137,
 10.12608544,  9.98977276,  9.62267727,  9.31289289,  8.96438546,  8.77077022,
  8.69332413,  8.51907042,  8.30609366,  8.38353975,  8.4513867,   8.47085994,
  8.50980642,  8.52927966,  8.50980642,  8.55887037,  8.51969934,  8.48052831,
  8.30425867,  8.2177078,   7.98402891,  7.92560918,  7.89950166,  7.83489682,
  7.75789537,  7.5984808,   7.28426807,  7.39778913,  7.71943214,  8.01149931,
  8.18276652,  8.23009255,  8.16215295,  7.93822471,  8.00350215,  7.93843482,
  7.85072729,  7.49778011,  7.31782649,  7.29862668,  7.60162032,  8.29665484,
  8.58797834,  8.50011383,  8.86757784,  8.76600556,  8.60491125,  8.4222628,
  8.24923231,  8.14470714,  8.17351638,  8.52530093,  8.72220151,  9.26745883,
  9.1580007,   8.61762692,  8.22187405,  8.43693644,  8.32414835,  8.32463974,
  8.46833012,  8.55865487,  8.72647164,  9.04112806,  9.35578449,  9.59465974,
 10.47339785, 11.07218093, 10.54091351, 10.56138918, 10.46099958, 10.38129168,
 10.16434831, 10.10612612, 10.009246,   10.53502351, 10.8307902,  11.13420052,
 11.64337309, 11.18958511, 10.49630791, 10.60856932, 10.37029108,  9.86281478,
  9.64699826,  9.95341012, 10.24329812, 10.6848196,  11.47604231, 11.30505352,
 10.72194974, 10.30058448, 10.05022037, 10.06318411,  9.90118897,  9.68530059,
  9.47790657,  9.48585784,  9.61639418,  9.86244265, 10.29009361, 10.28297229,
 10.32073088, 10.65389513, 11.09656351, 11.20188562, 11.24124169, 10.40503955,
  9.74632512,  9.07606098,  8.85145589,  9.37080152,  9.65082743, 10.0707891,
 10.68776091, 11.25879751, 11.0416348,  10.89558456, 10.7908258,  10.66539685,
 10.7297755,  10.77571398, 10.9268264,  11.16021492, 11.60961709, 11.43827534,
 11.96155427, 12.16116437, 12.80412266, 12.52540805, 11.96752965, 11.58099292]

outside_humidity = [
 10.17449206, 10.4823292,  11.06818167, 10.82768699, 11.27582592, 11.4196233,
 10.99393027, 11.4122507,  11.18192837, 10.87247831, 10.68664321, 10.37949651,
  9.57155882, 10.86611665, 11.62547196, 11.32004266, 11.75537602, 11.51292063,
 11.03107569, 10.7297755,  10.4345622,  10.61271497,  9.49271162, 10.15594248,
  9.99053828,  9.80915398,  9.6452438,  10.06900573, 11.18075689, 11.8289847,
 11.83334752, 11.27480708, 11.14370467, 10.88149985, 10.73930381, 10.7236597,
 10.26210496, 11.01260226, 11.05428228, 11.58321342, 12.70523808, 12.5181118,
 11.90023799, 11.67756426, 11.28859471, 10.86878222,  9.73984486, 10.18253902,
  9.80915398, 10.50980784, 11.38673459, 11.22751685, 10.94171823, 10.56484228,
 10.38220753, 10.05388847,  9.96147203,  9.90698862,  9.7732203,   9.85262125,
  8.7412938,   8.88281702,  8.07919545,  8.02883587,  8.32341424,  8.07357711,
  7.27302616,  6.73660684,  6.66722819,  7.29408637,  7.00046542,  6.46322019,
  6.07150988,  6.00207234,  5.8818402,   6.82443881,  7.20212882,  7.52167696,
  7.88857771,  8.351627,    8.36547023,  8.24802846,  8.18520693,  7.92420816,
  7.64926024,  7.87944972,  7.82118727,  8.02091833,  7.93071882,  7.75789457,
  7.5416447,   6.94430133,  6.65907535,  6.67454591,  7.25493614,  7.76939457,
  7.55357806,  6.61479472,  7.17641357,  7.24664082,  8.62732387,  8.66913548,
  8.70925667,  9.0477017,   8.24558224,  8.4330502,   8.44366397,  8.17995798,
  8.1875752,   9.33296518,  9.66567041,  9.88581085,  8.95449382,  8.3587624,
  9.20584448,  8.90605388,  8.87494884,  9.12694892,  8.35055177,  7.91879933,
  7.78867253,  8.22800878,  9.03685287, 12.49630018, 11.11819755, 10.98869374,
 10.65897176, 10.36444573, 10.052609,   10.87627021, 10.07379564, 10.02233847,
  9.62022856, 11.21575473, 10.85483543, 11.67324627, 11.89234248, 11.10068132,
 10.06942096,  8.50405894,  8.13168561,  8.83616476,  8.35675085,  8.33616802,
  8.35675085,  9.02209801,  9.5530404,   9.44738836, 10.89645958, 11.44771721,
 11.79943601, 10.7765335,  11.1453622,  10.74874776, 10.55195175, 10.34494483,
  9.83813522, 11.26931785, 11.20641798, 10.51555027, 10.90808954, 11.80923545,
 11.68300879, 11.60313809,  7.95163365,  7.77213815,  7.54209557,  7.30603673,
  7.17842173,  8.25899805,  8.56494995, 10.44245578, 11.08542758, 11.74129079,
 11.67979686, 12.94362214, 11.96285343, 11.8289847,  11.01388413, 10.6793698,
 11.20662595, 11.97684701, 12.46383177, 11.34178655, 12.12477078, 12.48698059,
 12.89325064, 12.07470295, 12.6777319,  10.91689448, 10.7676326,  10.66710434]
 

I know cross-correlation is the right term, but I still don't understand how to use scipy.signal.correlate and numpy.correlate: all I get is an array full of NaNs. Clearly I need more background in this area.

What I'd like to end up with is a plot like those in the answers to the thread "How to make a correlation plot with a certain lag of two time series", showing at how many hours the time lag is most likely.

Thank you a lot in advance!


Solution

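  • With the given data, you can use numpy and matplotlib to plot the two series against each other and fit a straight line through the points. Note that this shows how the two series relate at zero lag; it does not by itself estimate the time lag.

    You can do something like this: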

    import numpy as np
    from matplotlib import pyplot as plt

    # inside_humidity and outside_humidity are the lists from the question
    x = np.array(inside_humidity)
    y = np.array(outside_humidity)

    # fit a straight line (degree-1 polynomial): y = a * x + b
    a, b = np.polyfit(x, y, 1)
    y_fit = a * x + b

    # scatter plot of the data with the fitted line drawn on top
    plt.scatter(x, y)
    plt.plot(x, y_fit)
    plt.xlabel("inside humidity")
    plt.ylabel("outside humidity")

    plt.show()

    which gives this:

    [Figure: scatter plot of outside vs. inside humidity with the fitted line]
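To estimate the lag itself, one possible approach is to compute the correlation coefficient between the two series at a range of shifts and take the shift where it peaks. The sketch below is only an illustration under stated assumptions: `estimate_lag` and its `max_lag` parameter are hypothetical names, not a library API, and the demo uses synthetic data with a known 5-sample delay instead of the humidity arrays. NaN values are masked out before each correlation, because NaNs in the input are one common cause of an all-NaN result from numpy.correlate (an assumption about the full data set; the sample shown in the question contains none).

```python
import numpy as np


def estimate_lag(reference, delayed, max_lag=48):
    """Correlate reference[t] with delayed[t + lag] for every lag in
    [-max_lag, max_lag] and return (best_lag, lags, corr).

    A positive best_lag means `delayed` trails `reference` by that many samples.
    """
    x = np.asarray(reference, dtype=float)
    y = np.asarray(delayed, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("both series must be 1-D and of equal length")
    n = len(x)
    if not 0 <= max_lag < n - 2:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series) - 2")

    lags = np.arange(-max_lag, max_lag + 1)
    corr = np.full(len(lags), np.nan)
    for i, lag in enumerate(lags):
        # Select the overlapping parts of the two series for this shift.
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        # Ignore pairs where either value is NaN or infinite.
        ok = np.isfinite(a) & np.isfinite(b)
        if ok.sum() > 2 and a[ok].std() > 0 and b[ok].std() > 0:
            corr[i] = np.corrcoef(a[ok], b[ok])[0, 1]

    best_lag = int(lags[np.nanargmax(corr)])
    return best_lag, lags, corr


# Synthetic demo (not the humidity data): `inside` is `outside` delayed
# by 5 samples, plus a little measurement noise.
rng = np.random.default_rng(0)
base = np.cumsum(rng.normal(size=300))  # random walk as a stand-in signal
true_lag = 5
outside = base[10:210]
inside = base[10 - true_lag:210 - true_lag] + rng.normal(scale=0.05, size=200)

best_lag, lags, corr = estimate_lag(outside, inside)
print("estimated lag (samples):", best_lag)
```

With the arrays from the question, the call would be `estimate_lag(outside_humidity, inside_humidity)`; because the samples are hourly, the returned lag is in hours. Plotting `corr` against `lags` (for example with `plt.plot(lags, corr)`) should give a correlation-versus-lag plot of the kind the question asks for. An alternative is scipy.signal.correlate on the mean-subtracted series together with scipy.signal.correlation_lags, which also requires NaN-free input.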