
Calculating and visualizing correlation between 2 variables which are in an unordered series


As part of my final-year research implementation, I'm trying to calculate and visualize the correlation between two variables that are not in an ordered series. Consider a dataset such as the following:

DateAndTime           Demand    Temperature
2015-01-02 18:00:00    2081         41
2015-01-02 19:00:00    2370         42
2015-01-02 20:00:00    2048         42
2015-01-02 21:00:00    1806         42
2015-01-02 22:00:00    1818         41
2015-01-02 23:00:00    1918         40
2015-01-03 00:00:00    1685         40
2015-01-03 01:00:00    1263         38
2015-01-03 02:00:00     969         38
2015-01-03 03:00:00     763         37
2015-01-03 04:00:00     622         36

Calculating and visualizing the correlation between the Date and Demand is straightforward, since they form an ordered series and a scatterplot shows their relationship clearly. However, if I calculate the correlation between Temperature and Demand, the resulting scatterplot does not make much sense, because the temperature values are not in any mathematical order. What approach should be used to visualize the correlation between these two variables in a more meaningful manner? I'm using basic Python libraries such as Matplotlib, Statsmodels and Sklearn for this.


Solution

  • The idea is to plot one column on the x-axis and the other on the y-axis, then fit a straight line that approximates the relationship between them. NumPy has a function, polyfit, that computes this line:

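    import numpy as np
    import matplotlib.pyplot as plt

    x = [4,2,1,5]
    y = [2,4,6,3]

    # Degree-1 least-squares fit; returns [slope, intercept]
    fit = np.polyfit(x, y, 1)
    fit_line = np.poly1d(fit)  # callable polynomial built from the coefficients

    plt.figure()
    plt.plot(x, y, 'rx')              # data points as red crosses
    plt.plot(x, fit_line(x), '--b')   # fitted line, dashed blue
    plt.show()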
    

    If we write the regression line as y = a*x + b, the coefficients a and b can be read from the fit:

    a = fit[0]
    b = fit[1]
    

    which returns

    a = -0.8000000000000005
    b = 6.150000000000002
    

    Replace x and y with your own data, e.g. the Temperature and Demand columns.
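
The fitted line gives the slope but not the strength of the relationship. As a minimal sketch, assuming the Pearson correlation coefficient is the measure wanted, the eleven sample rows from the question can be passed to np.corrcoef alongside np.polyfit. The variable names temperature, demand, r, slope and intercept are chosen here for illustration and are not part of the original answer.

```python
import numpy as np

# Sample rows copied from the table in the question
temperature = np.array([41, 42, 42, 42, 41, 40, 40, 38, 38, 37, 36])
demand = np.array([2081, 2370, 2048, 1806, 1818, 1918, 1685, 1263, 969, 763, 622])

# np.corrcoef returns the 2x2 correlation matrix;
# the off-diagonal entry is the Pearson coefficient between the two series
r = np.corrcoef(temperature, demand)[0, 1]

# Least-squares line: demand = slope * temperature + intercept
slope, intercept = np.polyfit(temperature, demand, 1)

print(f"Pearson r = {r:.3f}")
print(f"demand ~ {slope:.1f} * temperature + {intercept:.1f}")
```

The order of the rows does not matter for either calculation, since both treat the data as (temperature, demand) pairs. That is why a scatterplot of one column against the other is still a valid picture of the relationship even when temperature is not sorted.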