As part of my final-year research implementation, I'm trying to calculate and visualize the correlation between two variables that are not in an ordered series. Given a dataset such as the following:
DateAndTime Demand Temperature
2015-01-02 18:00:00 2081 41
2015-01-02 19:00:00 2370 42
2015-01-02 20:00:00 2048 42
2015-01-02 21:00:00 1806 42
2015-01-02 22:00:00 1818 41
2015-01-02 23:00:00 1918 40
2015-01-03 00:00:00 1685 40
2015-01-03 01:00:00 1263 38
2015-01-03 02:00:00 969 38
2015-01-03 03:00:00 763 37
2015-01-03 04:00:00 622 36
Calculating and visualizing the relationship between the date and Demand is straightforward, since the dates form an ordered series and a scatterplot shows the pattern easily. However, when I plot Temperature against Demand, the resulting scatterplot does not make much sense to me, because the temperature values are not in any mathematical order. What approach should I use to visualize the correlation between these two variables in a more meaningful way? I'm using basic Python libraries such as Matplotlib, Statsmodels and Sklearn for this.
A scatter plot does not need the values to be in any order. The idea is to plot one column on the x-axis and the other on the y-axis, then fit a line that approximates the trend. NumPy has a function to compute the least-squares line:
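import numpy as np
import matplotlib.pyplot as plt
# Toy data; the x values are deliberately not sorted
x = [4,2,1,5]
y = [2,4,6,3]
# Degree-1 least-squares fit; returns [slope, intercept]
fit = np.polyfit(x,y,1)
fit_line = np.poly1d(fit)
plt.figure()
plt.plot(x,y,'rx')  # data points as red crosses
plt.plot(x,fit_line(x),'--b')  # fitted line, dashed blue
plt.show()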
If we write the regression line as y = a*x + b, you can read the coefficients a and b from the fit:
a = fit[0]
b = fit[1]
which returns
a = -0.8000000000000005
b = 6.150000000000002
Just replace x and y with your own columns, for example Temperature as x and Demand as y.
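As a rough sketch of how this could look with the sample rows from the question, the snippet below computes the Pearson correlation coefficient with `np.corrcoef` and fits the same kind of line with `np.polyfit`. The arrays are typed in by hand from the table above, and `plot_fit` is a hypothetical helper name introduced only for illustration. With just 11 hourly rows, the numbers should be read as an illustration rather than a result.

```python
import numpy as np

# Values copied by hand from the sample rows in the question.
temperature = np.array([41, 42, 42, 42, 41, 40, 40, 38, 38, 37, 36])
demand = np.array([2081, 2370, 2048, 1806, 1818, 1918, 1685, 1263, 969, 763, 622])

# Pearson correlation coefficient: the off-diagonal entry of the
# 2x2 correlation matrix returned by np.corrcoef.
r = np.corrcoef(temperature, demand)[0, 1]

# Least-squares line: demand = slope * temperature + intercept.
slope, intercept = np.polyfit(temperature, demand, 1)


def plot_fit(x, y, slope, intercept):
    """Scatter plot of y against x with the fitted line (hypothetical helper)."""
    # Imported here so the numbers above can be computed without a display.
    import matplotlib.pyplot as plt

    xs = np.sort(x)  # sorted copy, used only to draw the line left to right
    plt.figure()
    plt.scatter(x, y, marker="x", color="r")
    plt.plot(xs, slope * xs + intercept, "--b")
    plt.xlabel("Temperature")
    plt.ylabel("Demand")
    plt.show()


print(f"r = {r:.3f}, slope = {slope:.1f}, intercept = {intercept:.1f}")
```

Calling `plot_fit(temperature, demand, slope, intercept)` draws the scatter plot with the fitted line on top. The closer `r` is to 1 or -1, the more tightly the points cluster around that line.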