I have two curves, A and B, that are highly correlated, as shown in the figure below, where C is the Pearson correlation between A and B.
The file containing the data can be downloaded here.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('prova.csv')
A = df['A'].values
B = df['B'].values
from scipy.stats import pearsonr
C = pearsonr(A,B)[0]
fig, ax = plt.subplots(1,2, figsize=(20, 5))
ax1 = ax[0]
ax2 = ax1.twinx()
ax1.plot(A, 'g-')
ax2.plot(B, 'b-')
ax1.set_ylabel('A', color='g', fontsize=20);
ax2.set_ylabel('B', color='b', fontsize=20);
ax2 = ax[1]
txt = 'C = %.2f'%C
ax2.scatter(A, B, label=txt)
ax2.set_xlabel('A', color='g', fontsize=20);
ax2.set_ylabel('B', color='b', fontsize=20);
ax2.legend(fontsize = 16)
The values of the green curve (A) should be 0, but the signal is affected by B. I would like to find the relation between A and B so that the two cancel out, but I am unsure how to proceed.
Clearly, A and B predict each other quite well. We can exploit this to obtain a value close to 0 from given values of A and B. My method of choice is a least_squares fit.
We want to minimize the sum of squares of the residual A - x * B - c over the parameters x and c. This can be done using:
import matplotlib.pyplot as plt
import pandas as pd
import scipy.optimize as opt
df = pd.read_csv('prova.csv')
def fit(x):
    # Residual of A after removing the linear contribution of B (slope x[0], offset x[1])
    return df['A'] - x[0] * df['B'] - x[1]
result = opt.least_squares(fit, [0, 0])
fit(result.x).plot()
plt.show()
The resulting residual is many orders of magnitude closer to zero than the original A.
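Because the model A ≈ x * B + c is linear in x and c, the same parameters can also be obtained from an ordinary straight-line fit. The sketch below is only an illustration: prova.csv is not available here, so it uses synthetic data (an assumption: A is a scaled and shifted copy of B plus a little noise), and the names residuals and corrected are introduced for this example. It checks that opt.least_squares and np.polyfit agree and that the corrected signal is centered on zero.

```python
import numpy as np
import scipy.optimize as opt

# Synthetic stand-in for prova.csv (assumption, not the real data):
# A is a scaled, shifted copy of B plus a small amount of noise.
rng = np.random.default_rng(0)
B = np.cumsum(rng.normal(size=500))
A = 0.3 * B + 2.0 + rng.normal(scale=0.01, size=500)

def residuals(p):
    # What remains of A after removing slope * B + offset
    return A - p[0] * B - p[1]

# Iterative least-squares fit, as in the answer above
result = opt.least_squares(residuals, [0.0, 0.0])

# Closed-form straight-line fit of A against B for comparison
slope, intercept = np.polyfit(B, A, 1)

# Corrected signal: A with the contribution of B removed
corrected = residuals(result.x)

print("least_squares:", result.x)
print("polyfit:      ", slope, intercept)
print("mean and std of corrected signal:", corrected.mean(), corrected.std())
```

For a purely linear relation the two approaches should give essentially the same slope and offset; least_squares becomes the more useful tool if the relation between A and B turns out to need a nonlinear model or a robust loss.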