I have data in this format:
[0.266465 0.9203907 1.007363 ... 0. 0.09623989 0.39632136]
That is the value in the first column of the first row.
This is the value in the second column of the first row:
[0.9042176 1.135085 1.2988662 ... 0. 0.13614458 0.28000486]
I have 2200 such rows, and I want to train a classifier to decide whether the two sets of values in a row are similar or not.
P.S. These are extracted feature vector values.
If you assume the relationship between the two extracted feature vectors is linear, you could try the Pearson correlation:
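import numpy as np
from scipy.stats import pearsonr
# Two random vectors stand in for one pair of extracted feature vectors
list1 = np.random.random(100)
list2 = np.random.random(100)
# Returns the correlation coefficient and its two-sided p-value
pearsonr(list1, list2)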
An example output is:
(0.0746901299996632, 0.4601843257734832)
The first value is the correlation coefficient (about 0.07) and the second is its p-value. With a p-value above 0.05, you fail to reject the null hypothesis of no correlation at significance level alpha = 5%, so here the correlation is not significant. If two vectors are significantly correlated, they are in a sense similar. More about the method here.
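To apply this to all 2200 rows, one option is to run the test once per row pair and then threshold the result. The following is only a sketch: the function name rowwise_pearson, the array names, the synthetic data, and the 0.5 cutoff are all assumptions for illustration, not something taken from your data.

```python
import numpy as np
from scipy.stats import pearsonr

def rowwise_pearson(a, b):
    """Run one Pearson test per row pair; return arrays of r and p-values."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both arrays must have shape (n_rows, n_features)")
    r_values, p_values = [], []
    for x, y in zip(a, b):
        r_val, p_val = pearsonr(x, y)  # one test per pair of feature vectors
        r_values.append(r_val)
        p_values.append(p_val)
    return np.array(r_values), np.array(p_values)

# Synthetic stand-in data: 5 rows, 100 features each (hypothetical shapes)
rng = np.random.default_rng(0)
first = rng.random((5, 100))
second = first + 0.05 * rng.random((5, 100))  # noisy copies of the first column

r, p = rowwise_pearson(first, second)
# Hypothetical decision rule: strong and significant correlation means "similar"
similar = (r > 0.5) & (p < 0.05)
```

The cutoff would need to be tuned on pairs you already know to be similar or different. Note that pearsonr returns nan when one of the vectors is constant, so such rows need separate handling.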
Also, I came across normalized cross-correlation, which is used for measuring similarity between images (I am not an expert, so better check this).
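For two equal-length vectors compared without any shift, the zero-mean variant of normalized cross-correlation can be written in a few lines of NumPy. This is a minimal sketch under that assumption; the function name zncc, the eps guard, and the toy vector are made up for illustration. At zero lag this score coincides with the Pearson coefficient, so it mainly becomes a different tool when you slide one signal over the other, as is done with images.

```python
import numpy as np

def zncc(a, b, eps=1e-12):
    """Zero-mean normalized cross-correlation of two vectors at zero lag."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("vectors must have the same length")
    a = a - a.mean()  # remove the mean so constant offsets do not matter
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        return 0.0  # assumption: score a constant vector as uncorrelated
    return float(np.dot(a, b) / denom)

# Toy vector, purely illustrative
v = np.array([0.3, 0.9, 1.0, 0.0, 0.1, 0.4])
same = zncc(v, v)               # identical vectors
scaled = zncc(v, 2.0 * v + 3.0) # scaling and offset do not change the score
flipped = zncc(v, -v)           # sign-flipped vector
```

The score lies between -1 and 1, with values near 1 indicating vectors that vary together.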