Tags: python, r, scikit-learn, similarity, cosine-similarity

Python cosine_similarity doesn't work for a matrix with NaNs


I need a Python function that works like this R function:

proxy::simil(method = "cosine", by_rows = FALSE) 

i.e., a function that builds a similarity matrix by computing the cosine similarity pairwise between dataframe rows. If NaNs are present, then for each pair of rows it should drop exactly those columns that contain a NaN in either of the two rows.
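The requirement can be written as a formula. This is a reading of the description in the question (a "pairwise-complete" cosine similarity), not a definition quoted from the proxy documentation: for rows x and y, let C be the set of columns where neither value is NaN, and restrict both the dot product and the two norms to C.

```latex
C(x, y) = \{\, i : x_i \text{ and } y_i \text{ are both observed (not NaN)} \,\}

s(x, y) = \frac{\sum_{i \in C(x,y)} x_i \, y_i}
               {\sqrt{\sum_{i \in C(x,y)} x_i^2}\;\sqrt{\sum_{i \in C(x,y)} y_i^2}}
```

For example, with x = (1, NaN, 2) and y = (2, 3, NaN), only the first column is shared, so s = (1·2)/(1·2) = 1. Simply replacing NaN with 0 and calling an ordinary cosine function is not equivalent: the dot product would be unchanged, but each norm would still include the row's values in columns the other row is missing (here giving 2/√65 ≈ 0.248 instead of 1).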

Simil function description (R)

Python error because of NaNs

Update: I have also tried removing the NaNs from every pair of rows in a loop and applying the cosine function from scipy.spatial.distance. It gives the same result as R, but it takes ages :(
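A minimal sketch of the loop approach described in the update, assuming the data is a 2-D float array with one observation per row. The function name `nan_cosine_similarity_loop` is hypothetical; `scipy.spatial.distance.cosine` returns the cosine *distance*, so the similarity is `1 - cosine(...)`. The two nested Python loops over row pairs are what make this slow for many rows.

```python
import numpy as np
from scipy.spatial.distance import cosine


def nan_cosine_similarity_loop(X):
    """Hypothetical helper: pairwise cosine similarity between the rows of X,
    dropping, for each pair, every column that is NaN in either row.
    Straightforward but slow: O(n^2) Python-level iterations."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S = np.full((n, n), np.nan)  # pairs with no shared columns stay NaN
    for a in range(n):
        for b in range(a, n):
            # Keep only the columns observed in both rows a and b.
            mask = ~np.isnan(X[a]) & ~np.isnan(X[b])
            if mask.any():
                # cosine() is a distance (1 - similarity), so convert back.
                S[a, b] = S[b, a] = 1.0 - cosine(X[a, mask], X[b, mask])
    return S
```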


Solution

  • You can try this approach: https://github.com/Midnighter/nadist. Alternatively, you can use _chk_weights with nan_screen=True, as described by metaperture in https://github.com/scipy/scipy/issues/3870. Hope that helps.

    I found that Midnighter had previously posted the same problem on Stack Overflow: "Compute the pairwise distance in scipy with missing values". There are some other solutions there, but since he moved on to Cythonizing it, I bet they were not the best.
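If you would rather avoid an extra dependency, the per-pair masking can also be expressed with plain NumPy matrix products, which removes the Python loop entirely. This is a sketch under my own assumptions, not the nadist API and not the scipy `_chk_weights` route mentioned above; the function name `nan_cosine_similarity` is hypothetical. The trick: set NaN to 0 so missing entries drop out of every sum, and use a 0/1 "observed" mask so that each norm is computed only over the columns shared with the other row.

```python
import numpy as np


def nan_cosine_similarity(X):
    """Hypothetical helper: pairwise cosine similarity between the rows of X.
    For each pair of rows, columns that are NaN in either row are ignored
    in the dot product and in both norms. Vectorized with matrix products."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is present
    X0 = np.where(observed, X, 0.0)      # NaN -> 0 so it contributes nothing
    M = observed.astype(float)

    # dot[a, b] = sum over columns observed in both rows of x_a * x_b
    dot = X0 @ X0.T
    # sq[a, b] = sum over columns observed in both rows of x_a ** 2
    sq = (X0 ** 2) @ M.T
    # sq.T[a, b] is the same sum for row b, so the denominator uses the
    # norms restricted to the columns shared by each pair.
    denom = np.sqrt(sq * sq.T)

    with np.errstate(invalid="ignore", divide="ignore"):
        S = dot / denom
    # No shared columns (or an all-zero overlap): similarity is undefined.
    S[denom == 0] = np.nan
    return S
```

For a pandas DataFrame, pass `df.to_numpy()` and wrap the result back into a DataFrame if needed; to compare columns instead of rows, pass the transpose `X.T`. The trade-off is memory: several n×n intermediate matrices are built, which is usually acceptable for a few thousand rows.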