python, numpy, cosine-similarity

Create random vector given cosine similarity


Given some vector v, I want to generate another random vector w that has a specified cosine similarity with v. Is there a way to do this in Python?

Example: for simplicity, take the 2D vector v = [3, -4]. I want a random vector w whose cosine similarity with v is 60%, i.e. +0.6. This should produce w = [-0.875, -3], or any other vector with the same cosine similarity. I hope this is clear enough.


Solution

  • Given the vector v and cosine similarity costheta (a scalar between -1 and 1), compute w as in the function rand_cos_sim(v, costheta):

    import numpy as np
    
    
    def rand_cos_sim(v, costheta):
        # Form the unit vector parallel to v:
        u = v / np.linalg.norm(v)
    
        # Pick a random vector:
        r = np.random.multivariate_normal(np.zeros_like(v), np.eye(len(v)))
    
        # Form a vector perpendicular to v:
        uperp = r - r.dot(u)*u
    
        # Make it a unit vector:
        uperp = uperp / np.linalg.norm(uperp)
    
        # w is the linear combination of u and uperp with coefficients costheta
        # and sin(theta) = sqrt(1 - costheta**2), respectively:
        w = costheta*u + np.sqrt(1 - costheta**2)*uperp
    
        return w
    

    For example,

    In [17]: v = np.array([3, -4])
    
    In [18]: w = rand_cos_sim(v, 0.6)
    
    In [19]: w
    Out[19]: array([-0.28, -0.96])
    

    Verify the cosine similarity:

    In [20]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
    Out[20]: 0.6000000000000015
    
    In [21]: w = rand_cos_sim(v, 0.6)
    
    In [22]: w
    Out[22]: array([1., 0.])
    
    In [23]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
    Out[23]: 0.6
    

    The return value always has magnitude 1. In 2-d there are only two unit vectors perpendicular to v, so in the above example the only possible results are [1, 0] and [-0.28, -0.96].
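Cosine similarity does not depend on vector length, so a result of magnitude 1 can be rescaled by any positive factor without changing its similarity to v. A minimal sketch, assuming a hypothetical helper `scale_to_norm` (not part of the answer's function) that stretches a vector to a requested length:

```python
import numpy as np


def cosine_similarity(a, b):
    # Standard definition: dot product divided by the product of the norms.
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))


def scale_to_norm(w, length):
    # Hypothetical helper: rescale w to the given positive length.
    if length <= 0:
        raise ValueError("length must be positive")
    return w * (length / np.linalg.norm(w))


v = np.array([3.0, -4.0])
w = np.array([-0.28, -0.96])       # one of the two unit vectors from the example above
w_long = scale_to_norm(w, 3.125)   # same direction, different magnitude

print(cosine_similarity(v, w))       # approximately 0.6
print(cosine_similarity(v, w_long))  # approximately 0.6
```

Only positive factors are allowed here, because multiplying by a negative number flips the sign of the cosine similarity.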

    Another example, this one in 3-d:

    In [24]: v = np.array([3, -4, 6])
    
    In [25]: w = rand_cos_sim(v, -0.75)
    
    In [26]: w
    Out[26]: array([ 0.3194265 ,  0.46814873, -0.82389531])
    
    In [27]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
    Out[27]: -0.75
    
    In [28]: w = rand_cos_sim(v, -0.75)
    
    In [29]: w
    Out[29]: array([-0.48830063,  0.85783797, -0.16023891])
    
    In [30]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
    Out[30]: -0.75