Search code examples
entityoperations-researchknowledge-graphentity-linking

How is hits@k calculated and what does it mean in the context of link prediction in knowledge bases


I study papers on link prediction in knowledge networks. Authors usually report "Hits@k". I wonder how to calculate hits@k and what does it mean about the model and the results?


Solution

  • In a nutshell, it is the count of how many positive triples are ranked in the top-n positions against a bunch of synthetic negatives.

    In the following example, pretend the test set includes two ground truth positive only:

    Jack   born_in   Italy
    Jack   friend_with   Thomas
    

    Let's assume such positive triples (identified by * below) are ranked against four synthetic negatives each.

    Now, assign a score to each of the positives and its synthetic negatives using your pre-trained embedding model. Then, sort the triples in descending order. In the example below, the first triple ranks 2nd, and the other triple ranks first (against their respective synthetic negatives):

    s        p         o            score   rank
    Jack   born_in   Ireland        0.789      1
    Jack   born_in   Italy          0.753      2  *
    Jack   born_in   Germany        0.695      3
    Jack   born_in   China          0.456      4
    Jack   born_in   Thomas         0.234      5
    
    s        p         o            score   rank
    Jack   friend_with   Thomas     0.901      1  *
    Jack   friend_with   China      0.345      2
    Jack   friend_with   Italy      0.293      3
    Jack   friend_with   Ireland    0.201      4
    Jack   friend_with   Germany    0.156      5
    

    Then, count how many positives occur in the top-1 or top-3 positions, and divide by the number of triples in the test set (which in this example includes 2 triples):

    Hits@3= 2/2 = 1.0
    Hits@1= 1/2 = 0.5
    

    AmpliGraph has an API to compute Hits@n - check out the documentation here.