r data-mining apriori

Lift factor value


I have a transaction matrix like this:

      "u1" "u10" "u2" "u3"  ...
_____________________________________
"A", |  1    0    1    1 ...
"B", |  0    1    0    0   
"u10"|  0    0    0    0    .
"u11"|  0    0    0    0    .
"u2" |  0    0    0    0    .
"u4" |  0    0    0    0   
  .                      .
  .                        .
  .                          .

I am trying to determine the lift of each pair (i, j), e.g., lift(u1, A), in that matrix using R. First I tried the apriori algorithm from the arules package, but I am not interested in rules. Then I came across this implementation, but it only works for a symmetric matrix. I would like some idea of how to do this, or to know whether any R package already implements it.

Many thanks!


Solution

  • I guess what you want is something like the following:

    My first assumption is that the matrix you start with is not a user-item transaction matrix but an item-item co-occurrence matrix, where entry (i, j) is the number of transactions in which item i and item j were bought together. Here is a small co-occurrence matrix C. In this matrix, the entries of row i count the transactions in which the i-th item was bought.

    C
        I1 I2 I3 I4 I5 I6 I7 I8 I9 I10 I11 I12 I13 I14 I15 I16 I17 I18 I19 I20
    I1   2  0  0  0  0  0  0  1  0   1   0   0   0   0   0   1   0   0   0   0
    I2   0  3  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
    I3   0  0  9  0  1  0  0  1  1   0   1   0   1   0   0   0   0   0   0   1
    I4   0  0  0  5  0  0  0  1  1   0   0   0   0   0   1   0   0   0   0   0
    I5   0  0  1  0  4  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0
    I6   0  0  0  0  0  3  0  0  0   1   0   0   0   0   0   0   0   1   1   0
    I7   0  0  0  0  0  0  2  0  1   0   0   0   0   0   0   0   0   0   1   0
    I8   1  0  1  1  0  0  0  6  2   0   0   0   0   0   0   2   0   1   0   1
    I9   0  0  1  1  0  0  1  2  9   0   1   1   0   0   0   1   1   2   1   0
    I10  1  0  0  0  0  1  0  0  0   4   0   0   0   0   0   0   0   0   1   0
    I11  0  0  1  0  0  0  0  0  1   0   2   0   0   0   0   0   1   0   0   0
    I12  0  0  0  0  0  0  0  0  1   0   0   6   0   0   0   1   1   1   1   1
    I13  0  0  1  0  0  0  0  0  0   0   0   0   2   0   0   0   0   0   0   0
    I14  0  0  0  0  0  0  0  0  0   0   0   0   0   6   0   1   0   0   0   0
    I15  0  0  0  1  0  0  0  0  0   0   0   0   0   0   5   0   0   0   1   0
    I16  1  0  0  0  0  0  0  2  1   0   0   1   0   1   0   4   1   1   1   0
    I17  0  0  0  0  0  0  0  0  1   0   1   1   0   0   0   1   5   0   1   0
    I18  0  0  0  0  0  1  0  1  2   0   0   1   0   0   0   1   0   9   0   0
    I19  0  0  0  0  0  1  1  0  1   1   0   1   0   0   1   1   1   0   7   0
    I20  0  0  1  0  0  0  0  1  0   0   0   1   0   0   0   0   0   0   0   6
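
    If you start from a binary transaction matrix instead, one way to obtain an item-item co-occurrence matrix is `crossprod()`. This is only a sketch on made-up toy data (`tx` and `cooc` are hypothetical names), and it is not necessarily how the C above was produced:

    ```r
    # Hypothetical toy data: 4 transactions (rows) over 3 items (columns);
    # a 1 means the item appears in that transaction.
    tx <- matrix(c(1, 1, 0,
                   1, 0, 1,
                   1, 1, 1,
                   0, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("t", 1:4), c("I1", "I2", "I3")))

    # crossprod(tx) is t(tx) %*% tx: entry [i, j] counts the transactions that
    # contain both item i and item j; the diagonal entry [i, i] counts the
    # transactions that contain item i.
    cooc <- crossprod(tx)
    cooc
    ```

    The result is always symmetric, because "i together with j" and "j together with i" describe the same set of transactions.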
    

    Now the lift of an item A given an item B is P(A|B) / P(A).
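
    Expanding the conditional probability (assuming P(A) > 0 and P(B) > 0) shows that lift is symmetric in A and B, and that it equals 1 when the two items are independent:

    ```latex
    \mathrm{lift}(A, B)
      = \frac{P(A \mid B)}{P(A)}
      = \frac{P(A \cap B)}{P(A)\,P(B)}
      = \frac{P(B \mid A)}{P(B)}
    ```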

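    # Marginal probability of each item, estimated from the row totals of C
    items.probs <- rowSums(C) / sum(C)
    # cond.probs[i, j] = P(j | i): row i of C divided by its row total
    cond.probs <- C / rowSums(C)
    # lifts[i, j] = P(j | i) / P(j): divide column j by the probability of item j
    lifts <- round(sweep(cond.probs, 2, items.probs, "/"), 2)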
    
    lifts
          I1   I2   I3   I4   I5   I6   I7   I8   I9  I10  I11  I12  I13  I14  I15  I16  I17  I18  I19  I20
    I1  0.90 0.90 0.45 1.79 1.79 1.35 1.79 0.90 0.00 1.79 1.79 1.35 0.45 0.45 0.45 0.00 0.90 0.45 1.35 0.45
    I2  2.42 1.82 1.21 1.82 1.21 0.61 0.00 1.21 2.42 0.00 1.21 0.00 0.61 1.82 1.21 1.21 0.61 1.82 0.00 1.21
    I3  0.66 0.99 0.66 0.99 0.99 0.99 1.33 0.00 0.99 0.99 0.33 0.99 0.33 0.99 0.66 1.33 1.33 0.66 0.99 0.33
    I4  2.18 0.55 0.55 0.55 0.55 0.00 2.18 0.00 0.55 1.64 2.18 2.18 0.55 1.09 0.55 0.55 1.09 0.55 2.18 1.64
    I5  0.00 0.00 0.00 1.35 2.71 2.03 2.03 0.68 0.00 2.03 2.03 0.68 2.03 0.68 2.71 0.00 2.03 0.00 2.03 0.68
    I6  0.68 2.71 0.00 0.00 0.00 2.03 2.71 0.68 0.00 2.71 2.71 1.35 1.35 1.35 0.00 1.35 0.00 2.03 0.68 1.35
    I7  0.69 1.38 0.69 0.69 1.04 1.38 1.38 0.35 0.00 0.35 0.69 1.38 1.38 1.04 0.35 0.69 0.35 1.38 0.35 1.38
    I8  1.04 1.38 0.00 0.00 1.38 1.04 0.35 1.38 1.38 0.35 1.04 1.04 1.04 0.69 1.04 0.35 1.04 0.69 1.04 0.69
    I9  0.00 1.97 1.97 0.00 2.96 1.97 0.99 1.97 0.99 1.97 0.00 0.00 0.99 0.99 0.99 1.97 1.97 3.94 2.96 0.00
    I10 2.30 1.15 1.15 0.00 1.72 2.30 1.15 0.57 0.00 1.15 1.15 0.00 0.00 1.15 1.15 1.15 2.30 1.15 2.30 0.00
    I11 0.59 1.18 0.00 0.30 0.89 0.00 1.18 1.18 0.89 1.18 0.30 1.18 0.00 1.18 0.89 0.89 1.18 0.30 1.18 1.18
    I12 0.75 1.13 0.38 0.38 0.00 0.38 1.13 1.13 0.00 1.13 0.38 1.13 1.13 1.50 1.50 1.50 0.38 1.50 0.75 1.50
    I13 1.50 1.50 0.38 1.13 1.13 1.50 1.50 0.75 0.00 0.38 1.13 0.75 0.00 1.13 1.50 0.75 1.50 0.00 0.38 0.75
    I14 1.88 0.94 0.47 0.94 0.00 1.88 0.00 0.47 0.47 0.47 1.41 0.94 0.94 1.88 0.94 0.94 0.47 1.88 1.88 0.94
    I15 0.00 1.04 1.38 1.38 1.38 1.04 1.38 0.35 0.00 1.38 1.04 1.04 1.04 1.38 0.69 1.04 0.00 0.35 0.35 0.69
    I16 1.08 1.08 0.72 0.72 1.44 1.08 1.44 1.08 0.00 0.00 0.72 0.00 1.08 1.44 0.72 1.44 0.72 1.44 1.08 0.00
    I17 1.97 1.97 2.96 0.00 0.00 0.99 0.00 1.97 0.00 2.96 0.00 0.00 0.99 0.99 3.94 1.97 1.97 0.99 0.99 3.94
    I18 0.00 0.00 0.00 2.42 1.82 0.00 0.00 2.42 1.21 1.21 2.42 2.42 0.00 2.42 1.82 1.82 1.82 0.61 0.00 0.00
    I19 1.41 1.41 0.94 1.88 0.47 0.47 1.88 1.88 0.00 0.47 0.00 0.00 1.41 0.00 0.47 0.94 0.94 1.41 1.88 1.88
    I20 0.00 0.00 3.45 0.86 0.00 2.59 0.86 0.86 0.86 1.73 0.86 2.59 1.73 1.73 3.45 0.00 1.73 0.00 2.59 0.86
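
    For a matrix that is not symmetric, such as the one in the question, the same idea can be written as observed counts divided by the counts expected under independence. The sketch below is one possible reading, not an established package API: `pair_lift` is a hypothetical helper, the toy values are made up, and it assumes that `sum(M)` can be treated as the total number of observations.

    ```r
    # Hypothetical helper: lift for every (row, column) pair of a count matrix M.
    # Assumes N = sum(M) is the total number of observations, so that
    #   P(i, j) = M[i, j] / N,  P(i) = rowSums(M)[i] / N,  P(j) = colSums(M)[j] / N.
    pair_lift <- function(M) {
      N <- sum(M)
      # Counts expected in each cell if rows and columns were independent
      expected <- outer(rowSums(M), colSums(M)) / N
      # lift = observed / expected = P(i, j) / (P(i) * P(j))
      M / expected
    }

    # Toy rectangular matrix (2 rows, 3 columns); the values are made up.
    M <- matrix(c(4, 1, 0,
                  1, 2, 2),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("A", "B"), c("u1", "u2", "u3")))

    round(pair_lift(M), 2)
    ```

    A value above 1 means the pair occurs more often than independence would predict, and a value below 1 means less often. Rows or columns that sum to zero give NaN, so you may want to drop them first.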