I have an R data frame whose columns are logical variables.
I need to compute a kind of dot product between every possible pair of columns.
This arises from text corpus analysis, where the data frame indicates which terms (rows) are present in which documents (columns). There are common, fast solutions for the case where one wants to compute distances between every possible pair of columns, using daisy from the cluster package or cosine from the lsa package.
However, I need a dot product between all pairs of columns instead: the goal is to count, for each pair, how many words are present in both of the documents being compared.
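For a single pair of documents, the count of shared terms is exactly the dot product of their 0/1 encodings, which is why a dot product is the right tool here. A minimal sketch, assuming two hypothetical documents `doc_a` and `doc_b` stored as logical vectors over the same four terms:

```r
# Two hypothetical documents, each a logical vector over the same 4 terms
doc_a <- c(TRUE, TRUE, FALSE, TRUE)
doc_b <- c(TRUE, FALSE, FALSE, TRUE)

# Number of terms present in both documents (element-wise AND, then count)
shared_logical <- sum(doc_a & doc_b)

# The same count as the dot product of the 0/1 encodings
shared_dot <- sum(as.numeric(doc_a) * as.numeric(doc_b))

shared_logical  # 2
shared_dot      # 2
```

A term contributes 1 to the product only when it is present (1) in both documents, so the sum counts the co-occurring terms.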
Let's use this example:
df <- data.frame(x1 = c(TRUE, TRUE, FALSE), x2 = c(FALSE, FALSE, FALSE), x3 = c(TRUE, FALSE, TRUE))
I would convert the data frame into a matrix and then compute the cross product:
crossprod(data.matrix(df))
#    x1 x2 x3
# x1  2  0  1
# x2  0  0  0
# x3  1  0  2
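The `crossprod` result can be cross-checked against an explicit loop over every pair of columns, and its upper triangle can be turned into one row per document pair. A minimal sketch: `pairwise_shared` is a hypothetical helper (not from any package), and its O(p²) loop is meant only as a correctness check, not as a replacement for `crossprod`.

```r
df <- data.frame(x1 = c(TRUE, TRUE, FALSE),
                 x2 = c(FALSE, FALSE, FALSE),
                 x3 = c(TRUE, FALSE, TRUE))

# Hypothetical helper: naive loop counting shared TRUEs for every column pair
pairwise_shared <- function(d) {
  p <- ncol(d)
  out <- matrix(0, p, p, dimnames = list(names(d), names(d)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- sum(d[[i]] & d[[j]])
    }
  }
  out
}

via_loop <- pairwise_shared(df)
via_crossprod <- crossprod(data.matrix(df))

all.equal(via_loop, via_crossprod)  # TRUE

# One row per distinct pair of documents (upper triangle, diagonal excluded)
idx <- which(upper.tri(via_crossprod), arr.ind = TRUE)
pairs <- data.frame(doc1   = rownames(via_crossprod)[idx[, 1]],
                    doc2   = colnames(via_crossprod)[idx[, 2]],
                    shared = via_crossprod[idx])
pairs
```

The diagonal holds each document's own term count, so excluding it keeps only genuine pairs.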