Translate function from R to rpy2


Hi! I am using an R package called stylo for stylometry (basically machine learning for identifying literary authors based on lexical frequencies), but I am calling it from Python through rpy2.

In R I would do the following:

library(stylo)
cosine.delta = function(x){
        # z-scoring the input matrix of frequencies
        x = scale(x)
        # computing cosine dissimilarity
        y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
        # then, turning it into cosine similarity
        z = 1 - y
        # getting the results
        return(z)
    }
stylo(distance.measure="cosine.delta")

Now in Python I know how to call the library and the function stylo, but I don't know how to define the function cosine.delta. Any idea? I have tried things like this:

import rpy2.robjects as ro
R = ro.r
R.library("stylo")
cosinedelta = R.function(x){
        # z-scoring the input matrix of frequencies
        x = scale(x)
        # computing cosine dissimilarity
        y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
        # then, turning it into cosine similarity
        z = 1 - y
        # getting the results
        return(z)
}
R.stylo(distance.measure="cosinedelta")

It says that the { is invalid syntax. I have been trying different things, such as other types of brackets, or using

from rpy2.robjects.packages import importr
base = importr('base')

but nothing works. I don't know much about either R or rpy2 syntax...


Solution

  • You can run any R code through rpy2 by putting it in a string and passing that string to R() (here, R is ro.r). In your case, the following should work:

    import rpy2.robjects as ro
    R = ro.r
    R.library("stylo")
    R('''
        cosinedelta <- function(x){
            # z-scoring the input matrix of frequencies
            x = scale(x)
            # computing cosine dissimilarity
            y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
            # then, turning it into cosine similarity
            z = 1 - y
            # getting the results
            return(z)
        }
        ''')
    R('stylo(distance.measure=\"cosinedelta\")')
    

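    This is basically just your R code, wrapped in ''' ''' to make it a multi-line string in Python, with R( ) around that to execute it as R code. I used cosinedelta instead of cosine.delta; R itself allows dots in names, so either spelling works inside the string, as long as the name passed to distance.measure matches the name of the function you defined.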

    The last line of code works in a similar way. To be safe, I put backslashes in front of the quotation marks that should be passed straight into R. They are not strictly needed here: the Python string is delimited by single quotes, so the double quotes inside it reach R unchanged either way.
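The quoting in that last line can be checked in plain Python, without R or rpy2. A minimal sketch comparing the escaped and unescaped spellings of the same call string (the variable names are only for illustration):

```python
# Two ways to write the same R call as a single-quoted Python string literal.
escaped = 'stylo(distance.measure=\"cosinedelta\")'
plain = 'stylo(distance.measure="cosinedelta")'

# Inside a single-quoted literal, \" is just the character ", so both match.
print(escaped == plain)

# If the literal itself is double-quoted, the escapes become necessary.
double_quoted = "stylo(distance.measure=\"cosinedelta\")"
print(double_quoted == plain)
```

All three literals produce the same text, so R receives an identical call whichever spelling is used.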

    For this answer I basically adapted an example from the rpy2 documentation; it's probably useful to look through that yourself too.
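As a possible follow-up, here is a minimal sketch of getting the R function back as a Python object once it has been defined, which can help when testing it on its own before handing its name to stylo. It assumes rpy2 and a working R installation; R_COSINEDELTA and load_cosinedelta are names made up for this illustration, while ro.r and ro.globalenv come from rpy2.robjects.

```python
# R source for the distance function, kept as an ordinary Python string.
R_COSINEDELTA = """
cosinedelta <- function(x){
    # z-scoring the input matrix of frequencies
    x = scale(x)
    y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
    z = 1 - y
    return(z)
}
"""


def load_cosinedelta():
    """Define cosinedelta in R and return it as a Python callable.

    rpy2 is imported lazily, because importing it needs a working
    R installation (an assumption of this sketch).
    """
    import rpy2.robjects as ro

    ro.r(R_COSINEDELTA)                 # evaluate the definition in R's global environment
    return ro.globalenv["cosinedelta"]  # look the R function up by name
```

The returned object can then be called from Python with an R matrix as its argument, in the same way as any other R function exposed through rpy2.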