Hi! I am using an R package called stylo for stylometric purposes (basically machine learning for identifying literary authors based on lexical frequencies), but I am calling it from Python through rpy2.
In R I would do the following:
library(stylo)
cosine.delta = function(x){
    # z-scoring the input matrix of frequencies
    x = scale(x)
    # computing cosine dissimilarity
    y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
    # then, turning it into cosine similarity
    z = 1 - y
    # getting the results
    return(z)
}
stylo(distance.measure="cosine.delta")
Now in Python I know how to load the library and call the function stylo, but I don't know how to define the function cosine.delta. Any ideas? I have tried things like this:
import rpy2.robjects as ro
R = ro.r
R.library("stylo")
cosinedelta = R.function(x){
    # z-scoring the input matrix of frequencies
    x = scale(x)
    # computing cosine dissimilarity
    y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
    # then, turning it into cosine similarity
    z = 1 - y
    # getting the results
    return(z)
}
R.stylo(distance.measure="cosinedelta")
Python says that the { is invalid syntax. I have tried different things, such as other types of brackets and importing R's base package with
from rpy2.robjects.packages import importr
base = importr('base')
but nothing works. I don't know much about either R or rpy2 syntax...
You can run any R code through rpy2 simply by putting it in a big string and passing that string as an argument to R() (that is, ro.r). For you, the following should work:
import rpy2.robjects as ro
R = ro.r
R.library("stylo")
R('''
cosinedelta <- function(x){
    # z-scoring the input matrix of frequencies
    x = scale(x)
    # computing cosine dissimilarity
    y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
    # then, turning it into cosine similarity
    z = 1 - y
    # getting the results
    return(z)
}
''')
R('stylo(distance.measure=\"cosinedelta\")')
This is basically just the R code (with cosinedelta instead of cosine.delta; I'm not sure whether that matters), wrapped in ''' ''' to make it a multi-line string in Python, with R( ) wrapped around that to execute it as R code.
The last line of code works in a similar way. To be safe, I put backslashes in front of the quotation marks that should be passed straight to R. They are not strictly needed inside a single-quoted Python string, since Python passes the double quotes through unchanged either way, but they do no harm.
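If you want to convince yourself of what Python does with those quotation marks, a quick check in plain Python (no R or rpy2 needed) is enough. The variable names here are just for illustration:

```python
# Both literals are single-quoted Python strings containing double quotes.
# In the first one the double quotes are backslash-escaped, in the second not.
escaped = 'stylo(distance.measure=\"cosinedelta\")'
plain = 'stylo(distance.measure="cosinedelta")'

# Python resolves \" to a plain " while parsing the literal,
# so the two strings are identical by the time they would reach R.
same = escaped == plain
print(same)
print(escaped)
```

So the escaped and unescaped forms send exactly the same text to R; which one you write is a matter of taste.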
For this answer I basically adapted an example from the documentation; it's probably useful to look through that yourself too.
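If you would like to check from Python that the function really got defined before handing it to stylo, something along these lines might help. It is only a sketch: it assumes R and rpy2 are installed, the helper name run_cosinedelta_check and the 3x3 matrix of frequencies are made up for illustration, and the rpy2 import is kept inside the helper so that nothing touches R until you call it.

```python
# The same R definition as above, kept in a Python string.
R_SOURCE = '''
cosinedelta <- function(x){
    x = scale(x)
    y = as.dist( x %*% t(x) / (sqrt(rowSums(x^2) %*% t(rowSums(x^2)))) )
    z = 1 - y
    return(z)
}
'''

def run_cosinedelta_check():
    """Define cosinedelta in R, then call it on a tiny made-up matrix."""
    # Imported here so the module loads even where R/rpy2 is missing.
    import rpy2.robjects as ro

    # Evaluating the string creates cosinedelta in R's global environment.
    ro.r(R_SOURCE)

    # Look the R function up by name; it becomes a callable Python object.
    cosinedelta = ro.globalenv["cosinedelta"]

    # Hypothetical frequencies: 3 texts (rows) x 3 words (columns),
    # filled column by column as R's matrix() does by default.
    freqs = ro.r.matrix(ro.FloatVector([1, 2, 4, 3, 1, 5, 2, 6, 1]), nrow=3)

    # The result is an R "dist" object; list() exposes its values.
    return list(cosinedelta(freqs))
```

Calling run_cosinedelta_check() should hand back the pairwise distances between the three rows. If that works, the same R session also knows the name cosinedelta when you run the stylo call.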