
How to turn a list into a "textmodel_wordscores" or "textmodel"?


I ran wordscores. The output is an object of class "textmodel_wordscores" / "textmodel" / "list" (obtained by calling class() on it). I then ran predict() on this object and got results.

Here is the code, just for reference:

train_ref <- textmodel_wordscores(dfm, y = docvars(df1, "Ref_score"), smooth=0.1)
word_score <- predict(train_ref, se.fit = TRUE, newdata = dfm2, rescaling = "mv")

class(train_ref) #"textmodel_wordscores" "textmodel"            "list"   
class(train_ref$wordscores) #numeric

What I tried to do is essentially replace train_ref$wordscores with a numeric object that has the same structure as the object it replaces. See below:

missing_train <- train_ref[-c(1)] # removing train_ref$wordscores

train_ref2 <- c(missing_train, coef_train_list)

# note that class(train_ref2) is now just "list"

# train_ref2 is just a list, whereas train_ref is a textmodel object. The former
# doesn't work with predict(), while the latter does.

The problem is that when I then try to use train_ref2 with predict(), I get the following error: Error in UseMethod("predict") : no applicable method for 'predict' applied to an object of class "list".
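For context, a minimal base-R sketch (not part of the original code) of why this error appears: predict() is an S3 generic that dispatches on the class attribute, and c() on a list returns a plain list, dropping any custom class, so the textmodel_wordscores method is never found:

```r
# c() drops the class attribute, so S3 dispatch no longer finds the method
x <- structure(list(a = 1), class = c("myclass", "list"))
class(x)                  # "myclass" "list"
class(c(x, list(b = 2)))  # "list" -- the custom class is gone
```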

My question is then: is there a way to convert a list into a textmodel object?

I didn't include the data needed to run the model, as it is quite convoluted to run wordscores here. I will edit the question if you need more info.

Thanks a lot!


Solution

  • It is better to manipulate the wordscores element of the list directly, rather than try to replace it wholesale. There is an accessor method, coef(), but it does not allow replacement. So you can do it this way:

    library("quanteda")
    ## Package version: 1.5.2
    
    tmod <- textmodel_wordscores(data_dfm_lbgexample, y = c(seq(-1.5, 1.5, .75), NA))
    head(coef(tmod), 10)
    ##         A         B         C         D         E         F         G         H 
    ## -1.500000 -1.500000 -1.500000 -1.500000 -1.500000 -1.481250 -1.480932 -1.451923 
    ##         I         J 
    ## -1.408333 -1.323298
    predict(tmod)
    ##            R1            R2            R3            R4            R5 
    ## -1.317931e+00 -7.395598e-01 -8.673617e-18  7.395598e-01  1.317931e+00 
    ##            V1 
    ## -4.480591e-01
    
    # replace some wordscores with 10
    tmod$wordscores[c("F", "G")] <- 10
    head(coef(tmod), 10)
    ##         A         B         C         D         E         F         G         H 
    ## -1.500000 -1.500000 -1.500000 -1.500000 -1.500000 10.000000 10.000000 -1.451923 
    ##         I         J 
    ## -1.408333 -1.323298
    predict(tmod)
    ##            R1            R2            R3            R4            R5 
    ##  8.979134e-01 -6.821545e-01 -8.673617e-18  7.395598e-01  1.317931e+00 
    ##            V1 
    ## -4.480591e-01
    
    # remove the wordscores for F and G
    tmod$wordscores <- tmod$wordscores[-match(c("F", "G"), names(coef(tmod)))]
    head(coef(tmod), 10)
    ##         A         B         C         D         E         H         I         J 
    ## -1.500000 -1.500000 -1.500000 -1.500000 -1.500000 -1.451923 -1.408333 -1.323298 
    ##         K         L 
    ## -1.184615 -1.036990
    predict(tmod)
    ## Warning: 2 features in newdata not used in prediction.
    ##            R1            R2            R3            R4            R5 
    ## -1.278918e+00 -7.358337e-01 -8.673617e-18  7.395598e-01  1.317931e+00 
    ##            V1 
    ## -4.480591e-01
    

    Here, I indexed by feature names to make this more stable than using integer positions, but of course you could do this with integer indices too.
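    As a side note on the literal question of converting a list back into a textmodel object: since S3 classes in R are just an attribute, you could in principle rebuild the list and reassign the class to restore dispatch. This is a sketch, not a supported quanteda pattern, and it only works if the rebuilt list has exactly the elements that predict.textmodel_wordscores expects — modifying the element in place, as above, is the safer route:

    ```r
    # rebuild the list with a replaced wordscores element, then restore the class
    tmod2 <- c(tmod[setdiff(names(tmod), "wordscores")],
               list(wordscores = coef(tmod)))
    class(tmod2) <- class(tmod)  # c("textmodel_wordscores", "textmodel", "list")
    # predict(tmod2) now dispatches to the wordscores method again
    ```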