Bring R list into Stata as macro?


I wish to run a Lasso model in R from Stata and then bring a resulting character list (the names of the subset coefficients) back into Stata as a macro (for example, a global).

At the moment I am aware of two options:

  1. I save a dta file and run an R script from Stata using shell:

    shell $Rloc --vanilla <"${LOC}/Lasso.R"
    

    This works from the saved dta file and allows me to run the Lasso model that I wish to run, but is not interactive, so I can't bring the relevant character list (with the names of subset variables) back into Stata.

  2. I run R interactively from Stata using rcall. However, rcall won't allow me to load a large enough matrix, even under max Stata memory. My predictive matrix Z (to be subset by Lasso) is 1,000 by 100 but when I run the command:

    rcall: X <- st.matrix(Z) 
    

    I receive an error stating:

    macro substitution results in line that is too long: The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar.

Is there some way to interactively run R from Stata, which allows large matrices, such that I may bring a character list from R back into Stata as a macro?

Thanks in advance.


Solution

  • Below I will try to consolidate the comments into a (hopefully) useful answer.

    Unfortunately, rcall does not appear to play nicely with large matrices like the one you need. I think it would be best to call R to run your script using the shell command and save the string(s) as variables in a dta file. This requires a bit more work but it is certainly programmable.

    Then you could read these variables into Stata and manipulate them easily using built-in functions. For example, you could save the strings in separate variables, or in a single variable and use levelsof, as @Dimitriy recommended.

    Consider the following toy example:

    clear
    
    * Four example strings, one per observation
    input str50 string
    "this is a string"
    "A longer string is this"
    "A string that is even longer is this one"
    "How many strings do you have?"
    end
    
    * Collect the distinct values (sorted) into the local macro newstr
    levelsof string, local(newstr) 
    `"A longer string is this"' `"A string that is even longer is this one"' `"How many strings do you have?"' `"this is a string"'
    
    * Split the list into the numbered macros `1', `2', ...
    tokenize `"`newstr'"'
    
    * Display each string in turn
    forvalues i = 1 / `: word count `newstr'' {
        display "``i''"
    }
    
    A longer string is this
    A string that is even longer is this one
    How many strings do you have?
    this is a string
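
    Applied to the Lasso case, the same idea might look like the sketch below. It assumes that Lasso.R ends by writing the selected names to a one-column dta file, for example with foreign::write.dta() or haven::write_dta(). The file name selected.dta, the variable name varname, and the global SELECTED are hypothetical, not something from the question.

    ```stata
    * Run the R script non-interactively, as in option 1 of the question
    shell $Rloc --vanilla <"${LOC}/Lasso.R"

    * Assumption: the script saved the selected names in ${LOC}/selected.dta,
    * in a string variable called varname (both names are hypothetical)
    use "${LOC}/selected.dta", clear

    * clean drops the compound quotes, which is safe here because
    * variable names contain no spaces
    levelsof varname, local(selected) clean

    * Store the space-separated list in a global and check it
    global SELECTED `selected'
    display "$SELECTED"
    ```

    Because use replaces the data in memory, wrapping the use and levelsof lines in preserve / restore should keep the original dataset intact.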
    

    In my experience, programs like rcall and rsource are useful for simple tasks. However, they can become a real hassle for more complicated work, in which case I personally just resort to the real thing, that is, using the other software directly.

    As @Dimitriy also indicated, there are now some community-contributed commands available for lasso, which may cover your needs so that you do not have to fiddle with R:

    search lasso
    
    5 packages found (Stata Journal and STB listed first)
    -----------------------------------------------------
    
    elasticregress from http://fmwww.bc.edu/RePEc/bocode/e
        'ELASTICREGRESS': module to perform elastic net regression, lasso
        regression, ridge regression / elasticregress calculates an elastic
        net-regularized / regression: an estimator of a linear model in which
        larger / parameters are discouraged.  This estimator nests the LASSO / and
    
    lars from http://fmwww.bc.edu/RePEc/bocode/l
        'LARS': module to perform least angle regression / Least Angle Regression
        is a model-building algorithm that / considers parsimony as well as
        prediction accuracy.  This / method is covered in detail by the paper
        Efron, Hastie, Johnstone / and Tibshirani (2004), published in The Annals
    
    lassopack from http://fmwww.bc.edu/RePEc/bocode/l
        'LASSOPACK': module for lasso, square-root lasso, elastic net, ridge,
        adaptive lasso estimation and cross-validation / lassopack is a suite of
        programs for penalized regression / methods suitable for the
        high-dimensional setting where the / number of predictors p may be large
    
    pdslasso from http://fmwww.bc.edu/RePEc/bocode/p
        'PDSLASSO': module for post-selection and post-regularization OLS or IV
        estimation and inference / pdslasso and ivlasso are routines for
        estimating structural / parameters in linear models with many controls
        and/or / instruments. The routines use methods for estimating sparse /
    
    sivreg from http://fmwww.bc.edu/RePEc/bocode/s
        'SIVREG': module to perform adaptive Lasso with some invalid instruments /
        sivreg estimates a linear instrumental variables regression / where some
        of the instruments fail the exclusion restriction / and are thus invalid.
        The LARS algorithm (Efron et al., 2004) is / applied as long as the Hansen
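
    If one of these packages fits, the selected names are available without leaving Stata. As a minimal sketch, assuming lassopack is installed from SSC and that its rlasso command stores the selected predictors in e(selected), as its help file describes, the list can be copied straight into a global. The auto dataset, the choice of variables, and the global name SELECTED are purely illustrative.

    ```stata
    * Assumption: lassopack is not yet installed
    ssc install lassopack

    sysuse auto, clear

    * Lasso with rlasso's default (theory-driven) penalty
    rlasso price mpg weight length headroom trunk turn displacement

    * Copy the names of the selected predictors into a global
    global SELECTED `e(selected)'
    display "$SELECTED"
    ```

    If no predictor is selected, the global is simply empty.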