Create several subsets at the same time


I have a dataset (insti) and I want to create 3 different subsets according to a factor (xarxa) with three levels (linkedin, instagram, twitter). I used this:

linkedin <- subset(insti, insti$xarxa=="linkedin")
twitter <- subset(insti, insti$xarxa=="twitter")
instagram <- subset(insti, insti$xarxa=="instagram")

This works, but I was wondering whether it can be done with tapply, so I tried:

tapply(insti, insti$xarxa, subset)

It gives this error:

Error in tapply(insti, insti$xarxa, subset) :  arguments must have same length

I think there might be some straightforward way to do this, but I cannot work it out. Can you help me do this without using loops? Thanks a lot.


Solution

  • It's usually better to deal with data frames in a named list. This makes them easy to iterate over, and stops your global workspace being filled up with lots of different variables. The easiest way to get a named list is with split(insti, insti$xarxa).
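    As a minimal sketch of what split returns, here is a small made-up stand-in for insti. Only the column name xarxa and its three levels come from the question; the likes column and all values are hypothetical.

    ```r
    # Hypothetical stand-in for insti: only xarxa and its levels come from the question
    insti <- data.frame(
      xarxa = factor(c("linkedin", "twitter", "instagram", "twitter", "linkedin")),
      likes = c(10, 25, 7, 3, 12)
    )

    # One data frame per level of xarxa, collected in a named list
    by_xarxa <- split(insti, insti$xarxa)

    # The list names are the factor levels
    names(by_xarxa)

    # Each subset is reached by name
    by_xarxa$twitter

    # The same operation can be applied to every subset without an explicit loop
    sapply(by_xarxa, nrow)
    ```

    Keeping the pieces in one list means later steps can use lapply or sapply on by_xarxa instead of repeating the same code for three separate variables.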

    If you really want the variables written directly to your global environment rather than kept in a list, you can do it in a single line:

    list2env(split(insti, insti$xarxa), globalenv())
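    As an aside on the original tapply attempt: the length of a data frame is its number of columns, not its number of rows, which is probably why tapply complained that the arguments differ in length. If you specifically want tapply, one possible workaround (a sketch, not necessarily the best approach) is to group the row indices and use them to pull out the rows. The built-in iris data stands in for insti here.

    ```r
    df <- datasets::iris

    # A data frame's length is its column count, so it does not match the factor
    length(df)
    length(df$Species)

    # Group the row indices by Species, then extract the matching rows per group
    pieces <- tapply(seq_len(nrow(df)), df$Species, function(i) df[i, , drop = FALSE])

    names(pieces)
    nrow(pieces[["setosa"]])
    ```

    This gives a list-like result much like split, but split says the same thing more directly.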
    

    Example

    Obviously, I don't have the insti data frame, since you did not supply any example data in your question, but we can demonstrate that the above solution works using the built-in iris data set.

    First we can see that my global environment is empty:

    ls()
    #> character(0)
    

    Now we get the iris data set, split it by species, and put the result in the global environment:

    list2env(split(datasets::iris, datasets::iris$Species), globalenv())
    #> <environment: R_GlobalEnv>
    

    So now when we check the global environment's contents, we can see that we have three data frames: one for each Species:

    ls()
    #> [1] "setosa"     "versicolor" "virginica"
    
    head(setosa)
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #> 1          5.1         3.5          1.4         0.2  setosa
    #> 2          4.9         3.0          1.4         0.2  setosa
    #> 3          4.7         3.2          1.3         0.2  setosa
    #> 4          4.6         3.1          1.5         0.2  setosa
    #> 5          5.0         3.6          1.4         0.2  setosa
    #> 6          5.4         3.9          1.7         0.4  setosa
    

    And of course, we can also access versicolor and virginica in the same way.
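    If writing into the global environment feels too invasive (it silently overwrites any existing objects with the same names), a possible middle ground is to pass list2env a fresh environment. The name species_env below is just an illustrative choice.

    ```r
    # Split iris by species and store the pieces in a new, separate environment
    species_env <- list2env(
      split(datasets::iris, datasets::iris$Species),
      envir = new.env()
    )

    # The environment holds one data frame per species
    ls(species_env)
    head(species_env$setosa, 3)

    # Look up a subset when its name is stored in a variable
    wanted <- "virginica"
    nrow(get(wanted, envir = species_env))
    ```

    The subsets are still reachable by name, but they stay out of the global workspace.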

    Created on 2021-11-12 by the reprex package (v2.0.0)