r tidyverse purrr sublist

How to apply a function to sublists in R


I am trying to apply the sample_n() function to sublists in R. Somehow I could not get it right after several trials.

My data structure is a list of 27 lists (I will call them 27 elements). Each element is itself a list of data frames. Two pictures show the structure more clearly.

[Image: a list of 27 lists.]

[Image: each sublist is a list of data frames.]

I want to apply sample_n() to each data frame. An example of the data frames looks like this.

> test2[[1]][[1]]

[Image: an example of the data frames.]

For this data frame, the first element of the outdegree_within_or1 variable is 1, so I want to sample 1 row from it. If the corresponding value in another data frame is 5, I want to sample 5 rows from that data frame. Note that actor_id is the same for every row within a data frame. The output should have the same structure as the input, but with fewer rows in each data frame, and actor_id should still be constant within each data frame of the output. I used the following R code, but it does not give the correct result.

ties_within <- map(test2, ~ map_df(.x, ~ sample_n(.x, outdegree_within_or1[1])))

The code above returned a list of 27 elements, but each element is now a single data frame instead of a list of data frames, which is what I expected. It seems the code is sampling rows across all data frames within each element of the input, rather than within each data frame as I intended.

What would be a solution?


Solution

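  • Use map instead of map_df for the inner call.

    ties_within <- map(test2, ~ map(.x, ~ sample_n(.x, size = outdegree_within_or1[1])))

    map returns a list of the same length as its input, whereas map_df row-binds the results into a single data frame. The rows were already being sampled within each data frame; map_df then combined the sampled rows of every data frame in an element into one data frame.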
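To see the difference on something small, here is a minimal sketch. The object toy and the helper make_df are hypothetical stand-ins for test2 (two elements of two data frames each, not the real 27), and the column alter_id is invented for illustration. Only actor_id and outdegree_within_or1 come from the question.

```r
library(dplyr)
library(purrr)

set.seed(1)  # make the random sampling reproducible

# Hypothetical helper: builds one data frame in which actor_id is constant
# and outdegree_within_or1[1] holds the number of rows to sample.
make_df <- function(id, k, n_rows = 6) {
  data.frame(
    actor_id = rep(id, n_rows),
    outdegree_within_or1 = rep(k, n_rows),
    alter_id = seq_len(n_rows)  # invented column, just to tell rows apart
  )
}

# Hypothetical stand-in for test2: a list of lists of data frames.
toy <- list(
  list(make_df(1, 1), make_df(2, 3)),
  list(make_df(3, 2), make_df(4, 5))
)

# Inner map(): one sampled data frame per input data frame.
nested <- map(toy, ~ map(.x, ~ sample_n(.x, size = outdegree_within_or1[1])))

# Inner map_df(): the sampled data frames are row-bound, one per element.
flattened <- map(toy, ~ map_df(.x, ~ sample_n(.x, size = outdegree_within_or1[1])))

# Rows per data frame in the nested result: 1 and 3, then 2 and 5.
map(nested, ~ map_int(.x, nrow))

# Rows per element in the flattened result: 4 and 7.
map_int(flattened, nrow)
```

With the inner map, the result keeps the list-of-lists shape of the input, and each data frame has as many rows as its own outdegree_within_or1[1]. With the inner map_df, the same rows are sampled, but they end up stacked in a single data frame per element.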