I have three data frames, each of which includes a column named "center". This column has 6 different values, and I need to group the data by these values and save each group as a separate data frame. Below is an example of one data frame. The other two data frames differ in their number of columns and rows, but the columns center and NR (the unique IDs) are common to all of them.
structure(list(NR = c("DBD-0006", "DBD-0057",
"DBD-0095", "GHP-0169", "GHP-0237", "NNB-0243", "NNB-0303",
"NNB-0306", "NNB-0359", "NNB-0364"), DATE = c("13-07-2011",
"15-12-2010", "09-03-2011", "14-09-2011", "30-06-2010", "16-05-2016",
"04-07-2012", "11-07-2012", "05-12-2012", "12-12-2012"), CODE = c("1",
"1", "1", "1", "1", "1", "1", "1", "1", "1"), DATE2 = c("18-07-2011",
"15-12-2010", "09-03-2011", "14-09-2011", "05-01-2012", "11-05-2016",
"05-07-2012", "11-07-2012", "06-12-2012", "12-12-2012"), type = c("YY.90.01",
"50.19", "50.37", "50.37", "50.37", "YY.90.00",
"50.37", "YY.50.01", "YY.82.01", "YY.50.02"), center = c("DBD",
"DBD", "DBD", "GHP", "GHP", "NNB", "NNB", "NNB", "NNB",
"NNB")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
I have tried to use a loop to filter the data, but it does not work. Could you please help me with that? Any solution that is neater than a for loop is also highly appreciated.
What I did was create a list containing the three data frames and call it cnt. The first level of cnt holds df1, df2 and df3, and each of those has its own variables, including "center" and "NR", so I think we can call it a nested list. This is what my loop looks like:
output <- list()
for (table in names(cnt)) {            # table is "df1", "df2", "df3"
  for (name in names(cnt[[table]])) {  # name is each variable name in that df
    center_name <- c("DBD", "GHP", "NNB")
    output[[table]] <- cnt[[table]][[name]] %>% filter(center == center_name)
  }
}
Here is the error I get.
Error in UseMethod("filter") :
no applicable method for 'filter' applied to an object of class "character"
It seems that I can't assign the right data to filter.
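The error comes from the inner loop: it iterates over the column names, so cnt[[table]][[name]] is a single column (a character vector), not a data frame, and filter() has no method for it. In addition, center == center_name compares element by element against a recycled vector instead of testing membership (%in%), and output[[table]] is overwritten on every pass. A minimal corrected loop, sketched in base R (subsetting with [ in place of dplyr::filter) on small hypothetical stand-in data, iterates over the centers instead:

```r
# Hypothetical stand-ins for the asker's tables; only NR and center matter here
df1 <- data.frame(NR = c("DBD-0006", "GHP-0169", "NNB-0243"),
                  center = c("DBD", "GHP", "NNB"))
df2 <- data.frame(NR = c("DBD-0057", "NNB-0303"),
                  center = c("DBD", "NNB"))
cnt <- list(df1 = df1, df2 = df2)

output <- list()
for (table in names(cnt)) {
  dat <- cnt[[table]]                 # the whole data frame, not a single column
  parts <- list()
  for (cn in unique(dat$center)) {    # loop over centers, not over column names
    parts[[cn]] <- dat[dat$center == cn, , drop = FALSE]
  }
  output[[table]] <- parts            # one sub-list per table, named by center
}

str(output, max.level = 2)
```

With dplyr the inner step would be the equivalent of filter(dat, center == cn), applied to the data frame rather than to one of its columns.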
You could first use ls() with a pattern to get the names:
df_names <- ls(pattern='df\\d$')
Then use mget(), piped into setNames(), to get a named list:
cnt <- mget(df_names) |> setNames(df_names)
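Two details of this step can be checked in isolation. The pattern 'df\\d$' is not anchored at the start, so any object whose name merely ends in "df" plus a digit is picked up too; '^df\\d$' restricts it to df1, df2, and so on. Also, mget() already returns a list named after the objects, so setNames() mainly acts as a safeguard. A small sketch, using a hypothetical object olddf3 and a separate environment so the workspace is not touched:

```r
# Hypothetical workspace held in its own environment so the example is isolated
e <- new.env()
assign("df1", data.frame(NR = "DBD-0006", center = "DBD"), envir = e)
assign("df2", data.frame(NR = "GHP-0169", center = "GHP"), envir = e)
assign("olddf3", data.frame(NR = "XYZ-0001", center = "XYZ"), envir = e)

loose  <- ls(envir = e, pattern = "df\\d$")    # unanchored: also matches "olddf3"
strict <- ls(envir = e, pattern = "^df\\d$")   # anchored: only "df1" and "df2"

cnt <- mget(strict, envir = e)                 # list elements are already named
print(loose)
print(names(cnt))
```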
Finally, split each data frame on $center, which gives you a nested list with the result:
res <- lapply(cnt, \(x) split(x, x$center))
str(res)
# List of 2
# $ df1:List of 3
# ..$ DBD:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 6 variables:
# .. ..$ NR : chr [1:3] "DBD-0006" "DBD-0057" "DBD-0095"
# .. ..$ DATE : chr [1:3] "13-07-2011" "15-12-2010" "09-03-2011"
# .. ..$ CODE : chr [1:3] "1" "1" "1"
# .. ..$ DATE2 : chr [1:3] "18-07-2011" "15-12-2010" "09-03-2011"
# .. ..$ type : chr [1:3] "YY.90.01" "50.19" "50.37"
# .. ..$ center: chr [1:3] "DBD" "DBD" "DBD"
# ..$ GHP:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 6 variables:
# .. ..$ NR : chr [1:2] "GHP-0169" "GHP-0237"
# .. ..$ DATE : chr [1:2] "14-09-2011" "30-06-2010"
# .. ..$ CODE : chr [1:2] "1" "1"
# .. ..$ DATE2 : chr [1:2] "14-09-2011" "05-01-2012"
# .. ..$ type : chr [1:2] "50.37" "50.37"
# .. ..$ center: chr [1:2] "GHP" "GHP"
# ..$ NNB:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of 6 variables:
# .. ..$ NR : chr [1:5] "NNB-0243" "NNB-0303" "NNB-0306" "NNB-0359" ...
# .. ..$ DATE : chr [1:5] "16-05-2016" "04-07-2012" "11-07-2012" "05-12-2012" ...
# .. ..$ CODE : chr [1:5] "1" "1" "1" "1" ...
# .. ..$ DATE2 : chr [1:5] "11-05-2016" "05-07-2012" "11-07-2012" "06-12-2012" ...
# .. ..$ type : chr [1:5] "YY.90.00" "50.37" "YY.50.01" "YY.82.01" ...
# .. ..$ center: chr [1:5] "NNB" "NNB" "NNB" "NNB" ...
# $ df2:List of 3
# ..$ DBD:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 6 variables:
# .. ..$ NR : chr [1:3] "DBD-0006" "DBD-0057" "DBD-0095"
# .. ..$ DATE : chr [1:3] "13-07-2011" "15-12-2010" "09-03-2011"
# .. ..$ CODE : chr [1:3] "1" "1" "1"
# .. ..$ DATE2 : chr [1:3] "18-07-2011" "15-12-2010" "09-03-2011"
# .. ..$ type : chr [1:3] "YY.90.01" "50.19" "50.37"
# .. ..$ center: chr [1:3] "DBD" "DBD" "DBD"
# ..$ GHP:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 6 variables:
# .. ..$ NR : chr [1:2] "GHP-0169" "GHP-0237"
# .. ..$ DATE : chr [1:2] "14-09-2011" "30-06-2010"
# .. ..$ CODE : chr [1:2] "1" "1"
# .. ..$ DATE2 : chr [1:2] "14-09-2011" "05-01-2012"
# .. ..$ type : chr [1:2] "50.37" "50.37"
# .. ..$ center: chr [1:2] "GHP" "GHP"
# ..$ NNB:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of 6 variables:
# .. ..$ NR : chr [1:5] "NNB-0243" "NNB-0303" "NNB-0306" "NNB-0359" ...
# .. ..$ DATE : chr [1:5] "16-05-2016" "04-07-2012" "11-07-2012" "05-12-2012" ...
# .. ..$ CODE : chr [1:5] "1" "1" "1" "1" ...
# .. ..$ DATE2 : chr [1:5] "11-05-2016" "05-07-2012" "11-07-2012" "06-12-2012" ...
# .. ..$ type : chr [1:5] "YY.90.00" "50.37" "YY.50.01" "YY.82.01" ...
# .. ..$ center: chr [1:5] "NNB" "NNB" "NNB" "NNB" ...
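Because center is shared by all tables, you may instead want the grouping the other way round: one entry per center that holds the matching rows from every table. A minimal base R sketch on hypothetical stand-in data; unlike split(), it keeps a zero-row data frame when a table has no rows for a given center, so every center has the same set of tables:

```r
# Hypothetical stand-ins for the asker's tables (only NR and center are shared)
df1 <- data.frame(NR = c("DBD-0006", "GHP-0169", "NNB-0243"),
                  center = c("DBD", "GHP", "NNB"))
df2 <- data.frame(NR = c("DBD-0057", "NNB-0303"), extra = 1:2,
                  center = c("DBD", "NNB"))
cnt <- list(df1 = df1, df2 = df2)

# All centers that occur in any of the tables
centers <- sort(unique(unlist(lapply(cnt, `[[`, "center"))))

# For each center, subset every table: by_center$DBD$df2 holds df2's DBD rows
by_center <- lapply(setNames(centers, centers), function(cn)
  lapply(cnt, function(x) x[x$center == cn, , drop = FALSE]))

str(by_center, max.level = 2)
```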
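Note that I use just two data frames so as not to bloat the answer. If you really need the data frames in your workspace, you can do list2env(res, .GlobalEnv), but be aware that this assigns the nested lists under their table names, overwriting df1 and df2. Better to keep everything in the list.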
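If you do want one object per table and center, one option is to flatten the nested list by one level with unlist(recursive = FALSE), which names the pieces like df1.DBD, and pass that to list2env(). A sketch on a small hypothetical res, written into a fresh environment so the example leaves the workspace alone (pass .GlobalEnv instead to put the objects there):

```r
# Hypothetical nested result shaped like `res` above: table -> center -> rows
d <- data.frame(NR = c("DBD-0006", "GHP-0169"), center = c("DBD", "GHP"))
res <- list(df1 = split(d, d$center), df2 = split(d, d$center))

flat <- unlist(res, recursive = FALSE)   # flatten one level: "df1.DBD", ...
print(names(flat))

# Assign each data frame as its own object in a separate environment
target <- new.env()
invisible(list2env(flat, envir = target))
ls(envir = target)
```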
Data:
df1 <- df2 <- structure(list(NR = c("DBD-0006", "DBD-0057", "DBD-0095", "GHP-0169",
"GHP-0237", "NNB-0243", "NNB-0303", "NNB-0306", "NNB-0359", "NNB-0364"
), DATE = c("13-07-2011", "15-12-2010", "09-03-2011", "14-09-2011",
"30-06-2010", "16-05-2016", "04-07-2012", "11-07-2012", "05-12-2012",
"12-12-2012"), CODE = c("1", "1", "1", "1", "1", "1", "1", "1",
"1", "1"), DATE2 = c("18-07-2011", "15-12-2010", "09-03-2011",
"14-09-2011", "05-01-2012", "11-05-2016", "05-07-2012", "11-07-2012",
"06-12-2012", "12-12-2012"), type = c("YY.90.01", "50.19", "50.37",
"50.37", "50.37", "YY.90.00", "50.37", "YY.50.01", "YY.82.01",
"YY.50.02"), center = c("DBD", "DBD", "DBD", "GHP", "GHP", "NNB",
"NNB", "NNB", "NNB", "NNB")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))