I would like to know that whether there is a way in R for using datasets with similar names. For example:
I have eight datasets named cars_1995
, cars_new_1995
, cars_1996
, cars_new_1996
, cars_1997
, cars_new_1997
, cars_1998
and cars_new_1998
and they have different dimensions but only thing that I want to use is their row numbers.
So I want to create a new data frame and rownames will be the years (1995,1996,1997,1998) and colnames will be the groups (cars
, cars_new
). the dimension of the data frame will be 4:2. The values of the dataframe will be the rownumbers such as:
Cars Cars_new
1995 25 76
1996 38 35
1997 87 49
1998 34 50
First question is, in the R, is there a way to use the names of dataframes like I need to take row numbers of all data frames from 1995 to 1999 (one column for cars_YEAR
, the other column for cars_YEAR_new
). By the way only year indicator is the names of the data frames.
I'm OK with how to build a data frame or how to get the rownumbers (nrow
) However, I haven't figured out how to pull all datasets by just using embed years of the dataframes' names consecutively.So it would be great how to get all cars_YEAR
datasets row numbers by using the year extension in the datasets names.
stringr
+ dplyr
+ tidyr
:library(stringr)
library(dplyr)
library(tidyr)
ls() %>%
str_extract("^cars_\\d{4}.*") %>%
na.omit() %>%
mget(envir = globalenv()) %>%
lapply(nrow) %>%
data.frame() %>%
gather(key, value) %>%
mutate(key = str_replace(key, "(\\d{4})_(.+)", "\\2_\\1")) %>%
separate(key, c("key", "year"), "_(?=\\d)") %>%
spread(key, value)
Result:
year cars cars_new
1 1995 114 102
2 1996 113 61
3 1997 168 186
4 1998 196 68
Notes:
ls()
is handy for this example as it can extract all object names in the global environment.str_extract
, I extracted only object names that starts with cars_[4 digits]
then fed the result into mget
mget
takes a vector of character object names and returns the values from a specified environment in a named listlapply
is used to extract row numbers in each element (dataset) of the list. This is then converted to a data.frame with each column corresponding to a row number.gather
reshapes the data.frame from wide to long format.mutate
and separate
are respectively to change names from cars_1995_news
to cars_new_1995
and separate into a column of cars_new
and year
spread
reshapes the data.frame to the desired format.Note about data:
I used rnorm
and sample
to create the toy datasets and did not set a seed for each of them. This means you will not get the same resulting data.frame as what I have here. This is ok as what the row numbers actually are is not important to solve OP's problem.
Data:
# Create sample datasets
cars_1995 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1995_new = data.frame(A = rnorm(sample(50:200, 1)))
cars_1996 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1996_new = data.frame(A = rnorm(sample(50:200, 1)))
cars_1997 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1997_new = data.frame(A = rnorm(sample(50:200, 1)))
cars_1998 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1998_new = data.frame(A = rnorm(sample(50:200, 1)))
cars_YEAR_new
to cars_new_YEAR
):ls() %>%
str_extract("^cars_(new_)*\\d{4}") %>%
na.omit() %>%
mget(envir = globalenv()) %>%
lapply(nrow) %>%
data.frame() %>%
gather(key, value) %>%
separate(key, c("key", "year"), "_(?=\\d)") %>%
spread(key, value)
With the new dataset names, the mutate
+ str_replace
step can be removed, and the str_extract
step also has to be changed, since the pattern is now different.
New Data:
# Create sample datasets
cars_1995 = data.frame(A = rnorm(sample(50:200, 1)))
cars_new_1995 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1996 = data.frame(A = rnorm(sample(50:200, 1)))
cars_new_1996 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1997 = data.frame(A = rnorm(sample(50:200, 1)))
cars_new_1997 = data.frame(A = rnorm(sample(50:200, 1)))
cars_1998 = data.frame(A = rnorm(sample(50:200, 1)))
cars_new_1998 = data.frame(A = rnorm(sample(50:200, 1)))