EDIT: Rephrasing: The situation described can be the result of data extraction from different statistical programs, that may produce separately csv files with (a) factors levels and (b) their corresponding assigned "numerical" values.
1) I may have many factor variables e.g. gender, age ranges and these factor variables have levels e.g. male/famele, 18-30/31-40 etc. 2) These levels are assigned to some numbers, either ordered or not. 3) The factor/level dataframe is one dataframe / dataset. The assigned to factor levels dataset is a separated dataframe. 4) I would like to join these two datasets into a single one. This means that the ordered factor levels must be retained and correctly assigned to their corresponding numbers.
So Factor levels and their assigned numbers are kept into two different datasets (e.g. csv files). These two dataframes must be "merged".
How can I do that? p.s. There is a cmmon ID variable between these two datasets.
d1_levels d2_levels d3_levels
1 2 2 0
2 0 1 2
3 1 2 1
4 2 2 2
d1_labels d2_labels d3_labels
1 boy east <3kg
2 dont know south 3kg
3 girl east >3kg
4 boy east 3kg
I would like to have the same result for d1_labels, as the result of the below R command
dataset$d1_labels<- factor(d1_levels, levels = c(0,1,2), labels = c("dont know", "girl", "boy"))
The question is not fully clear:
merge
is used when we are talking about two data.frames to be combined according to some key.levels
and labels
are used with factors
We will try both variants and hope the OP will specify what he is after.
merge(DF1, DF2, by = "rn")
# rn d1_levels d2_levels d3_levels d1_labels d2_labels d3_labels
#1 1 2 2 0 boy east <3kg
#2 2 0 1 2 dont know south 3kg
#3 3 1 2 1 girl east >3kg
#4 4 2 2 2 boy east 3kg
reorder(factor(DF2$d1_labels), DF1$d1_levels)
#[1] boy dont know girl boy
#attr(,"scores")
# boy dont know girl
# 2 0 1
#Levels: dont know girl boy
reorder(factor(DF2$d2_labels), DF1$d2_levels)
#[1] east south east east
#attr(,"scores")
# east south
# 2 1
#Levels: south east
reorder(factor(DF2$d3_labels), DF1$d3_levels)
#[1] <3kg 3kg >3kg 3kg
#attr(,"scores")
#<3kg >3kg 3kg
# 0 1 2
#Levels: <3kg >3kg 3kg
factor()
creates the factors, reorder()
orders the factor levels according to the sequence given in the levels columns. In R, numbering of levels start with 1.
The single results can be combined back to one dataframe (but note that this not the preferred way in R to convert many columns manually.)
result <- data.frame(
rn = DT1$rn,
d1 = reorder(factor(DF2$d1_labels), DF1$d1_levels),
d2 = reorder(factor(DF2$d2_labels), DF1$d2_levels),
d3 = reorder(factor(DF2$d3_labels), DF1$d3_levels)
)
The OP has clarified the question and has requested to combine levels and labels for as many as, e.g., 500 factor columns.
Unfortunately, this is quite complex as it requires to bring together data from two different data.frames which have been named differently. It would be much easier if the matching columns in both data.frames would have been named equally, e.g., d1
. So, we have to bring together d1_levels
from DF1
with d1_labels
from DF2
.
base_names <- na.omit(unique(stringr::str_extract(c(names(DF1), names(DF2)), ".+(?=_levels$)")))
base_names
#[1] "d1" "d2" "d3"
result <- as.data.frame(
setNames(
lapply(base_names, function(x) {
reorder(factor(DF2[[paste0(x, "_labels")]]), DF1[[paste0(x, "_levels")]])
}), base_names
)
)
result
# d1 d2 d3
#1 boy east <3kg
#2 dont know south 3kg
#3 girl east >3kg
#4 boy east 3kg
str(result)
#'data.frame': 4 obs. of 3 variables:
# $ d1: Factor w/ 3 levels "dont know","girl",..: 3 1 2 3
# ..- attr(*, "scores")= num [1:3(1d)] 2 0 1
# .. ..- attr(*, "dimnames")=List of 1
# .. .. ..$ : chr "boy" "dont know" "girl"
# $ d2: Factor w/ 2 levels "south","east": 2 1 2 2
# ..- attr(*, "scores")= num [1:2(1d)] 2 1
# .. ..- attr(*, "dimnames")=List of 1
# .. .. ..$ : chr "east" "south"
# $ d3: Factor w/ 3 levels "<3kg",">3kg",..: 1 3 2 3
# ..- attr(*, "scores")= num [1:3(1d)] 0 1 2
# .. ..- attr(*, "dimnames")=List of 1
# .. .. ..$ : chr "<3kg" ">3kg" "3kg"