I have several data frames in the global environment in R:
file_1 <- data.frame(A = 1:5, B = 6:10, C = 11:15)
file_2 <- data.frame(A = 1:5, D = 16:20, E = 21:25)
file_3 <- data.frame(B = 6:10, C = 11:15, F = 26:30)
I want to build a matrix that shows which column names are common to all data frames and which are not.
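If all that is needed is the two lists of names (shared by every data frame vs. missing from at least one), a matrix may not even be necessary. A minimal sketch, assuming the three data frames are collected into a named list `dfs` (a name introduced here for illustration): `Reduce(intersect, ...)` folds the column-name sets together pairwise, and `setdiff()` returns everything else.

```r
file_1 <- data.frame(A = 1:5, B = 6:10, C = 11:15)
file_2 <- data.frame(A = 1:5, D = 16:20, E = 21:25)
file_3 <- data.frame(B = 6:10, C = 11:15, F = 26:30)

# Hypothetical named list holding the data frames
dfs <- list(file_1 = file_1, file_2 = file_2, file_3 = file_3)

col_sets <- lapply(dfs, names)
# Names present in every data frame: intersect the sets one after another
common_cols <- Reduce(intersect, col_sets)
# Every name that occurs in at least one data frame
all_cols <- unique(unlist(col_sets))
# Names missing from at least one data frame
partial_cols <- setdiff(all_cols, common_cols)

common_cols
partial_cols
```

With this example data `common_cols` is empty (no name occurs in all three data frames), so every name ends up in `partial_cols`.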
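I tried doing it with a loop:
files <- c("file_1", "file_2", "file_3")  # names of the data frames to compare
column_names <- list()                    # must exist before the loop fills it
for (file in files) {
  data <- get(file)                       # fetch the data frame by its name
  column_names[[file]] <- colnames(data)
}
all_columns <- unique(unlist(column_names))
# One logical column per data frame: is each name present in it?
matrix <- sapply(column_names, function(cols) all_columns %in% cols)
rownames(matrix) <- all_columns
matrix_df <- as.data.frame(matrix)
print(matrix_df)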
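A possible loop-free variant of the same idea, sketched under the assumption that the object names are known in advance: `mget()` fetches several objects by name at once (replacing the `get()` loop), and `vapply()` with `FUN.VALUE = logical(length(all_columns))` checks that every call returns a logical vector of the expected length, rather than letting `sapply()` guess the result shape. `dfs` and `presence` are illustrative names.

```r
file_1 <- data.frame(A = 1:5, B = 6:10, C = 11:15)
file_2 <- data.frame(A = 1:5, D = 16:20, E = 21:25)
file_3 <- data.frame(B = 6:10, C = 11:15, F = 26:30)

# Fetch the data frames by name into a named list (names become column labels)
dfs <- mget(c("file_1", "file_2", "file_3"))

all_columns <- unique(unlist(lapply(dfs, colnames)))

# Each call must return exactly length(all_columns) logical values
presence <- vapply(dfs, function(df) all_columns %in% colnames(df),
                   FUN.VALUE = logical(length(all_columns)))
rownames(presence) <- all_columns
presence
```

Rows correspond to column names and columns to data frames, the same layout as `matrix` in the loop version.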
Is this the correct way to do this in R?
By the way, if the data frames were in a list, I think we could do it like this:
all_columns <- unique(unlist(lapply(mylist, colnames)))
matrix <- sapply(mylist, function(df) all_columns %in% colnames(df))
rownames(matrix) <- all_columns
matrix_df <- as.data.frame(matrix)
print(matrix_df)
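To read the answer straight off the presence matrix, one option is to summarise it row-wise. This sketch assumes `mylist` is a named list of the three data frames; `rowSums()` on a logical matrix counts the `TRUE` values, so a row whose count equals the number of data frames is present everywhere. The extra columns `n_files` and `in_all` are hypothetical names added for illustration.

```r
file_1 <- data.frame(A = 1:5, B = 6:10, C = 11:15)
file_2 <- data.frame(A = 1:5, D = 16:20, E = 21:25)
file_3 <- data.frame(B = 6:10, C = 11:15, F = 26:30)

mylist <- list(file_1 = file_1, file_2 = file_2, file_3 = file_3)

all_columns <- unique(unlist(lapply(mylist, colnames)))
presence <- sapply(mylist, function(df) all_columns %in% colnames(df))
rownames(presence) <- all_columns

matrix_df <- as.data.frame(presence)
# Number of data frames that contain each column name
matrix_df$n_files <- rowSums(presence)
# TRUE only when the column name occurs in every data frame
matrix_df$in_all <- matrix_df$n_files == length(mylist)
print(matrix_df)
```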
Do you mean a matrix like the one below?
> table(stack(lapply(mget(ls(pattern = "file_")), names)))
      ind
values file_1 file_2 file_3
     A      1      1      0
     B      1      0      1
     C      1      0      1
     D      0      1      0
     E      0      1      0
     F      0      0      1
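The object returned by `table()` prints compactly but keeps the `table` class. A sketch of turning it into something easier to work with, assuming the data frames are the only objects whose names start with `file_` (hence the anchored pattern `"^file_"`): `as.data.frame.matrix()` gives a wide data frame with one row per column name, and comparing row sums with the number of data frames picks out the names present in all of them. `wide`, `present`, and `in_all` are illustrative names.

```r
file_1 <- data.frame(A = 1:5, B = 6:10, C = 11:15)
file_2 <- data.frame(A = 1:5, D = 16:20, E = 21:25)
file_3 <- data.frame(B = 6:10, C = 11:15, F = 26:30)

col_names <- lapply(mget(ls(pattern = "^file_")), names)
# stack() gives a long data frame: `values` (column name), `ind` (data frame)
tab <- table(stack(col_names))

# Wide data frame of 0/1 counts: rows = column names, columns = data frames
wide <- as.data.frame.matrix(tab)

# Logical version: TRUE where the column name is present
present <- tab > 0

# Column names found in every data frame
in_all <- rownames(tab)[rowSums(tab) == ncol(tab)]
in_all
```

For this data `in_all` is empty, consistent with the table above where no row is all ones.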