I have a dataframe like this:
df <- data.frame("COL_1" = "a", "COL_2" = c("a","b","c"), "COL_3" = "b", "COL_4" = "c", "COL_6" = "d")
Output:
DB_INDEX|COL_2|COL_3|COL_4|COL_6
1 a a b c d
2 a b b c d
3 a c b c d
I would like to create a new dataframe which contains only one row, like this:
Expected_output:
DB_INDEX|COL_2|COL_3|COL_4|COL_6
1 a [a,b,c] b c d
So, basically I would like to "group by" columns db_index,3,4,6, and create a list of values where they are different in col_2. DB_INDEX, COL_3, COL_4, COL_5 will always have the same value, only COL_2 is changing.
Any ideas that can help me?
edited
I found this:
df <- df %>% group_by(COL_1) %>%
mutate(COL_2 = paste0(COL_2, collapse = " "))
But it returns a string, so I tried to modify it as this:
df <- df %>% group_by(COL_1) %>%
mutate(COL_2 = as.list(COL_2)
But it returns this error:
replacement has 2 rows, data has 1
COL_2 row must contain a list, I know in Python Pandas is possible, is it in R?
It's my first week in R so I am finding some trouble. Thanks in advance :)
You can try:
df %>%
group_by(DB_INDEX) %>%
summarise_all(~ list(unique(.)))
DB_INDEX COL_2 COL_3 COL_4 COL_6
<chr> <list> <list> <list> <list>
1 a <chr [3]> <chr [1]> <chr [1]> <chr [1]>