I need to make a new data frame (col.3
) using only the occurrences in a previous column (col.1
) that correspond to unique values in another column (col.2
) in an existing data frame.
I need this:
df1
col.1 col.2
1 1
1 3
1 7
1 7
2 12
2 14
2 14
2 14
df2
col.3
1
1
1
2
2
I have tried this:
new.col <- cbind(df$col.1[unique(df$col.2)])
But it gives me a column that is both too long, and which does not include the complete set of col.1
values
I suspect that plyr
has a simple solution to this, but I have not figured that (or any other solution) out.
How can I achieve my desired result? Preferably using plyr
, but base
is fine too.
We can use duplicated
to create a logical index and use that to subset the rows
df2 <- data.frame(col3. = df$col.1[!duplicated(df$col.2)])
Or with subset
subset(df, !duplicated(col.2), select = col.1)
Or with dplyr
, usedistinct
on col.2
and then select
the 'col.1'
library(dplyr)
df %>%
distinct(col.2, .keep_all = TRUE) %>%
select(col.3 = col.1)
# col.3
#1 1
#2 1
#3 1
#4 2
#5 2
If the duplicates are considered based on the equality between adjacent elements, then use rleid
library(data.table)
df %>%
filter(!duplicated(rleid(col.2))) %>%
select(col.3 = col.1)
If we convert to data.table
, the unique
also have a by
option
library(data.table)
unique(setDT(df), by = 'col.2')[, .(col.3 = col.1)]
df <- structure(list(col.1 = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), col.2 = c(1L,
3L, 7L, 7L, 12L, 14L, 14L)), class = "data.frame", row.names = c(NA,
-7L))