I am trying to reorder values rowwise in a dataframe in R.
data <- data.frame(sample = c("A", "B", "C", "D", "E", "F"),
year_1 = c(2015, 2014, NA, 1985, 2011, 2010),
year_2 = c(NA, 1986, 1999, 1986, 2009, 2009),
year_3 = c(2015, 2014, 2014, 1956, NA, 2010),
year_4 = c(NA, 2014, 2014, 1985, 2010, 2010))
data
> sample year_1 year_2 year_3 year_4
> 1 A 2015 NA 2015 NA
> 2 B 2014 1986 2014 2014
> 3 C NA 1999 2014 2014
> 4 D 1985 1986 1956 1985
> 5 E 2011 2009 NA 2010
> 6 F 2010 2009 2010 2010
I would like to reorder each row so that year_1
has the highest year, year_2
has the next highest, and so forth. I would also like to keep all columns, so any NA
s would appear starting with year_4
.
I have managed to reorder the values in each row as described above.
cols <- c("year_1", "year_2", "year_3", "year_4")
data[cols] <- t(apply(data[cols],
MARGIN = 1,
FUN = function(x) sort(x, decreasing = TRUE, na.last = TRUE)))
data
> sample year_1 year_2 year_3 year_4
> 1 A 2015 2015 NA NA
> 2 B 2014 2014 2014 1986
> 3 C 2014 2014 1999 NA
> 4 D 1986 1985 1985 1956
> 5 E 2011 2010 2009 NA
> 6 F 2010 2010 2010 2009
HOWEVER, I would also like to remove any duplicates and replace with NA
(to appear at the end of the row). For example, for Sample D, the row should be 1986, 1985, 1956, NA
.
END GOAL:
> sample year_1 year_2 year_3 year_4
> 1 A 2015 NA NA NA
> 2 B 2014 1986 NA NA
> 3 C 2014 1999 NA NA
> 4 D 1986 1985 1956 NA
> 5 E 2011 2010 2009 NA
> 6 F 2010 2009 NA NA
Note: my data have several hundred thousand rows, so efficient code is much appreciated.
You were close! One option would be to just replace the duplicated
values with NA
before sorting:
cols <- c("year_1", "year_2", "year_3", "year_4")
data[cols] <- t(apply(data[cols], 1, \(x){
x[duplicated(x)] <- NA
sort(x, decreasing = TRUE, na.last = TRUE)
}))
Output:
sample year_1 year_2 year_3 year_4
1 A 2015 NA NA NA
2 B 2014 1986 NA NA
3 C 2014 1999 NA NA
4 D 1986 1985 1956 NA
5 E 2011 2010 2009 NA
6 F 2010 2009 NA NA