If I have a data frame of variables, how do I shift the entries in one column (e.g. column 4) up by one and fill the resulting empty cell with NA?
For numeric data:
mydata <- data.frame(replicate(5, sample(1:20, 10, replace = TRUE)))
> mydata
X1 X2 X3 X4 X5
1 12 2 4 7 10
2 15 2 15 3 8
3 11 12 18 10 3
4 18 8 4 17 12
5 16 17 2 8 10
6 6 3 14 15 18
7 14 3 14 14 13
8 16 15 15 9 14
9 14 12 15 20 3
10 10 16 8 18 5
I can achieve this with a 'shift' function:
shift <- function(x, n){
  # drop the first n elements and pad the end with n NAs
  c(x[-(seq(n))], rep(NA, n))
}
mydata[,4] <- shift(mydata[,4], 1)
> mydata
X1 X2 X3 X4 X5
1 12 2 4 3 10
2 15 2 15 10 8
3 11 12 18 17 3
4 18 8 4 8 12
5 16 17 2 15 10
6 6 3 14 14 18
7 14 3 14 9 13
8 16 15 15 20 14
9 14 12 15 18 3
10 10 16 8 NA 5
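To make the helper's behavior concrete, here is a small check on a plain vector (the values are the first few entries of X4 above). `x[-(seq(n))]` drops the first n elements and `rep(NA, n)` pads the end, so for n >= 1 the result keeps the original length.

```r
# The question's helper: drop the first n values, pad the end with NA
shift <- function(x, n){
  c(x[-(seq(n))], rep(NA, n))
}

x <- c(7, 3, 10, 17, 8)
shift(x, 1)  # 3 10 17 8 NA
shift(x, 2)  # 10 17 8 NA NA
```

One caveat: `seq(0)` is `c(1, 0)`, not an empty vector, so `shift(x, 0)` silently drops the first element instead of returning `x` unchanged.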
If my data is numeric, this works. But if my data is non-numeric, the shifted column comes back as numbers (1, 2, 3, 4) instead of the original words.
mydata <- data.frame(replicate(5, sample(c("apple", "banana", "peach", "grape"), 10, replace = TRUE)))
> mydata
X1 X2 X3 X4 X5
1 banana banana banana grape apple
2 apple peach grape grape apple
3 grape grape banana peach peach
4 apple apple peach banana peach
5 grape banana grape apple peach
6 grape grape grape banana apple
7 grape grape peach apple peach
8 banana grape banana apple grape
9 peach apple peach peach grape
10 apple peach banana grape grape
shift <- function(x, n){
c(x[-(seq(n))], rep(NA, n))
}
mydata[,4] <- shift(mydata[,4], 1)
> mydata
X1 X2 X3 X4 X5
1 banana banana banana 3 apple
2 apple peach grape 4 apple
3 grape grape banana 2 peach
4 apple apple peach 1 peach
5 grape banana grape 2 peach
6 grape grape grape 1 apple
7 grape grape peach 1 peach
8 banana grape banana 4 grape
9 peach apple peach 3 grape
10 apple peach banana NA grape
Any ideas how to retain the "apple/banana/peach/grape" words after the shift? Or perhaps another approach is better? Thank you!
Desired result:
> mydata
X1 X2 X3 X4 X5
1 banana banana banana grape apple
2 apple peach grape peach apple
3 grape grape banana banana peach
4 apple apple peach apple peach
5 grape banana grape banana peach
6 grape grape grape apple apple
7 grape grape peach apple peach
8 banana grape banana peach grape
9 peach apple peach grape grape
10 apple peach banana NA grape
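The problem is that data.frame() is treating the strings as factors (before R 4.0.0, stringsAsFactors = TRUE was the default), and c() inside your shift() drops the factor to its integer codes.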
set.seed(0)
fruit <- c("apple", "banana", "peach", "grape")
mydata <- data.frame(replicate(5, sample(fruit, 10, replace = TRUE)))
> mydata
X1 X2 X3 X4 X5
1 grape apple grape banana banana
2 banana apple grape banana grape
3 banana apple apple peach peach
4 peach peach peach banana grape
5 grape banana apple apple peach
6 apple grape banana grape peach
7 grape banana banana peach grape
8 grape peach apple grape apple
9 peach grape banana apple banana
10 peach banana grape peach peach
> class(mydata[, 'X4'])
[1] "factor"
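In base R, a minimal fix is to turn the factor back into character before shifting, so c() works on plain strings rather than factor codes. The sketch below rebuilds a factor column like the one above (stringsAsFactors = TRUE is passed explicitly so it behaves the same on R 4.x) and reuses the question's shift() helper; `x4_before` is just a copy kept for comparison.

```r
# The question's helper: drop the first n values, pad the end with NA
shift <- function(x, n){
  c(x[-(seq(n))], rep(NA, n))
}

fruit <- c("apple", "banana", "peach", "grape")
set.seed(0)
# stringsAsFactors = TRUE reproduces the factor columns from the question
mydata <- data.frame(replicate(5, sample(fruit, 10, replace = TRUE)),
                     stringsAsFactors = TRUE)
x4_before <- as.character(mydata[, 4])  # copy of the labels before shifting

# Convert the factor to character first so the labels, not the codes, are shifted
mydata[, 4] <- shift(as.character(mydata[, 4]), 1)
class(mydata[, 4])  # "character"
```

If you control how the data frame is built, passing stringsAsFactors = FALSE to data.frame() avoids the factors altogether; from R 4.0.0 on that is already the default.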
To fix this, you could use the data.table package, whose data.table() and fread() do not convert strings to factors by default. It also ships with a shift() function that does what you want and keeps the column's type, so even the factor column here keeps its fruit labels. To shift the values "up" by one, set the argument type = 'lead':
library(data.table)
setDT(mydata)  # convert the data.frame to a data.table in place
mydata[, X4 := shift(X4, 1, type='lead')]  # move X4 up one row; the last row becomes NA
> mydata
X1 X2 X3 X4 X5
1: grape apple grape banana banana
2: banana apple grape peach grape
3: banana apple apple banana peach
4: peach peach peach apple grape
5: grape banana apple grape peach
6: apple grape banana peach peach
7: grape banana banana grape grape
8: grape peach apple apple apple
9: peach grape banana peach banana
10: peach banana grape <NA> peach
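If you would rather avoid an extra package, another option (not part of the answer above, and only a sketch) is to shift by indexing instead of by c(): indexing a vector with an NA position returns NA but keeps the vector's class, so a factor column keeps its levels. `shift_up()` is a hypothetical name chosen so it does not clash with data.table's shift(); n is assumed to be non-negative.

```r
# Hypothetical helper: shift x "up" by n positions, padding the end with NA.
# Indexing (x[idx]) keeps the class of x, so factors, characters and numbers all survive.
# n is assumed to be a non-negative whole number.
shift_up <- function(x, n = 1L) {
  n <- min(n, length(x))                                     # never pad past the vector length
  idx <- c(seq_len(length(x) - n) + n, rep(NA_integer_, n))  # positions n+1..end, then NA
  x[idx]
}

f <- factor(c("banana", "banana", "peach", "banana", "apple"))
shift_up(f, 1)  # factor: banana, peach, banana, apple, NA (levels unchanged)
```

On the data frame from the question this would be used as `mydata[, 4] <- shift_up(mydata[, 4], 1)`. Unlike the original helper, `shift_up(x, 0)` returns `x` unchanged.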