Shifting a column of non-numeric variables


If I have a dataframe of variables, how do I shift the entries in one column (e.g. Column 4) up by one and replace empty cells with "NA"?

For numeric data:

mydata <- data.frame(replicate(5,sample(1:20,10,rep=TRUE)))

> mydata
   X1 X2 X3 X4 X5
1  12  2  4  7 10
2  15  2 15  3  8
3  11 12 18 10  3
4  18  8  4 17 12
5  16 17  2  8 10
6   6  3 14 15 18
7  14  3 14 14 13
8  16 15 15  9 14
9  14 12 15 20  3
10 10 16  8 18  5

I can achieve this with a 'shift' function:

shift <- function(x, n) {
  c(x[-(seq(n))], rep(NA, n))
}

mydata[,4] <- shift(mydata[,4], 1)

> mydata
   X1 X2 X3 X4 X5
1  12  2  4  3 10
2  15  2 15 10  8
3  11 12 18 17  3
4  18  8  4  8 12
5  16 17  2 15 10
6   6  3 14 14 18
7  14  3 14  9 13
8  16 15 15 20 14
9  14 12 15 18  3
10 10 16  8 NA  5

This works when the data is numeric. When the data is non-numeric, however, the shifted column is replaced by numbers instead of the original words:

mydata<- data.frame(replicate(5,sample(c("apple", "banana", "peach", "grape"),10,rep=TRUE)))

> mydata
   X1     X2     X3     X4    X5
1  banana banana banana  grape apple
2   apple  peach  grape  grape apple
3   grape  grape banana  peach peach
4   apple  apple  peach banana peach
5   grape banana  grape  apple peach
6   grape  grape  grape banana apple
7   grape  grape  peach  apple peach
8  banana  grape banana  apple grape
9   peach  apple  peach  peach grape
10  apple  peach banana  grape grape


shift <- function(x, n) {
  c(x[-(seq(n))], rep(NA, n))
}
mydata[,4] <- shift(mydata[,4], 1)

> mydata
   X1     X2     X3 X4    X5
1  banana banana banana  3 apple
2   apple  peach  grape  4 apple
3   grape  grape banana  2 peach
4   apple  apple  peach  1 peach
5   grape banana  grape  2 peach
6   grape  grape  grape  1 apple
7   grape  grape  peach  1 peach
8  banana  grape banana  4 grape
9   peach  apple  peach  3 grape
10  apple  peach banana NA grape

Any ideas how to retain the "apple/banana/peach/grape" words after the shift? Or perhaps another approach is better? Thank you!

Desired result:

> mydata
   X1     X2     X3     X4    X5
1  banana banana banana  grape apple
2   apple  peach  grape  peach apple
3   grape  grape banana banana peach
4   apple  apple  peach  apple peach
5   grape banana  grape banana peach
6   grape  grape  grape  apple apple
7   grape  grape  peach  apple peach
8  banana  grape banana  peach grape
9   peach  apple  peach  grape grape
10  apple  peach banana     NA grape

Solution

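  • The problem is that data.frame() is converting the strings to factors (the default behaviour before R 4.0.0). When the custom shift function combines that factor with NA using c(), the labels are dropped and only the underlying integer codes remain, which is the numeric column seen in the question.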

    set.seed(0)
    fruit <- c("apple", "banana", "peach", "grape")
    mydata <- data.frame(replicate(5, sample(fruit, 10, rep=TRUE)))
    
    > mydata
           X1     X2     X3     X4     X5
    1   grape  apple  grape banana banana
    2  banana  apple  grape banana  grape
    3  banana  apple  apple  peach  peach
    4   peach  peach  peach banana  grape
    5   grape banana  apple  apple  peach
    6   apple  grape banana  grape  peach
    7   grape banana banana  peach  grape
    8   grape  peach  apple  grape  apple
    9   peach  grape banana  apple banana
    10  peach banana  grape  peach  peach
    
    > class(mydata[, 'X4'])
    [1] "factor"
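
    To see where the numbers come from, here is a minimal base R sketch. A factor stores integer codes plus a set of levels, so anything that strips the levels leaves only the codes. The vector f is a made-up example for illustration, not the question's data.

    ```r
    # A factor is an integer vector of codes with a "levels" attribute.
    f <- factor(c("grape", "peach", "banana", "apple"))

    # Levels are the sorted unique labels.
    lev <- levels(f)            # "apple" "banana" "grape" "peach"

    # The codes are positions in the levels vector.
    codes <- as.integer(f)      # 3 4 2 1

    # as.character() recovers the labels instead of the codes.
    labels <- as.character(f)   # "grape" "peach" "banana" "apple"
    ```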
    

    To fix this, you could use the data.table package, which does not treat strings as factors by default. It also ships with a shift function that does what you want and, as the output below shows, keeps the labels of an existing factor column. To shift the values "up" by one, set the argument type='lead':

    library(data.table)
    setDT(mydata)
    mydata[, X4 := shift(X4, 1, type='lead')]
    
    > mydata
            X1     X2     X3     X4     X5
     1:  grape  apple  grape banana banana
     2: banana  apple  grape  peach  grape
     3: banana  apple  apple banana  peach
     4:  peach  peach  peach  apple  grape
     5:  grape banana  apple  grape  peach
     6:  apple  grape banana  peach  peach
     7:  grape banana banana  grape  grape
     8:  grape  peach  apple  apple  apple
     9:  peach  grape banana  peach banana
    10:  peach banana  grape   <NA>  peach
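
If you would rather stay in base R, one possible sketch is to shift by indexing instead of by c(). Indexing a factor keeps its levels, and indexing past the end of a vector yields NA. The helper name shift_up and the small fruit vector are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: shift values up by n positions, padding the end with NA.
shift_up <- function(x, n = 1L) {
  stopifnot(n >= 0)
  idx <- seq_along(x) + n   # position each result element is taken from
  x[idx]                    # out-of-range positions give NA; factors stay factors
}

# Works on a factor without losing the labels.
fruit <- factor(c("grape", "grape", "peach", "banana"))
shifted <- shift_up(fruit, 1)
as.character(shifted)       # "grape" "peach" "banana" NA

# Works on numeric data as well.
nums <- shift_up(c(7, 3, 10), 1)   # 3 10 NA
```

On the question's data this would be called as mydata[, 4] <- shift_up(mydata[, 4], 1). Another option is to create the data frame with stringsAsFactors = FALSE, so that the column is plain character and the original shift function returns words rather than codes.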