Search code examples
rdataframemove

Move rows in a dataframe accordingly to one column


I am looking to move specific rows in my dataframe, accordingly to one column.

df <- read.table(text = 'ID Day Count
33012   9526    4
35004   9526    4
37006   9526    4
37008   9526    4
21009   1913    3
24005   1913    3
25009   1913    3
29002   12551   2
30001   12551   2
25009   14329   1
48007   9525    0
49002   1912    0
51003   12550   0
56001   12550   0', header = TRUE)

Given the above dataframe, I would like move the rows with 0s in the Count column accordingly to the column Day, i.e. day 9525 after 9526, 1912 after 1913, 12550 after 12551.

The output should be:

ID       Day  Count
33012   9526    4
35004   9526    4
37006   9526    4
37008   9526    4
48007   9525    0
21009   1913    3
24005   1913    3
25009   1913    3
49002   1912    0
29002   12551   2
30001   12551   2
51003   12550   0
56001   12550   0
25009   14329   1

Please ignore the ID column.


New question related to this topic:

Move rows in a dataframe accordingly to one column (2)


Solution

  • Answer to updated question

    I think I found a really cool solution to your updated question:

    df[order(match(df$Day+(z <- df$Count==0L),unique(df$Day[!z])),z),];
    ##       ID   Day Count
    ## 1  33012  9526     4
    ## 2  35004  9526     4
    ## 3  37006  9526     4
    ## 4  37008  9526     4
    ## 11 48007  9525     0
    ## 5  21009  1913     3
    ## 6  24005  1913     3
    ## 7  25009  1913     3
    ## 12 49002  1912     0
    ## 8  29002 12551     2
    ## 9  30001 12551     2
    ## 13 51003 12550     0
    ## 14 56001 12550     0
    ## 10 25009 14329     1
    

    This solution orders by two things:

    1: First, it orders by the "canonical" Day value. The canonical Day value is taken as df$Day for non-zero Count rows, and df$Day+1L for zero Count rows. This is accomplished by matching the canonical Day values into a vector of unique canonical Day values of the non-zero Count rows, which also functions to preserve the incoming order of the canonical Day values. The canonical Day values are computed using logical addition, which treats FALSE as zero and TRUE as one. The zero/non-zero distinction is captured in a local variable z on-the-fly, absolving the need for subsequent redundant computation of that information.

    2: Second, it orders the non-zero Count rows before the zero Count rows. Since z has already been computed earlier in the line, we can simply pass it as the second argument to order() to do this. When ordering by a logical vector, FALSE is ordered before TRUE, thus it works directly.


    Answer to original question

    I think this is what you're looking for:

    df$vl <- ave(df$vl,df$id,FUN=function(x) sort(decreasing=T,x));
    df;
    ##    id vl
    ## 1   C  5
    ## 2   C  3
    ## 3   C  2
    ## 4   C  2
    ## 5   A  5
    ## 6   A  5
    ## 7   A  4
    ## 8   A  2
    ## 9   B  4
    ## 10  B  2
    ## 11  B  1
    ## 12  B  1
    

    The above sorts the vl column within each id group, independently of the other id groups.

    Data

    set.seed(1L);
    df <- data.frame(id=rep(c('C','A','B'),each=4L),vl=sample(5L,12L,T));
    

    Another interpretation of your question is that you want to sort the entire data.frame by the vl column, but within each unique value of vl, you want to prefer the order in which the unique values in the id column occur in the original data.frame (notwithstanding the fact that not all id values are represented alongside each unique vl value). Here's how that can be done:

    df[order(-df$vl,match(df$id,unique(df$id))),];
    ##    id vl
    ## 1   C  5
    ## 5   A  5
    ## 6   A  5
    ## 7   A  4
    ## 9   B  4
    ## 2   C  3
    ## 3   C  2
    ## 4   C  2
    ## 8   A  2
    ## 10  B  2
    ## 11  B  1
    ## 12  B  1