Search code examples
rsequencedata-cleaning

r create a sequence from random numbers


I have a column, Col1 in my dataset with random numbers .

  ID  Date        Col1     Score
  13  2002-08-01  18221    60.75
  13  2010-08-12  18448    65.33
  11  2009-11-06  -65145   61.13
  11  2009-11-06  -65145   59.91
  12  2011-05-10   93910   62.10
  14  2009-05-29   13000   70.28
  15  2008-12-03   19423   39.72

My goal is to convert these seemingly random numbers into an ordered sequence (SequenceNum) like this.

  ID  Date        Col1     Score    SequenceNum
  13  2002-08-01  18221    60.75    1 
  13  2010-08-12  18448    65.33    2
  11  2009-11-06  -65145   61.13    2
  11  2009-11-06  -65145   59.91    1
  12  2011-05-10   93910   62.10    1
  14  2008-12-03   19423   39.72    2
  14  2009-05-29   13000   70.28    1

The rule is ,

For each ID ,

Rule 1)  if the Col1 values are different, then create a sequence 
based on the ascending order of values in Col1. For example ID 13. 
The values in Col1 are 18221 and 18448 so  the expected result is

  ID  Date        Col1     Score    SequenceNum
  13  2002-08-01  18221    60.75    1 
  13  2010-08-12  18448    65.33    2

Rule 2) If the Col1 values are same, then use the values in column 
`Score` and create a sequence based on ascending order of values in 
column `Score`. For ID 11, the values in Col1 are same(-65145) so the 
sequence number will be based on ascending order of values in column 
`Score`

  11  2009-11-06  -65145   61.13    2
  11  2009-11-06  -65145   59.91    1

Thanks in advance of any help on this issue.


Solution

  • We can use

    library(dplyr)
    df1 %>%
      mutate(rn = row_number()) %>%
      arrange(ID, Col1, Score) %>% 
       mutate(SequenceNum = row_number(), .by = ID) %>%
      arrange(rn) %>%
      select(-rn)
    

    -output

    ID       Date   Col1 Score SequenceNum
    1 13 2002-08-01  18221 60.75           1
    2 13 2010-08-12  18448 65.33           2
    3 11 2009-11-06 -65145 61.13           2
    4 11 2009-11-06 -65145 59.91           1
    5 12 2011-05-10  93910 62.10           1
    6 14 2009-05-29  13000 70.28           1
    7 14 2008-12-03  19423 39.72           2