
detect alphabetical and numerical order in R


I need R code that tells me, for each row, whether the values have been placed in alphabetical and numerical order. By alphabetical order I mean comparing the cells in each column of a row, starting from the first cell and ending at the last one. An example of a row in alphabetical order: alphabetical_row = c("A61B", "G01Q", "H01J", "H03B").

row1 <- c("G01N 23/20", "G01N 23/203", "G01Q 30/00", "G01Q 30/04", "G01Q 30/18", "H01J 37/252", "H01J 37/252")
row2 <- c("G01S 7/38", "G01S 7/38", "H03B 21/00", "H03B 21/02", NA, NA, NA)
row3 <- c("A61B 8/00", "A61B 8/00", "G01S 7/52", "G01S 7/52", NA, NA, NA)

df <- data.frame(rbind(row1, row2, row3))

The output I am looking for is a new column with TRUE, in case the values in the row are in order, or FALSE, in case the values are not in order, for each row.

However, let's start with the first 4 characters:

row1 <- c("G01N", "G01N", "G01Q", "G01Q", "G01Q", "H01J", "H01J")
row2 <- c("G01S", "G01S", "H03B", "H03B", NA, NA, NA)
row3 <- c("A61B", "A61B", "G01S", "G01S", NA, NA, NA)

df <- data.frame(rbind(row1, row2, row3))

Desired output:

df <- cbind(df, ordered = c(TRUE, TRUE, TRUE))

In this case the output would be TRUE, TRUE, TRUE, because every row is in alphabetical order up to its trailing NA values; row3, for example, runs in order from cell 1 (A61B) to cell 4 (G01S).


Solution

  • This tests for alphabetical order and also requires that the NA values come last. Your sample data may end up with factor columns (data.frame() converted strings to factors by default before R 4.0.0), but I would strongly recommend converting them to character (with as.character()), since rows rather than columns are meaningful here. If the columns of the data frame are already character, you can leave out the as.character() part of the answer. If you want to adjust the NA behavior, see the na.last argument in ?order.

    df$ordered = apply(df, 1, function(x) identical(order(as.character(x)), seq_along(x)))
    
    df 
    #        X1   X2   X3   X4   X5   X6   X7 ordered
    # row1 G01N G01N G01Q G01Q G01Q H01J H01J    TRUE
    # row2 G01S G01S H03B H03B <NA> <NA> <NA>    TRUE
    # row3 A61B A61B G01S G01S <NA> <NA> <NA>    TRUE
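
The check above compares whole strings, so with the full codes a main group such as 10 would sort before 7. Below is a minimal sketch of one way to cover the numerical part as well. It assumes every non-NA code has the form "G01N 23/20". The helper name is_row_ordered and the data frame df_full are hypothetical, and the third row is invented to show the difference. The part after the slash is compared as text here, which is also an assumption; wrap it in as.integer() if it should be compared as a number.

```r
# Hypothetical helper: TRUE if the codes in x are in non-decreasing order
# (prefix as text, number before the slash as integer, part after the
# slash as text) with any NA values last.
is_row_ordered <- function(x) {
  x <- as.character(x)
  # Assumed code layout: "<prefix> <main>/<sub>", e.g. "G01N 23/20"
  pattern <- "^([A-Z0-9]+) +([0-9]+)/([0-9]+)$"
  prefix <- sub(pattern, "\\1", x)              # NA stays NA
  main   <- as.integer(sub(pattern, "\\2", x))  # compared numerically
  subgrp <- sub(pattern, "\\3", x)              # compared as text (assumption)
  # order() breaks ties with the later keys and is stable, so an already
  # ordered row yields 1, 2, ..., n. method = "radix" avoids locale effects.
  identical(order(prefix, main, subgrp, method = "radix"), seq_along(x))
}

row1 <- c("G01N 23/20", "G01N 23/203", "G01Q 30/00", "G01Q 30/04",
          "G01Q 30/18", "H01J 37/252", "H01J 37/252")
row2 <- c("G01S 7/38", "G01S 7/38", "H03B 21/00", "H03B 21/02", NA, NA, NA)
# Invented row: 7 comes before 10 numerically, but not as text
row3 <- c("G01S 7/38", "G01S 10/00", NA, NA, NA, NA, NA)

df_full <- data.frame(rbind(row1, row2, row3), stringsAsFactors = FALSE)
df_full$ordered <- apply(df_full, 1, is_row_ordered)
```

With this data the original string-based check would flag the invented third row as unordered, while the sketch treats it as ordered because 7 < 10.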