
How do I keep only the observation with the highest decimal value for each whole number?


I want to make this data frame (edited to show that it's an actual data frame with more than one column):

ID = c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36, 103.90)
blood = c(55, 54, 74, 42, 54, 45, 65, 34, 44)
df = data.frame(ID, blood)

  ID       blood
1 100.00    55
2 100.12    54
3 100.36    74
4 101.00    42
5 102.00    54
6 102.24    45
7 103.00    65
8 103.36    34
9 103.90    44

Become this one:

ID2 = c(100.36, 101.00, 102.24, 103.90)
blood2 = c(74, 42, 45, 44)
df2 = data.frame(ID2, blood2)

  ID2        blood2
1 100.36     74
2 101.00     42
3 102.24     45
4 103.90     44

In other words, for each whole number (like 102), I only want to keep the row with the highest decimal "version" of it (102.24 rather than 102.00). Any ideas how?
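The "whole number" each ID belongs to is simply floor(ID), so any solution boils down to grouping on that value and picking the largest ID in each group. A quick sketch on the question's ID values (the names group and split() output are just for illustration):

```r
ID <- c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36, 103.90)

# floor() drops the decimal part, giving the whole number each ID belongs to
group <- floor(ID)
group

# split() lists the "versions" of each whole number side by side;
# the wanted row is the one holding the largest value in each group
split(ID, group)
```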


Solution

    > ID = c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36)
    > ID2 <- tapply( ID, floor(ID), FUN=max)
    > ID2
       100    101    102    103 
    100.36 101.00 102.24 103.36 
    > (df2 <- data.frame(ID2))
           ID2
    100 100.36
    101 101.00
    102 102.24
    103 103.36
    > (df2 <- data.frame(ID=as.vector(ID2)))
          ID
    1 100.36
    2 101.00
    3 102.24
    4 103.36
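If only the ID column matters, base R's aggregate() can return the per-group maximum as a data frame in one step. This is a swap-in for tapply(), not part of the original answer; a minimal sketch on the same ID vector:

```r
ID <- c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36)

# Group by whole number and keep the largest ID in each group.
# The right-hand side floor(ID) becomes a grouping column named "floor(ID)".
df2 <- aggregate(ID ~ floor(ID), data = data.frame(ID), FUN = max)
df2
```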
    

Expanded, for the edited question where the data frame also has a blood column:

    > ID = c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36, 103.9)
    > blood = c(55, 54, 74, 42, 54, 45, 65, 34, 44)
    > df = data.frame(ID, blood)
    > 
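    > # tapply() returns one logical vector per whole number, in sorted group order,
    > # so unlist(tmp) lines up with the rows only if df is already sorted by ID
    > tmp <- tapply( df$ID, floor(df$ID), FUN=function(x) x==max(x))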
    > 
    > (df2 <- df[unlist(tmp),])
          ID blood
    3 100.36    74
    4 101.00    42
    6 102.24    45
    9 103.90    44
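The expanded version depends on df being sorted by ID, because unlist(tmp) comes back in group order rather than row order. A sketch of an order-independent variant, using base R's ave() instead of tapply() (a substitution, not the original answer's code), compares each row's ID with the maximum of its own whole-number group. The row shuffle is only there to demonstrate that order no longer matters:

```r
ID    <- c(100.00, 100.12, 100.36, 101.00, 102.00, 102.24, 103.00, 103.36, 103.90)
blood <- c(55, 54, 74, 42, 54, 45, 65, 34, 44)
df    <- data.frame(ID, blood)

# Shuffle the rows to show the result does not depend on row order
df <- df[c(9, 3, 1, 6, 4, 8, 2, 7, 5), ]

# ave() returns a vector aligned with the rows of df:
# each row gets the max ID of its whole-number group
group_max <- ave(df$ID, floor(df$ID), FUN = max)

# Keep rows whose ID equals their group's max, then sort for display
df2 <- df[df$ID == group_max, ]
result <- df2[order(df2$ID), ]
result
```

Like x==max(x) in the answer above, this keeps every row that ties for a group's maximum.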