
Remove Unwanted 0's from numeric element - R


New R-bie,

I am trying to clean 3 columns of data in my data frame df. The columns contain numeric values such as 0.19, 687.00, 49405, 107.440, 13764.000 and 1.740. I will create df below for the purpose of this example. The goal is to put this line of code into dplyr's mutate function to clean a column of a data.frame.

Example:

 df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)

I have been trying to remove the 0's at the end of the elements so that they all look like this

df <- c(1.56, 1.79, 3456, 1.07, 0.16, 1.347, 4.20)

I can partially achieve my desired results by running the line of code below:

signif(df[1], 5) 
signif(df[2], 5) 
signif(df[3], 5) 
signif(df[4], 5) 
signif(df[5], 5)
signif(df[6], 5) 
signif(df[7], 5) 

with the df[7] element 4.200 returning 4.2

However, I have to do this one element at a time. If I run signif(df[1:7], 5) on the whole vector, I get this vector returned: 1.560 1.790 3456.000 1.070 0.160 1.347 4.200

  1. I have also tried using regex to match the 0's at the end of each element, but every quantifier or expression I use seems to remove all the trailing zeros. My idea was to remove the last digit only if it is a 0, which leaves numbers like 1.347 as they are, and then to remove an exact match of ".00" to get a whole integer, leaving 3456 and '4.20'. When I use "(\\.000)$" to match and remove the 0's from values such as 4128.000 and 13764.000, other elements also lose their 0's (e.g. 4.2, 0.9) instead of staying as 4.200 and 0.900, from which I'd like to extract 4.20 and 0.90. Using "(0)$" doesn't work either, and I have tried a plethora of regex variations to achieve this. Any help would be much appreciated.

Solution

  • It is true that the trailing "000"'s disappear with sub or gsub using that pattern, but not because of the pattern matching any characters. Rather it's entirely because of the initial conversion to "character" class:

    >  df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)
    > 
    > sub("\\.000","",df)
    [1] "1.56"  "1.79"  "3456"  "1.07"  "0.16"  "1.347" "4.2"  
    > as.character(df)  #no `sub(` at all
    [1] "1.56"  "1.79"  "3456"  "1.07"  "0.16"  "1.347" "4.2"  
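
    To make the point above concrete, here is a minimal sketch. It assumes the values are first formatted to a fixed 3 decimals with sprintf (my choice, not something from the question), so that there are actual zeros in the strings for a regex to act on:

    ```r
    df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)

    # A double stores no trailing zeros: 4.200 and 4.2 are the same value.
    same_value <- identical(4.200, 4.2)

    # Format every element to a fixed 3 decimals so the zeros exist as text.
    fixed <- sprintf("%.3f", df)

    # Strip trailing zeros, plus the decimal point if nothing is left after it.
    # This is only safe because every string in `fixed` contains a decimal point.
    trimmed <- sub("\\.?0+$", "", fixed)
    print(trimmed, quote = FALSE)
    ```

    Note that a uniform rule like this turns 4.200 into 4.2, not 4.20. Keeping exactly two decimals for some elements and none for others would need an extra rule that the question does not fully specify.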
    

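    And if you wanted 2 digits to the right of the decimal point you could do the following (digits counts significant digits, so this gives two decimals here only because the smallest value, 0.16, needs them):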

    format(as.vector(df), digits=2)
    [1] "   1.56" "   1.79" "3456.00" "   1.07" "   0.16" "   1.35" "   4.20"
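
    If the aim is always exactly two decimal places regardless of the data, a sketch using sprintf, or round combined with nsmall, may be more predictable. The vector `other` below is made up purely to show a case where digits=2 gives no decimals at all:

    ```r
    df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)

    # Fixed two decimals per element, no padding to a common width.
    two_dec <- sprintf("%.2f", df)

    # Same decimals via format(): round first, then require at least 2 decimals.
    two_dec_padded <- format(round(df, 2), nsmall = 2)

    # Hypothetical data where digits = 2 does NOT produce two decimals,
    # because 2 significant digits of 12.5 need no decimal places.
    other <- c(12.5, 3456)
    other_digits <- format(other, digits = 2)
    other_fixed <- sprintf("%.2f", other)
    ```

    As with format, both results are character vectors, so they are for display only.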
    

    And to get rid of the quotes, use print with quote=FALSE (the values remain character strings, so you cannot use arithmetic operators on the result):

    print(format(as.vector(df), digits=2) , quote=FALSE)
    [1]    1.56    1.79 3456.00    1.07    0.16    1.35    4.20
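
    Since the question asks about cleaning three columns with mutate, here is one possible sketch. The helper strip_trailing_zeros, the data frame dat and its column names a, b, c are all hypothetical, and the limit of 3 decimals is an assumption based on the sample values. The dplyr call only runs if dplyr is installed; the base R line does the same job without it:

    ```r
    # Hypothetical helper: format to a fixed number of decimals, then drop
    # trailing zeros (and the decimal point if nothing remains after it).
    strip_trailing_zeros <- function(x, max_decimals = 3) {
      fixed <- formatC(x, format = "f", digits = max_decimals)
      sub("\\.?0+$", "", fixed)
    }

    # Made-up data using values mentioned in the question.
    dat <- data.frame(
      a = c(0.19, 687.00, 49405),
      b = c(107.440, 13764.000, 1.740),
      c = c(4.200, 0.900, 4128.000)
    )
    cols <- c("a", "b", "c")

    # Base R: apply the helper to each selected column.
    cleaned <- dat
    cleaned[cols] <- lapply(dat[cols], strip_trailing_zeros)

    # dplyr equivalent, assuming dplyr >= 1.0.0 for across().
    if (requireNamespace("dplyr", quietly = TRUE)) {
      cleaned_dplyr <- dplyr::mutate(
        dat,
        dplyr::across(dplyr::all_of(cols), strip_trailing_zeros)
      )
    }
    ```

    The cleaned columns are character, not numeric, so this step belongs at the very end of a pipeline, after any arithmetic on the columns is done.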