New R-bie here.
I am trying to clean 3 columns of data in my data frame df. The columns consist of numeric elements with values such as 0.19, 687.00, 49405, 107.440, 13764.000 and 1.740. I will create df below for the purpose of this example. The goal is to put this line of code into a mutate call from dplyr so that it cleans a column of a data.frame.
Example:
df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)
I have been trying to remove the trailing 0's from the elements so that they all look like this:
df <- c(1.56, 1.79, 3456, 1.07, 0.16, 1.347, 4.20)
I can partially achieve my desired results by running the line of code below:
signif(df[1], 5)
signif(df[2], 5)
signif(df[3], 5)
signif(df[4], 5)
signif(df[5], 5)
signif(df[6], 5)
signif(df[7], 5)
with the df[7] element 4.200 returning 4.2.
However, I have to do this one element at a time; if I run signif(df[1:7], 5) instead, I get this vector returned: 1.560 1.790 3456.000 1.070 0.160 1.347 4.200
I would like to leave elements such as 1.347 as they were, but clean the rest of the column and then remove an exact match of ".00" to get a whole integer, leaving 3456 and '4.20'.
When using "(\\.000)$" to match and remove 0's from elements such as 4128.000 and 13764.000, other elements also have their 0's removed (e.g. 4.2, 0.9) instead of being left as 4.200 and 0.900, from which I'd like to extract 4.20 and 0.90.
Using "(0)$" doesn't work either, and I have tried a plethora of regex variations to achieve this... any help would be much appreciated.
It is true that the trailing "000"s disappear with sub or gsub using that pattern, but not because the pattern matches any characters. Rather, it is entirely because of the initial conversion to "character" class. A numeric vector does not store trailing zeros; they only show up when the values are printed:
> df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)
>
> sub("\\.000","",df)
[1] "1.56" "1.79" "3456" "1.07" "0.16" "1.347" "4.2"
> as.character(df) #no `sub(` at all
[1] "1.56" "1.79" "3456" "1.07" "0.16" "1.347" "4.2"
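One way to convince yourself that the zeros are never stored is to compare literals typed with and without them. This is only a small base R sketch; the variable names are made up for illustration.

```r
# Numeric literals with and without trailing zeros are the same double
x <- 4.200
y <- 4.2
same_value <- identical(x, y)   # TRUE: no trailing zeros are stored
as_text <- as.character(x)      # "4.2": the zeros are already gone before any regex runs
print(same_value)
print(as_text)
```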
And if you wanted 2 digits to the right of the decimal point you could do:
format(as.vector(df), digits=2)
[1] "   1.56" "   1.79" "3456.00" "   1.07" "   0.16" "   1.35" "   4.20"
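If the leading spaces that format adds to align the elements are unwanted, sprintf is a possible alternative. This is a sketch that assumes two decimals are wanted for every element, whole numbers included; two_dp is just an illustrative name.

```r
df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)

# Fixed two decimals per element, with no padding for alignment
two_dp <- sprintf("%.2f", df)
print(two_dp)
```

As with format, the result is a character vector, so 1.347 is shown rounded as "1.35" and the values are for display only.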
And to get rid of the quotes use print (although the values remain character, so you cannot use arithmetic operators on the result):
print(format(as.vector(df), digits=2) , quote=FALSE)
[1]    1.56    1.79 3456.00    1.07    0.16    1.35    4.20
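For the mixed rule described in the question (whole numbers shown without decimals, everything else with at least two decimals, and a non-zero third decimal kept), something along these lines might work. It is a rough sketch, not a definitive solution: trim_zeros is a hypothetical helper name, and it assumes the data never needs more than three decimals.

```r
# Hypothetical helper: format to three decimals, drop one trailing zero,
# then drop the fractional part entirely for whole numbers.
trim_zeros <- function(x) {
  s <- sprintf("%.3f", x)   # e.g. 4.2 -> "4.200", 3456 -> "3456.000"
  s <- sub("0$", "", s)     # "4.200" -> "4.20", "3456.000" -> "3456.00"
  sub("\\.00$", "", s)      # "3456.00" -> "3456"; "4.20" is left alone
}

df <- c(1.560, 1.790, 3456.000, 1.0700, 0.16000, 1.347, 4.200)
cleaned <- trim_zeros(df)
print(cleaned, quote = FALSE)
```

Because the helper is vectorised, it could also be called on a column inside dplyr's mutate, for example mutate(dat, col = trim_zeros(col)), where dat and col are placeholder names. The column becomes character, so this should be the last step before display or export.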