Tags: r, decimal, format, decimal-point

Formatting decimal places in a character column: as.numeric erases the values in the column


I am working with a dataset where a column contains values with many decimal places.

Examples:

,958229561278528615818098193915712388824
2,05561009284393218251509777394193942492
2,72096803821411321343605598060792704404
2,00324997757400185789440370684992098409

and I need to format the decimal places differently. The column is read as character in R. As long as it stays that way, I cannot use functions such as round().

The problem is that

as.numeric((data$value))
as.numeric(as.character(data$value))

will both wipe out my column, giving me back a column of NAs. I also tried importing the dataset directly through the interface and converting the column to numeric, but that just gives the column an "unknown" format and displays the figures like this: 6.8e+38 1.9e+38 5.9e+38

which I don't want either.

Extra info: the dataset has been created directly in R by manipulating (merge, left_join) other datasets.

Any help is greatly appreciated!


Solution

  • I assume you are in a locale that uses a comma as the decimal mark, and perhaps a period as the thousands separator.

    As an example:

    df <- c(',958229561278528615818098193915712388824', '2,05561009284393218251509777394193942492', '2,72096803821411321343605598060792704404', '2,00324997757400185789440370684992098409')
    

    First, remove any periods, because they may be thousands separators. Then replace the comma with a decimal point:

    as.numeric(gsub(',', '.', gsub('\\.', '', df)))
    

    Edit: however, as.numeric returns a double, which holds only about 15 to 17 significant digits. If you need more decimal places than that, you will run into precision problems. Look into the Rmpfr package if you need arbitrary precision.
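
As a rough sketch of how this might look on a data frame column rather than a bare vector: the data frame `data` below is a made-up stand-in for the asker's dataset, and the third value and the `value_num` column name are hypothetical additions to illustrate the thousands-separator case. It assumes every period in the column really is a thousands separator.

```r
# Hypothetical stand-in for the asker's data frame
data <- data.frame(
  value = c(",958229561278528615818098193915712388824",
            "2,05561009284393218251509777394193942492",
            "1.234,5"),  # made-up value with a thousands separator
  stringsAsFactors = FALSE
)

# Direct conversion fails: the comma is not recognised as a decimal mark,
# so every element becomes NA (with a coercion warning)
direct <- suppressWarnings(as.numeric(data$value))

# Drop periods (assumed to be thousands separators), then replace the
# decimal comma with a period before converting
no_thousands <- gsub(".", "", data$value, fixed = TRUE)
data$value_num <- as.numeric(gsub(",", ".", no_thousands, fixed = TRUE))

# Numeric functions such as round() now work on the new column
rounded <- round(data$value_num, 3)

# formatC() gives a fixed number of decimals as text, avoiding
# scientific notation in the display
formatted <- formatC(data$value_num, format = "f", digits = 4)

# Printing with 17 significant digits shows roughly where double
# precision ends; digits beyond that are not stored
print(data$value_num, digits = 17)
```

Keeping the original character column alongside the numeric one means the full digit string is still available if a higher-precision conversion is needed later.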