r dplyr coordinates data-manipulation calculated-columns

How to manage values of a column when some of them have been read in wrongly?


I am trying to manage a very large data frame in which two columns hold X and Y coordinates. The data frame was generated automatically by software that I cannot modify. The problem arises when R reads the data frame: some coordinates are displayed without the decimal separator, which results in values 14-16 characters long. The data frame has the following structure (with no factors):

V1  V2           V3
2   41.79905233 12.572052
4   41.7990535  12.57205367
6   4179905383  1257205483
8   4179905433  1257205533
10  41.79905417 12.57205533
12  4179905417  1257205583

So, my question is: how can I identify the values that are not in the correct format, and how can I convert them to the correct one (i.e., two digits followed by the decimal separator, e.g. 41.245245), taking into account that the number of characters in each cell is not constant? Note also that the malformed values are distributed along the columns without any pattern.

Thank you in advance.

I tried the following code with dplyr, without success (not even remotely):

df %>%
  mutate(df$V4 == ifelse(V2 > 100, V2/100000000, V2 == V2))

df %>%
    mutate(df$V4 = ifelse(V2, function(x) as.numeric(x)[1] > 100, 
                        map_dbl(V4, function(x) as.numeric(x)[1] / 100000000),
                        V4))

Solution

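  • If you know the coordinates are always supposed to be at least 10 and below 100 (that is, exactly two digits before the decimal separator), you can use some math to bring every value into that range without a detour through strings. floor(log10(x)) is one less than the number of digits before the decimal point, so dividing by 10^(floor(log10(x)) - 1) leaves exactly two digits in front; values that were read correctly are divided by 10^0 = 1 and stay unchanged: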

    library(dplyr)
    df %>%
      mutate(across(V2:V3, ~.x / 10^(floor(log10(.x))-1)))
    
      V1       V2       V3
    1  2 41.79905 12.57205
    2  4 41.79905 12.57205
    3  6 41.79905 12.57205
    4  8 41.79905 12.57206
    5 10 41.79905 12.57206
    6 12 41.79905 12.57206
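
To address the "how can I identify them" part of the question, the same idea can be sketched in base R together with a flag for the affected rows. This is only a minimal sketch under the same assumption as above (every true coordinate is positive and has exactly two digits before the decimal separator); the helper name `rescale_two_digits` and the `misread` column are hypothetical names introduced here for illustration, and the data frame simply reproduces the sample from the question.

```r
# Sample data from the question, with V2 and V3 read in as numeric
df <- data.frame(
  V1 = c(2, 4, 6, 8, 10, 12),
  V2 = c(41.79905233, 41.7990535, 4179905383,
         4179905433, 41.79905417, 4179905417),
  V3 = c(12.572052, 12.57205367, 1257205483,
         1257205533, 12.57205533, 1257205583)
)

# Hypothetical helper: divide by the power of ten that leaves exactly
# two digits before the decimal point.
# ASSUMPTION: every true coordinate is positive and lies in [10, 100).
rescale_two_digits <- function(x) {
  x / 10^(floor(log10(x)) - 1)
}

# Identify rows where at least one coordinate lost its decimal separator:
# under the assumption above, any value of 100 or more was misread.
df$misread <- df$V2 >= 100 | df$V3 >= 100

# Rescale both coordinate columns; correctly read values are divided by 1.
coord_cols <- c("V2", "V3")
df[coord_cols] <- lapply(df[coord_cols], rescale_two_digits)

# Print with enough digits to see the restored decimals
print(df, digits = 10)
```

With the sample data, `misread` is TRUE for the rows where V1 is 6, 8 and 12. Note that this approach would not work as-is for negative coordinates, zeros, or coordinates with one or three digits before the separator; those cases would need a different rule.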