I am trying to work with a very large data frame in which two columns hold X and Y coordinates. The data frame was generated automatically by software that I cannot modify. The problem appears once R reads the data frame: some coordinates are displayed without the decimal separator, which results in values 14 to 16 characters long. The data frame has the following structure (with no factors):
V1 V2 V3
2 41.79905233 12.572052
4 41.7990535 12.57205367
6 4179905383 1257205483
8 4179905433 1257205533
10 41.79905417 12.57205533
12 4179905417 1257205583
So, my question is: how can I identify the values that are not in the correct format, and how can I convert them to the correct one (i.e., two digits followed by the decimal separator, as in 41.245245), given that the number of characters in each cell is not constant? Note also that the malformed values are scattered through the columns without any pattern.
Thank you in advance.
I tried the following dplyr code without success, not even remotely:
df %>%
mutate(df$V4 == ifelse(V2 > 100, V2/100000000, V2 == V2))
df %>%
mutate(df$V4 = ifelse(V2, function(x) as.numeric(x)[1] > 100,
map_dbl(V4, function(x) as.numeric(x)[1] / 100000000),
V4))
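If you know the coordinates are always supposed to be at least 10 and below 100 (two digits before the decimal separator), you can use some arithmetic to bring every value into that range without a detour through strings. Dividing a positive number by 10^(floor(log10(x)) - 1) shifts the decimal point so that exactly two digits remain in front of it: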
library(dplyr)
df %>%
  mutate(across(V2:V3, ~ .x / 10^(floor(log10(.x)) - 1)))
V1 V2 V3
1 2 41.79905 12.57205
2 4 41.79905 12.57205
3 6 41.79905 12.57205
4 8 41.79905 12.57206
5 10 41.79905 12.57206
6 12 41.79905 12.57206
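The question also asks how to identify the malformed values, so here is a minimal sketch that flags them before rescaling. It rebuilds the sample rows from the question and assumes the columns are read as plain numeric (double) values, that every coordinate is positive, and that a correct value always has exactly two digits before the separator. The helper name `rescale_two_digits` and the flag columns `V2_bad` and `V3_bad` are made up for illustration.

```r
library(dplyr)

# Sample rows copied from the question
df <- data.frame(
  V1 = c(2, 4, 6, 8, 10, 12),
  V2 = c(41.79905233, 41.7990535, 4179905383,
         4179905433, 41.79905417, 4179905417),
  V3 = c(12.572052, 12.57205367, 1257205483,
         1257205533, 12.57205533, 1257205583)
)

# Hypothetical helper: shift the decimal point so that exactly two digits
# remain before it. Assumes x > 0; log10() of a negative number gives NaN.
rescale_two_digits <- function(x) x / 10^(floor(log10(x)) - 1)

fixed <- df %>%
  # Step 1: flag values that lost their separator (anything >= 100
  # cannot be a valid two-digit coordinate under the assumption above)
  mutate(V2_bad = V2 >= 100, V3_bad = V3 >= 100) %>%
  # Step 2: rescale both coordinate columns; valid values are divided by 1
  mutate(across(c(V2, V3), rescale_two_digits))

# Print with enough digits to see the restored decimals
print(fixed, digits = 10)
```

With the sample data, the flag columns are TRUE for the third, fourth and sixth rows. Note that a genuine coordinate below 10 (say 9.5) would be rescaled to 95, so this approach only fits data where the two-digit assumption really holds.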