parse_number from readr fails if the character string contains a "." character.
It works well with other special characters.
#does not work
parse_number("art. 23")
Warning: 1 parsing failure.
row col expected actual
1 -- a number .
[1] NA
# A tibble: 1 x 4
row col expected actual
<int> <int> <chr> <chr>
1 1 NA a number .
Why is this happening?
The expected result would be 23.
There is a space after the dot, which is causing the error. What is the expected number from this sequence (0.23 or 23)?
parse_number seems to look for decimal and grouping separators as defined by your locale; see the readr documentation on locales.
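To see which separators are in play, you can inspect the locale object. This is only a small sketch; it assumes readr is installed and that you have not overridden the default locale in your session.

```r
library(readr)

# Inspect the separators readr uses when no locale is supplied.
loc <- default_locale()
loc$decimal_mark   # "." by default
loc$grouping_mark  # "," by default
```

With these defaults the "." in "art. 23" is read as a decimal mark, which is consistent with the parsing failure shown in the question.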
You can opt to change the locale using the following (grouping_mark is a dot with a space):
parse_number("art. 23", locale=locale(grouping_mark=". ", decimal_mark=","))
Output: 23
or remove the space in front:
parse_number(gsub(" ", "" , "art. 23"))
Output: 0.23
Edit: To handle dots as abbreviations and numbers, use the following (str_extract comes from the stringr package):
> as.numeric(str_extract("art. 23", "\\d+\\.*\\d*"))
[1] 23
> as.numeric(str_extract("%ç*%&23", "\\d+\\.*\\d*"))
[1] 23
The above uses regular expressions to identify number patterns within strings.
\\d+ finds one or more digits
\\.* finds zero or more dots
\\d* finds the remaining digits
Note: I am no expert on regex, but there are plenty of other resources that will make you one.
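If you need this in more than one place, the same idea can be wrapped in a helper. The sketch below is a base R variant (so it does not need stringr), and the function name extract_number is made up for illustration. It also uses a slightly stricter pattern, "\\d+(\\.\\d+)?", which only accepts a dot when digits follow it; that is my assumption about what you want, not something readr or stringr prescribes.

```r
# Hypothetical helper: return the first number found in each string, or NA.
extract_number <- function(x) {
  # Digits, optionally followed by a single dot and more digits.
  m <- regexpr("\\d+(\\.\\d+)?", x)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched substrings, in order.
  out[hit] <- regmatches(x, m)
  as.numeric(out)
}

extract_number(c("art. 23", "%ç*%&23", "price 4.50 EUR", "no digits"))
# [1] 23.0 23.0  4.5   NA
```

Strings without any digits come back as NA rather than raising an error, so the helper can be applied to a whole column.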