parse_number from readr fails if the character string contains a dot (.). It works well with other special characters.
library(readr)
#works
parse_number("%ç*%&23")
#does not work
parse_number("art. 23")
Warning: 1 parsing failure.
row col expected actual
1 -- a number .
[1] NA
attr(,"problems")
# A tibble: 1 x 4
row col expected actual
<int> <int> <chr> <chr>
1 1 NA a number .
Why is this happening?
Update: The expected result would be 23.
There is a space after the dot, which is causing the error. What is the expected number from this sequence (0.23 or 23)?
parse_number seems to look for decimal and grouping separators as defined by your locale; see the documentation here: https://www.rdocumentation.org/packages/readr/versions/1.3.1/topics/parse_number
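To illustrate how the locale changes what counts as a separator, here is a minimal sketch. It assumes readr is installed; the input strings and the name de_style are made up for illustration.

```r
library(readr)

# Hypothetical locale: "." groups thousands, "," marks decimals
de_style <- locale(decimal_mark = ",", grouping_mark = ".")

# With this locale the dot is skipped as a grouping mark
parse_number("1.234,56", locale = de_style)
#> [1] 1234.56

# With the default locale the roles are reversed
parse_number("1,234.56")
#> [1] 1234.56
```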
You can opt to change the locale using the following (grouping_mark is a dot with a space):
parse_number("art. 23", locale=locale(grouping_mark=". ", decimal_mark=","))
Output: 23
or remove the space after the dot (here, by stripping all spaces):
parse_number(gsub(" ", "" , "art. 23"))
Output: 0.23
Edit: To handle dots as abbreviations and numbers use the following:
library(stringr)
> as.numeric(str_extract("art. 23", "\\d+\\.*\\d*"))
[1] 23
> as.numeric(str_extract("%ç*%&23", "\\d+\\.*\\d*"))
[1] 23
The above uses regular expressions to identify number patterns within strings.
\\d+ finds one or more digits
\\.* finds zero or more dots
\\d* finds any remaining digits
Note: I am no expert on regex, but there are plenty of other resources that will make you one.
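If you prefer to avoid the extra package, the same idea can be sketched in base R. This is only one possible variant, not the approach above verbatim: the helper name extract_first_number is hypothetical, and the pattern is slightly stricter in that a dot only counts when digits follow it, so a trailing abbreviation dot is never picked up.

```r
# Hypothetical helper: return the first number found in each string, or NA
extract_first_number <- function(x) {
  # One or more digits, optionally followed by a dot and more digits
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the strings that matched, in order
  out[m != -1] <- regmatches(x, m)
  as.numeric(out)
}

extract_first_number(c("art. 23", "v1.5 beta", "no digits"))
#> [1] 23.0  1.5   NA
```

Because the function is vectorised, it can be applied directly to a whole column of strings.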