I am using a regex that is suggested here to replace any type of phone number with aaaaaaaaaa.
This is a snapshot of my data:
df <- data.frame(
  text = c(
    'my number is (123)-416-567',
    "1 321 124 7889 is valid",
    'why not taking 987-012-6782',
    '120 967 3256 is correct',
    'call at 888 969 9919',
    'please text at 1 647 989 1213'
  )
)
df %>% select(text)
text
1 my number is (123)-416-567
2 1 321 124 7889 is valid
3 why not taking 987-012-6782
4 120 967 3256 is correct
5 call at 888 969 9919
6 please text at 1 647 989 1213
My code is
df %>%
  mutate(
    text = str_replace_all(text, '^(\+\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}$', 'aaaaaaaaaa')
  )
and I get this error
Error: '\+' is an unrecognized escape in character string starting "'^(\+"
Error: unexpected ')' in " )"
The outcome should be like
text
1 my number is aaaaaaaaaa
2 aaaaaaaaaa is valid
3 why not taking aaaaaaaaaa
4 aaaaaaaaaa is correct
5 call at aaaaaaaaaa
6 please text at aaaaaaaaaa
You can use
str_replace_all(text, '(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)', 'aaaaaaaaaa')
See the regex demo.
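As a minimal, self-contained sketch of how this fits into the pipeline from the question (assuming the dplyr and stringr packages are installed; the names phone_re and masked are only illustrative):

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
df <- data.frame(
  text = c(
    "my number is (123)-416-567",
    "1 321 124 7889 is valid",
    "why not taking 987-012-6782",
    "120 967 3256 is correct",
    "call at 888 969 9919",
    "please text at 1 647 989 1213"
  )
)

# Every regex backslash is doubled because this is a regular R string literal
phone_re <- "(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"

# Replace each phone-like match with the placeholder
masked <- df %>%
  mutate(text = str_replace_all(text, phone_re, "aaaaaaaaaa"))

print(masked)
```

With the six sample rows, each phone number should be replaced and the surrounding words left untouched, which matches the expected outcome in the question.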
Details:
(?:\+?\d{1,2}\s)? - an optional sequence of an optional + and then one or two digits and a whitespace
\(? - an optional (
\d{3} - three digits
\)? - an optional )
[\s.-] - a -, . or whitespace
\d{3} - three digits
[\s.-] - a -, . or whitespace
\d{3,4} - three or four digits
(?!\d) - no digit allowed right after.
Notes:
In an R string literal, a backslash must be doubled to produce a literal \ char, which is why '\+' raised the "unrecognized escape" error.
^ and $ match start/end of string, so in this case it makes sense to remove the ^ anchor and replace $ with a right-hand digit boundary.
\d{4} did not match numbers where the last part contained three digits, such as (123)-416-567, so I replaced it with \d{3,4}.
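The points in the notes can be checked with a short sketch. It assumes stringr is installed and R is version 4.0.0 or later; the raw string syntax r"(...)" is an alternative to doubling backslashes and is not part of the answer above. The variable names and the extra test string "888 969 99190" are made up for illustration.

```r
library(stringr)

# The answer's pattern with doubled backslashes (regular string literal)
doubled <- "(?:\\+?\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{3,4}(?!\\d)"

# The same pattern as a raw string (R >= 4.0.0), no doubling needed
raw <- r"((?:\+?\d{1,2}\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{3,4}(?!\d))"
same_pattern <- identical(doubled, raw)

# The question's pattern, with backslashes doubled so that R can parse it
anchored <- "^(\\+\\d{1,2}\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}$"

# ^ and $ require the whole string to be a phone number
anchored_in_sentence <- str_detect("call at 888 969 9919", anchored)  # FALSE
anchored_alone       <- str_detect("888 969 9919", anchored)          # TRUE
unanchored_sentence  <- str_detect("call at 888 969 9919", doubled)   # TRUE

# (?!\d) rejects a candidate that runs straight into another digit
too_many_digits <- str_detect("888 969 99190", doubled)               # FALSE
```

The anchored pattern only succeeds when the string contains nothing but the number, which is why it never matched inside the sentences of the sample data.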