Hey so I have a tibble with head() printed like this:
# A tibble: 6 × 1
id.make.model.year
<chr>
1 27550?????AM General?????DJ Po Vehicle 2WD?????1984
2 28426?????AM General?????DJ Po Vehicle 2WD?????1984
3 27549?????AM General?????FJ8c Post Office?????1984
4 28425?????AM General?????FJ8c Post Office?????1984
5 1032?????AM General?????Post Office DJ5 2WD?????1985
6 1033?????AM General?????Post Office DJ8 2WD?????1985
with only one column. I want to separate this into four columns named id, make, model and year. I tried to use separate():
A %>%
separate(id.make.model.year,into=c("id","make"),sep="?????")
and
A %>%
separate(id.make.model.year,into=c("id","make"),sep="\\?????")
but they both return the following error:
Error in stringi::stri_split_regex(value, sep, n_max) : Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX)
Yet another try...:
A %>%
separate(id.make.model.year,into=c("id","make"),sep="[?????]")
which returns
# A tibble: 33,439 × 2
id make
* <chr> <chr>
1 27550
2 28426
3 27549
4 28425
5 1032
6 1033
7 3347
8 13309
9 13310
10 13311
# ... with 33,429 more rows
Warning message:
Too many values at 33439 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
I also tried dropping sep, but then every space is treated as a separator as well.
What's the right way to do this? Thanks in advance.
The regex to match one question mark is \? or [?]. However, if you have five of them, [?????] still matches only one occurrence of that character, because [...] defines a character class, just as [aaaaa] would match only one letter a, not five.
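To see the character-class behaviour in isolation, here is a small base R sketch. The string x is just the start of the first row from the question, and the variable names are mine. It uses base R's regex engine rather than the stringi one that separate() calls, but a character class behaves the same way in both.

```r
# First part of row 1 from the question
x <- "27550?????AM General"

# A character class matches exactly one character,
# no matter how many times that character is listed inside it
one_match <- regmatches(x, regexpr("[?????]", x))

# Splitting on the class therefore splits at every single "?",
# leaving empty strings between consecutive question marks
pieces <- strsplit(x, "[?????]")[[1]]

# A quantifier on a single "?" treats the run of five as one separator
pieces_ok <- strsplit(x, "[?]{5}")[[1]]
```

The empty strings in pieces are why the make column came back blank and why separate() warned about too many values.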
So to capture the five repetitions I think you want \?{5} or [?]{5} (or \?\?\?\?\? or [?][?][?][?][?]). Inside an R string the backslash has to be doubled, so the first form is written sep="\\?{5}".
Until you post data with dput(), I can't confirm.
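In the meantime, here is a minimal sketch of what I would expect to work. It assumes the real column looks like the head() output in the question (two of those rows are retyped by hand below), and it passes four names to into, since each row splits into four pieces.

```r
library(tidyr)
library(tibble)

# Two rows retyped from the head() output in the question (not the real data)
A <- tibble(id.make.model.year = c(
  "27550?????AM General?????DJ Po Vehicle 2WD?????1984",
  "1032?????AM General?????Post Office DJ5 2WD?????1985"
))

# "\\?{5}" is the regex \?{5}: exactly five literal question marks
out <- separate(
  A,
  id.make.model.year,
  into = c("id", "make", "model", "year"),
  sep = "\\?{5}"
)

print(out)
```

All four columns come back as character. Adding convert = TRUE to the separate() call should turn id and year into integers if you want that.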