I would like to correct a record using R. Last name and first name should always be separated by a comma.
names <- c("ADAM, Smith J.", "JOHNSON. Richard", "BROWN, Wilhelm K.", "DAVIS, Daniel")
Sometimes, however, a full stop has crept in as the separator, as in "JOHNSON. Richard". I would like to fix this automatically. Since the last name is always at the beginning of the string, I can match it with sub:
sub("^[[:upper:]]+\\.","^[[:upper:]]+\\,",names)
However, I cannot use a function for the replacement that specifically replaces the full stop with a comma.
Is there a way to insert a function into the replacement that does this for me?
Your sub call is mostly correct, but you need a capture group (the parentheses) in the pattern and a backreference \\1 in the replacement.
Because the parentheses capture the upper-case letters, \\1 in the replacement stands for those original letters from each string. The only real change is therefore from \\. to a comma. In other words, we replace the upper-case letters ^([[:upper:]]+) AND the full stop \\. with the captured letters \\1 AND a comma. The comma does not need to be escaped in the replacement, so "\\1," works just as well as "\\1\\,".
For more details you can visit this page.
test_names <- c("ADAM, Smith J.", "JOHNSON. Richard", "BROWN, Wilhelm K.", "DAVIS, Daniel")
sub("^([[:upper:]]+)\\.","\\1\\,",test_names)
[1] "ADAM, Smith J." "JOHNSON, Richard" "BROWN, Wilhelm K."
[4] "DAVIS, Daniel"
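On the original question of putting a function into the replacement: as far as I know, base R's sub only accepts a string there, not a function. A possible workaround is to extract the matched pieces with regexpr and regmatches, transform them with any function you like, and write them back. The following is only a sketch under that assumption; fixed_names is a hypothetical variable name, and the inner sub call is just one choice of transformation.

```r
# Same sample data as in the question.
test_names <- c("ADAM, Smith J.", "JOHNSON. Richard", "BROWN, Wilhelm K.", "DAVIS, Daniel")

# Locate a leading run of upper-case letters followed by a full stop.
# regexpr() returns -1 for elements that do not match.
m <- regexpr("^[[:upper:]]+\\.", test_names)

# fixed_names is a hypothetical name for the corrected copy.
fixed_names <- test_names

# Transform only the matched substrings: swap the literal "." for ",".
# Any function of the matched text could be used on the right-hand side.
regmatches(fixed_names, m) <- sub(".", ",", regmatches(fixed_names, m), fixed = TRUE)

print(fixed_names)
```

Elements without a match are left untouched, so trailing full stops such as the one in "ADAM, Smith J." are not affected. For this particular task the capture-group version above is shorter; the regmatches approach is mainly useful when the replacement has to be computed from the match.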