I have a character vector V1
V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")
and another vector V2
V2 <- c(84404,23832,43026,28792,27834)
Now I have these conditions:
1) Break each item in V1 at the 24th character:
a) If the 24th character is a comma, break the string there and add the remainder in front of the corresponding string in V2.
e.g. V1 has "377 Peninsula St. Ogden,UT", where the comma is at index 24, so we need to break it in two: "377 Peninsula St. Ogden" and "UT" (note that the comma itself is omitted). V1 then keeps the "377 Peninsula St. Ogden" part and the remainder is added to the corresponding PIN in V2, so "84404" in V2 becomes "UT 84404".
b) If the 24th character is neither a comma nor whitespace, find the last whitespace before the comma in V1; V1 keeps everything up to that index and the remainder goes to V2.
e.g. V1 has "8532 West Lyme St. Chesterfield, VA", where "t" is at index 24, so we need to break it at the whitespace after "St.": V1 keeps "8532 West Lyme St." and V2 gets "Chesterfield, VA 23832".
By the end of the operations we should have:
V1 <- c("377 Peninsula St. Ogden","8532 West Lyme St.",...)
V2 <- c("UT 84404","Chesterfield, VA 23832")
EDIT:
I tried the following on V1 to check whether the 24th character is a comma:
unlist(lapply(lapply(V1, function(z){substr(z,24,24)}),function(y){y==","}))
which returns:
TRUE FALSE FALSE FALSE FALSE
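Since substr is itself vectorized over its first argument, the same check can probably be written without the nested lapply calls. A minimal sketch, assuming V1 as defined above (is_comma is just an illustrative name):

```r
V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")

# substr() extracts the 24th character of every element in one call;
# the comparison then yields a logical vector of the same length.
is_comma <- substr(V1, 24, 24) == ","
is_comma
# [1]  TRUE FALSE FALSE FALSE FALSE
```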
Now that I have solved one part of the problem, I need a way to apply the formatting logic based on the result above.
i.e. I want to do:
unlist(lapply(lapply(V1, function(z){substr(z,24,24)}),function(y){if(y==","){something1} else if(y==" "){something2}else {something3}}))
Here something1/2/3 come from 1a and 1b above. I need to know how to write this logic.
Consider the following, using the vectorized functions ifelse, substr, and regexpr (i.e., no apply loops):
newV1 <- ifelse(substr(V1, 24, 24) == ",",               # CONDITIONALLY CHECK 24TH CHARACTER
                substr(V1, 1, regexpr(",", V1) - 1),     # EXTRACT UNTIL 24TH CHARACTER
                substr(V1, 1,
                       regexpr(" (?=[^ ]+$)",
                               substr(V1, 1, 24),
                               perl = TRUE) - 1)         # EXTRACT UNTIL LAST SPACE BEFORE 24TH CHAR
                )
newV1
# [1] "377 Peninsula St. Ogden" "8532 West Lyme St."
# [3] "43 E. Hilltop Street" "95 Newcastle St."
# [5] "7276 Rose St."
newV2 <- paste(ifelse(substr(V1, 24, 24) == ",",         # CONDITIONALLY CHECK 24TH CHARACTER
                      substr(V1, regexpr(",", V1) + 1,
                             nchar(V1)),                 # EXTRACT AFTER 24TH CHARACTER
                      substr(V1,
                             regexpr(" (?=[^ ]+$)",
                                     substr(V1, 1, 24),
                                     perl = TRUE) + 1,
                             nchar(V1))),                # EXTRACT AFTER LAST SPACE BEFORE 24TH CHAR
               V2)                                       # PASTE V2 VECTOR ELEMENTWISE
newV2
# [1] "UT 84404" "Chesterfield, VA 23832"
# [3] "Hilliard,OH 43026" "Hendersonville,NC 28792"
# [5] "Greenville,NC 27834"
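If the whitespace branch from the question should be covered as well, the same idea can be wrapped in a small helper. This is only a sketch: the name split_at and its pos argument are made up for illustration, and cutting exactly at position 24 when that character is a space is an assumption, since the question leaves that branch unspecified.

```r
V1 <- c("377 Peninsula St. Ogden,UT",
        "8532 West Lyme St. Chesterfield, VA",
        "43 E. Hilltop Street Hilliard,OH",
        "95 Newcastle St. Hendersonville,NC",
        "7276 Rose St. Greenville,NC")
V2 <- c(84404, 23832, 43026, 28792, 27834)

# Hypothetical helper (not from any package): splits each address at `pos`
# or at the last space before it, and prepends the remainder to `pin`.
split_at <- function(addr, pin, pos = 24) {
  ch   <- substr(addr, pos, pos)          # character at position `pos`
  head <- substr(addr, 1, pos - 1)        # everything before position `pos`
  # Index of the last space in `head` (-1 if it contains no space at all)
  last_sp <- regexpr(" (?=[^ ]*$)", head, perl = TRUE)
  # ASSUMPTION: a comma or a space at `pos` means "cut exactly there";
  # any other character means "cut at the last earlier space".
  cut <- ifelse(ch %in% c(",", " "), pos, last_sp)
  # Note: if no earlier space exists, cut is -1, so V1 becomes "" and
  # the whole string moves to V2.
  list(V1 = substr(addr, 1, cut - 1),
       V2 = paste(substr(addr, cut + 1, nchar(addr)), pin))
}

res <- split_at(V1, V2)
res$V1
res$V2
```

For the sample data this should give the same newV1 and newV2 as above; the only intended difference is that an element whose 24th character is a space would be cut at that space rather than at the previous one.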