I have this code that splits the column on the last space, but I don't know how to modify it to split on the first space only. I'm not very familiar with regex.
library(tidyr)
df <- data.frame(Location = c("San Jose CA", "Fremont CA", "Santa Clara CA"))
separate(df, Location, into = c("city", "state"), sep = " (?=[^ ]+$)")
#          city state
# 1    San Jose    CA
# 2     Fremont    CA
# 3 Santa Clara    CA
You can use
library(tidyr)
df <- data.frame(Location = c("San Jose CA", "Fremont CA", "Santa Clara CA"))
df_new <- separate(df, Location, into = c("city", "state"), sep = "^\\S*\\K\\s+")
Output:
> df_new
     city    state
1     San  Jose CA
2 Fremont       CA
3   Santa Clara CA
The ^\S*\K\s+ regex matches:
^ - start of string
\S* - zero or more non-whitespace chars
\K - match reset operator that discards the text matched so far from the overall match memory buffer
\s+ - one or more whitespace chars.

NOTE: If your strings can have leading whitespace, and you want to ignore this leading whitespace, you can add \\s* right after ^ and use
sep = "^\\s*\\S+\\K\\s+"
Here, \S+ requires at least one non-whitespace char to exist before the whitespace that the string is split on.
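To see how the two patterns differ without going through tidyr, here is a minimal base-R sketch. The helper name split_once and the sample vector x are made up for illustration and are not part of tidyr. Note that with the second pattern the first piece still keeps its leading whitespace, so you may want to pass it through trimws() afterwards.

```r
# Hypothetical helper (not part of tidyr): split each string once,
# at the first place where `pattern` matches.
split_once <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)  # perl = TRUE is required for \K
  regmatches(x, m, invert = TRUE)        # text before and after the match
}

# Sample input; the second string has two leading spaces.
x <- c("San Jose CA", "  Santa Clara CA")

# Original pattern: for x[2] the split happens at the leading whitespace,
# so the first piece is an empty string.
split_once(x, "^\\S*\\K\\s+")

# Variant from the NOTE: leading whitespace is skipped when locating the
# split point, but it stays in the first piece ("  Santa").
pieces <- split_once(x, "^\\s*\\S+\\K\\s+")
trimws(pieces[[2]])  # strips the leading whitespace from the first piece
```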