I have a data frame with one variable. It looks something like this:
df <- data.frame(c("25 Edgemont 52 Sioux County", "57 Burke 88 Papillion-LaVista South"))
To provide more context, each observation/row is a basketball game score. I would like to split it into four data frame columns that separate the numbers from the team names. For example, the first row would end up as "25" in the first column, "Edgemont" in the second column, "52" in the third column, and "Sioux County" in the fourth column.
I've tried the below and various SO suggestions but can't get the desired results:
df2 <- strsplit(gsub("([0-9]*)([a-z]*)([0-9]*)([a-z]*)", "\\1 \\2 \\3 \\4", df), " ")
1) dplyr/tidyr Replace each number with a semicolon, that number and another semicolon, and then separate on the semicolons plus any surrounding whitespace. The NA in the vector of column names drops the empty field that precedes the first semicolon.
library(dplyr)
library(tidyr)
# input
df <- data.frame(V1 = c("25 Edgemont 52 Sioux County",
                        "57 Burke 88 Papillion-LaVista South"))

df %>%
  mutate(V1 = gsub("(\\d+)", ";\\1;", V1)) %>%
  separate(V1, c(NA, "No1", "Let1", "No2", "Let2"), sep = " *; *")
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
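To see why the semicolon trick in (1) works, it may help to look at the intermediate values for a single string. This is only an illustrative sketch in base R: the names x, marked and pieces are made up here, and strsplit stands in for what separate does with the same sep pattern.

```r
x <- "25 Edgemont 52 Sioux County"

# Step 1: wrap every run of digits in semicolons
marked <- gsub("(\\d+)", ";\\1;", x)
marked
## [1] ";25; Edgemont ;52; Sioux County"

# Step 2: split on each semicolon together with any spaces around it
pieces <- strsplit(marked, " *; *")[[1]]

# pieces has five elements: "", "25", "Edgemont", "52", "Sioux County"
length(pieces)
## [1] 5
```

Because the marked string starts with a semicolon, the first piece is an empty string, which is why the column names in (1) start with NA.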
1a) read.table We can use the same gsub as in (1) but then separate it using read.table. No packages are used. The trailing [-1] drops the empty first column.
read.table(text = gsub("(\\d+)", ";\\1;", df$V1), sep = ";", as.is = TRUE,
  strip.white = TRUE, col.names = c(NA, "No1", "Let1", "No2", "Let2"))[-1]
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
2) strcapture We can use strcapture from base R:
proto <- list(No1 = integer(0), Let1 = character(0),
              No2 = integer(0), Let2 = character(0))
strcapture("(\\d+) (.*) (\\d+) (.*)", df$V1, proto)
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South
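One thing worth knowing about (2) is that proto decides the column types, and, as far as I know, a string that does not match the pattern gives a row of NA values rather than an error. A small sketch to check both, where the second input string is invented purely for illustration:

```r
# the second element is a made-up string that does not match the pattern
x <- c("25 Edgemont 52 Sioux County", "no score here")

proto <- list(No1 = integer(0), Let1 = character(0),
              No2 = integer(0), Let2 = character(0))
res <- strcapture("(\\d+) (.*) (\\d+) (.*)", x, proto)

# column classes follow proto: the scores are integer, the names character
classes <- sapply(res, class)

# the non-matching string yields NA in every column
all_na_second_row <- all(is.na(res[2, ]))
```

So the scores can be used in arithmetic right away, and any malformed rows can be found afterwards with is.na(res$No1).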
2a) read.pattern We can use read.pattern with the same pattern as in (2):
library(gsubfn)
read.pattern(text = format(df$V1), pattern = "(\\d+) (.*) (\\d+) (.*)",
  col.names = c("No1", "Let1", "No2", "Let2"), as.is = TRUE, strip.white = TRUE)
##   No1     Let1 No2                    Let2
## 1  25 Edgemont  52            Sioux County
## 2  57    Burke  88 Papillion-LaVista South