I have a dataset, a sample of which is shown below using dput(). I'm having trouble splitting some of its columns by delimiter.
> dput(head(team_data))
structure(list(X1 = 2:6,
names2 = c("Andre Callender Seton Hall Preparatory School (West Orange, NJ)", "Gosder Cherilus Somerville (Somerville, MA)", "Justin Bell Mount Vernon (Alexandria, VA)", "Tom Anevski Elder (Cincinnati, OH)", "Brad Mueller Mars Area (Mars, PA)"),
pos2 = c("RB 5-10 185", "OT 6-7 270", "TE 6-3 250", "OT 6-5 265", "CB 6-0 170"), rating2 = c("0.8667 194 18 8", "0.8667 262 20 1", "0.8333 306 14 7", "0.8333 377 25 13", "0.8333 496 36 16"),
status2 = c("Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003", "Enrolled 6/30/2003"), team = c("Boston-College", "Boston-College", "Boston-College", "Boston-College", "Boston-College"), year = c(2003L, 2003L, 2003L, 2003L, 2003L)),
.Names = c("X1", "names2", "pos2", "rating2", "status2", "team", "year"), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
Below is the code I am running on this dataset. The first two separate() calls work as expected, as far as I can tell.
library(rvest)
library(stringr)
library(tidyr)
library(readxl)
df2<-separate(data=team_data,col=pos2,into= c("Position","Height","Weight"),sep=" ")
df3<-separate(data=df2,col=rating2,into= c("Rating","National","Position","State Rank"),sep=" ")
I then run into trouble separating the remaining columns. I have tried several variants (examples below), but every one of them produces the same error: "Error: Data source must be a dictionary".
df4<-separate(data=df3,col=names2,into= c("Name","Geo"),sep="(")
df4<-separate(data=df3,col=names2,into= c("Name","Geo"),sep='\\(|\\)')
df4<-separate(data=df3,col=status2,into= c("Date_Enrollment","Enroll_Status"),sep=" ")
The ultimate goal would be to separate out the "names2" column at the "(" and the "," and remove the ")" so that I would end up with 3 columns of data. For the other column ("status2") the goal would be to separate out the "Enrolled" from the date of enrollment.
From what I have read the error I'm getting indicates that I am duplicating column names, but I can't figure out where that is happening.
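You are using Position twice: once in the separate() call that creates df2 and again in the one that creates df3, so df3 ends up with two columns named Position. Every later separate() call on df3 then fails because of the duplicated name. Giving the second column a different name fixes it. This works for me: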
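team_data %>%
  separate(col = pos2, into = c("Position", "Height", "Weight"), sep = " ") %>%
  # "Position2" avoids creating a second column named "Position"
  separate(col = rating2, into = c("Rating", "National", "Position2", "State Rank"), sep = " ") %>%
  # "(" is a regex metacharacter, so it has to be escaped
  separate(col = names2, into = c("Name", "Geo"), sep = "\\(") %>%
  # status2 reads "Enrolled <date>", so the status comes first
  separate(col = status2, into = c("Enroll_Status", "Date_Enrollment"), sep = " ")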
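Note that splitting names2 on "(" alone still leaves the city, state and closing ")" together in one column. If you want the three columns described in the question, one possible approach is tidyr::extract() with one capture group per column. This is only a sketch checked against the sample rows: the column names City and State are my own choice, and the pattern assumes every value looks like "text (city, state)".

```r
library(tidyr)

# Two rows copied from the question's dput() output, trimmed to the relevant columns
team_data <- data.frame(
  names2 = c("Andre Callender Seton Hall Preparatory School (West Orange, NJ)",
             "Brad Mueller Mars Area (Mars, PA)"),
  status2 = c("Enrolled 6/30/2003", "Enrolled 6/30/2003"),
  stringsAsFactors = FALSE
)

# One capture group per output column:
#   text before " (", text up to the last ", ", text before the closing ")"
step1 <- tidyr::extract(team_data, col = names2,
                        into = c("Name", "City", "State"),
                        regex = "^(.*) \\((.*), (.*)\\)$")

# "Enrolled" comes first in status2, the date second
result <- separate(step1, col = status2,
                   into = c("Enroll_Status", "Date_Enrollment"), sep = " ")
print(result)
```

I call it as tidyr::extract() because magrittr also exports a function named extract. Rows that do not match the pattern come back as NA, so it is worth checking for NAs on the full dataset.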