I am trying to create two separate columns in R. The issue I am running into is that the year does not separate from the previous column depending on the type of observation.
Some names in the data frame are just first names and others have first and last. I am trying to have the Name
display first and first/last and Year
be separate from the current Name
column.
Fake data = Employee and the year they started employment
Create data frame
library(tidyverse)
dat <- tibble(Name = c("Percy Vere (2020)", "Ginger Plant (2017)", "Perry (2019)",
"Pat Thettick (2020)", "Samuel (2022)", "Fay Daway (2008)",
"Greg (2022)", "Simon Sais (2011)"))
# A tibble: 8 x 1
Name
<fct>
1 Percy Vere (2020)
2 Ginger Plant (2017)
3 Perry (2019)
4 Pat Thettick (2020)
5 Samuel (2022)
6 Fay Daway (2008)
7 Greg (2022)
8 Simon Sais (2011)
Attempt separating the column into two: Name
and Year
dat %>%
select_all() %>%
separate(col = Name, into = c('Name', 'Year')) %>% # sep = ',' and ';' does not create a fix
tibble()
# A tibble: 8 x 2
Name Year
<chr> <chr>
1 Percy Vere
2 Ginger Plant
3 Perry 2019
4 Pat Thettick
5 Samuel 2022
6 Fay Daway
7 Greg 2022
8 Simon Sais
Warning message:
Expected 2 pieces. Additional pieces discarded in 8 rows [1, 2, 3, 4, 5, 6, 7, 8].
Use extract
:
dat %>%
extract(Name, c("Name", "Year"), "(.*) .*?(\\d+)")
# A tibble: 8 × 2
Name Year
<chr> <chr>
1 Percy Vere 2020
2 Ginger Plant 2017
3 Perry 2019
4 Pat Thettick 2020
5 Samuel 2022
6 Fay Daway 2008
7 Greg 2022
8 Simon Sais 2011
Useseparate
:
dat %>%
separate(Name, c("Name", "Year"), " \\(|\\)", extra = 'drop')
# A tibble: 8 × 2
Name Year
<chr> <chr>
1 Percy Vere 2020
2 Ginger Plant 2017
3 Perry 2019
4 Pat Thettick 2020
5 Samuel 2022
6 Fay Daway 2008
7 Greg 2022
8 Simon Sais 2011
use str_extract
:
dat %>%
mutate(Year = str_extract(Name, '\\d+'), Name = str_extract(Name, "\\D+(?= )"))
# A tibble: 8 × 2
Name Year
<chr> <chr>
1 Percy Vere 2020
2 Ginger Plant 2017
3 Perry 2019
4 Pat Thettick 2020
5 Samuel 2022
6 Fay Daway 2008
7 Greg 2022
8 Simon Sais 2011
Using str_remove
dat %>%
mutate(Year = str_remove_all(Name, '\\D+'), Name = str_remove(Name, " [(]\\d+[)]"))
# A tibble: 8 × 2
Name Year
<chr> <chr>
1 Percy Vere 2020
2 Ginger Plant 2017
3 Perry 2019
4 Pat Thettick 2020
5 Samuel 2022
6 Fay Daway 2008
7 Greg 2022
8 Simon Sais 2011