I've asked a series of questions.
Separate variable in field by character.
Which I think contained multiple questions around the same topic.
I've had excellent answers on how to use separate_rows and then a great answer on how to separate the first and last authors from a character vector.
What I'd like to know now is the final bit:
In this answer Splitting column by separator from right to left in R
the number of columns is known. How do I say "split this string at commas, and throw them into an unknown number of columns based on the number of names in the author list to the right of the original field"?
So that each author becomes a value in a separate field. Initially, I thought it would be cast/spread. BUT!
While this is the example I've used: Author
Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.
in many cases the number of authors on the paper (not including et. al) will be >1 and could be somewhere around 30 at the most.
So. Final question on this three part saga... How do I separate out all authors to a new field and if I could title the new fields something like First author, second author, etc and so on up to Last Author.
Is that sensible/clear?
appreciate there's two or three people who are helping very quickly.
You can split your author column into a list with str_split
and then use unnest
to get long format dataframe with a new author on each line. Then you use spread
and an ID column to get the data into wide format.
library(dplyr)
library(tidyr)
df <- data.frame(publication = c("pub1","pub2","pub3"),author = c("Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P","test author","test arthur, another author"))
df
# publication author
#1 pub1 Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P
#2 pub2 test author
#3 pub3 test arthur, another author
df %>% group_by(publication) %>% mutate(author = str_split(author,", ")) %>% unnest %>% mutate(ID = paste0("author_",row_number())) %>% spread(ID,author)
# A tibble: 3 x 6
# Groups: publication [3]
# publication author_1 author_2 author_3 author_4 author_5
# <fct> <chr> <chr> <chr> <chr> <chr>
#1 pub1 Drijgers RL Verhey FR Leentjens AF Kahler S Aalten P
#2 pub2 test author NA NA NA NA
#3 pub3 test arthur another author NA NA NA