From Web of Science I downloaded the citations for 500 articles as a text file. Only the authors column (AU) has been read into R. Each entry contains Author1 to AuthorN separated by semicolons:
Anselin, L; Fujita, M; Thisse, JF
I would like to extract Author1, Author2, Author3, ..., AuthorN into separate columns. My file has up to 10 authors per entry; this sample has at most 7:
#Sample of Data
data <- c("Anselin, L; Varga, A; Acs, Z",
"Acs, ZJ; Anselin, L; Varga, A",
"Anselin, L",
"Fujita, M; Thisse, JF",
"Turner, RK; van den Bergh, JCJM; Soderqvist, T; Barendregt, A; van der Straaten, J; Maltby, E; van Ierland, EC",
"Talen, E; Anselin, L",
"Irwin, EG; Bockstael, NE",
"Leggett, CG; Bockstael, NE",
"Guimaraes, P; Figueiredo, O; Woodward, D",
"Halpern, Benjamin S.; McLeod, Karen L.; Rosenberg, Andrew A.; Crowder, Larry B.")
I have tried many avenues:
#Method3 - Read table : Not same amount of elements
Meth3 <- read.table(textConnection(data), sep=";", stringsAsFactors=FALSE)
#Method2 - Separate in different column : repeats the Names
Meth2 <- do.call(rbind,
strsplit(gsub(";",
"\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", data),
"NONSENSESPLIT"))
#Method5 - Split row entries, make an identifier and recombine them later : Struggle to recombine
Meth5 <- strsplit(data, ";")
i <- 0
id <- unlist( sapply( Meth5, function(r) rep(i<<-i+1, length(r) ) ) )
x <- unlist(Meth5, recursive = FALSE )
x <- list(do.call(rbind,
strsplit(gsub(";",
"\\1NONSENSESPLIT\\2NONSENSESPLIT\\3", x),
"NONSENSESPLIT")))
require(data.table)
data.table( ID=id, do.call(rbind,x))
#Method6: Identifies first Author :
Meth6 <- gsub("[^a-zA-Z0-9 ]","",strsplit(data,"\\; ")[[1]][[1]])
Any suggestions for organizing and identifying Author1...AuthorN are warmly welcome.
read.csv has support for this. Its default is fill = TRUE, so rows with fewer fields are padded with empty values:
read.csv(text=data,header=FALSE,sep=";")
                     V1                   V2                    V3                 V4                   V5         V6               V7
1            Anselin, L             Varga, A                Acs, Z
2               Acs, ZJ           Anselin, L              Varga, A
3            Anselin, L
4             Fujita, M           Thisse, JF
5            Turner, RK  van den Bergh, JCJM         Soderqvist, T      Barendregt, A  van der Straaten, J  Maltby, E  van Ierland, EC
6              Talen, E           Anselin, L
7             Irwin, EG        Bockstael, NE
8           Leggett, CG        Bockstael, NE
9          Guimaraes, P        Figueiredo, O           Woodward, D
10 Halpern, Benjamin S.     McLeod, Karen L.  Rosenberg, Andrew A.  Crowder, Larry B.
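Two details of the call above may matter on the full file. Each field after the first keeps the space that followed the semicolon, and read.csv decides how many columns to create from the first five lines of input only, so a 10-author row further down could be split across rows. A minimal sketch of one way to handle both, assuming at most 10 authors per entry (max_authors and the AuthorN column names are illustrative choices, not part of the original answer):

```r
# Shortened sample: one string per article, authors separated by "; "
data <- c("Anselin, L; Varga, A; Acs, Z",
          "Anselin, L",
          "Fujita, M; Thisse, JF")

# Assumed upper bound on authors per entry (the question mentions up to 10)
max_authors <- 10

authors <- read.csv(text = data, header = FALSE, sep = ";",
                    # Supplying col.names fixes the column count up front
                    col.names = paste0("Author", seq_len(max_authors)),
                    strip.white = TRUE,   # drop the space left after each ";"
                    na.strings = "",      # report missing authors as NA
                    stringsAsFactors = FALSE)

# Number of authors per article
n_authors <- rowSums(!is.na(authors))
```

Columns that are empty for every row come back as all NA, so they are easy to drop afterwards if a tighter table is preferred.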