I have a dataset with multiple rows, which is a database of contact details. Some entries are repeated several times, but contain different information. Example:
> example
Title Name Email Phone
[1,] "Chair" "Oswald Gruber" "[email protected]" "+33 12345"
[2,] "Respondent" "Oswald Gruber" NA "+44 54321"
[3,] "Comm.mngr" "Kaspar Villiger" "[email protected]" "+1 123456"
[4,] "Investment mngr" "Markus Urben" "[email protected]" NA
[5,] "Responsible" "Markus Urben" "[email protected]" "+1 33333"
So what I'd like is to have each person mentioned only once, but without losing the info. So, if I merge two first rows in this example, I'd like to retain the email address and both phone numbers, but only one kind of title.
I would appreciate if anybody would suggest any solutions.
And a data.table solution:
library(data.table)
mrg <- function(x) paste(unique(x[!is.na(x)]),collapse=", ")
setDT(example)[,list(Title=head(Title,1), Email=mrg(Email), Phone=mrg(Phone)), by="Name"]
# Name Title Email Phone
# 1: Oswald Gruber Chair [email protected] +33 12345, +44 54321
# 2: Kaspar Villiger Comm.mngr [email protected] +1 123456
# 3: Markus Urben Investment mngr [email protected] +1 33333
This returns the first Title in the natural order, which may or may not be what you want. It also consolidates emails properly and removes NAs.