I have a dataframe that has 4 columns specifying Firstnames and Surnames of 2 People:
Surname Firstname Surname2 Firstname2
1 Wolf Stefan Schmit Paul
2 Schmit Paul Wolf Stefan
3 Schmit Paul Fore Sabine
4 Fore Sabine Schmit Hans
5 Schmit Hans Wolf Stefan
6 Schmit Paul Schmit Hans
7 Bracht Armin Brecht Alwin
8 Brecht Alwin Bracht Armin
Now i want to add an fifth extra column, where the two persons are alphabetically grouped according to the Surname, but if this is the same then according to the Firstname... In the new fifth column should be both Persons, first with Firstname, then with Surname and seperated by a comma f.e:
Surname Firstname Surname2 Firstname2 Team
1 Wolf Stefan Schmit Paul Paul Schmit , Stefan Wolf
2 Schmit Paul Wolf Stefan Paul Schmit , Stefan Wolf
3 Schmit Paul Fore Sabine Sabine Fore , Paul Schmit
4 Fore Sabine Schmit Hans Sabine Fore , Hans Schmit
5 Schmit Hans Wolf Stefan Hans Schmit , Stefan Wolf
6 Schmit Paul Schmit Hans Hans Schmit , Paul Schmit
7 Bracht Armin Brecht Alwin Armin Bracht , Alwin Brecht
8 Brecht Alwin Bracht Armin Armin Bracht , Alwin Brecht
I have a working code based on a for-loop, but i look for a more effective version for bigger dataframes and more comfortable to work with since individual columns for each name would possibly be more than 2...
# Simple Code:
Surname <- c("Wolf", "Schmit", "Schmit", "Fore", "Schmit", "Schmit", "Bracht", "Brecht")
Firstname <- c("Stefan", "Paul", "Paul", "Sabine", "Hans", "Paul", "Armin", "Alwin")
Surname2 <- c("Schmit", "Wolf", "Fore", "Schmit", "Wolf", "Schmit", "Brecht", "Bracht")
Firstname2 <- c("Paul", "Stefan", "Sabine", "Hans", "Stefan", "Hans", "Alwin", "Armin")
library(reshape2)
tester <- melt(data.frame(Surname, Firstname, Surname2, Firstname2))
tester[] <- lapply(tester, as.character)
tester
namescomp <- function(data, i){
if (data[i, "Surname"] < data[i, "Surname2"]){
paste(data[i, "Firstname"], data[i, "Surname"], ", ", data[i, "Firstname2"], data[i, "Surname2"])
} else if (data[i, "Surname"] > data[i, "Surname2"]){
paste(data[i, "Firstname2"], data[i, "Surname2"], ", ", data[i, "Firstname"], data[i, "Surname"])
} else
{ if(data[i, "Firstname"] < data[i, "Firstname2"]){
paste(data[i, "Firstname"], data[i, "Surname"], ", ", data [i, "Firstname2"], data[i, "Surname2"])
} else {
paste(data[i, "Firstname2"], data[i, "Surname2"], ", ", data[i, "Firstname"], data[i, "Surname"])
}
}
}
for(y in 1:nrow(tester)){
i <- y
tester[i, "Team"] <- namescomp(tester, i)
}
tester
A tidyverse
solution:
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)
Surname <- c("Wolf", "Schmit", "Schmit", "Fore", "Schmit", "Schmit", "Bracht", "Brecht")
Firstname <- c("Stefan", "Paul", "Paul", "Sabine", "Hans", "Paul", "Armin", "Alwin")
Surname2 <- c("Schmit", "Wolf", "Fore", "Schmit", "Wolf", "Schmit", "Brecht", "Bracht")
Firstname2 <- c("Paul", "Stefan", "Sabine", "Hans", "Stefan", "Hans", "Alwin", "Armin")
df <- data_frame(Surname, Firstname, Surname2, Firstname2)
df %>%
# create an identifier for each team
rownames_to_column(var = 'team_id') %>%
# split all name components into separate rows
gather(component, value, -team_id) %>%
# extract a person_id from the number behind first/last name. If there's no number there, use 1
mutate(person_id = coalesce(as.numeric(str_extract(component, '[0-9]+$')), 1)) %>%
# remove the number from the first/last name, then pivot the data.frame so that there's a row for every team x person
mutate(component = str_replace(component, '[0-9]+$', '')) %>%
spread(component, value) %>%
# order by team_id (not strictly necessary), then by Surname, then by Firstname (if you want the order reversed, wrap the variable in `desc()`)
arrange(team_id, Surname, Firstname) %>%
# collapse Surname and Firstname into a `full_name` column
unite(full_name, Firstname, Surname, sep = ' ') %>%
# collapse the full names within each team into a single line, separated by commas
group_by(team_id) %>%
summarize(Team = paste(full_name, collapse=', '))
Doesn't produce exactly the output you want but you could join what it does produce back into the original table on rownames.