I have 2 Dataframes (Salaries and Pitching). The Salaries (where we found the salary of the players) have ~ 26'000 entries and the Pitching (where we found the statistics of the players) have ~ 19'000 entries. Obviously, it is quite more complicated to earn data of some specific statistics than just the salary, so this is comprehensible.
Now i want to add a salary column into the Pitching data frame, but just for the ones which matches (same players, same date). I want to code something like
If players name (from salary dataframe) = players name (from pitching data frame) and year (from salary data frame) = year (from pitching data frame) then insert Salary in pitching data frame, otherwise write N/A
And then in this way, i can remove the rows with N/A’s and have a cleen sheet with the data I want, maybe a bit less observations but a complete one.
I tried something like:
full_join(Salaries,Pitching, by="salary") %>%
mutate(condition = (Salaries$playerID == Pitching$playerID & Salaries$yearID = Pitching$yearID))
CombineDf <- merge(y=Salaries, y=Pitching, by=Pitching$playerID)
joined = cbind(Salaries[match(names(Pitching), Salaries$playerID),], Pitching)
Pitching$Salary <- ifelse(Salaries$playerID >= Pitching$playerID & Salaries$yearID >= Pitching$yearID, Salaries$salary, "N/A")
But sadly nothing did work properly. I would be really glad if you help me solve this problem.
Many thanks
Simply run a ?merge
keeping all x rows (i.e., left join) where unmatched y columns are filled with NA:
combine_df <- merge(x=Pitching, y=Salaries, by=c("playerID", "yearID"), all.x=TRUE)
And since you will remove NA
Salary rows afterwards, run a merge
with full match between both dataframes (i.e. inner join) with no explicit all.x
optional argument.
Alternatively with left_join in dplyr
:
combine_df <- left_join(x=Pitching, y=Salaries, by = c("playerID", "yearID"))
Same reason above to run inner_join
for complete match between both sets by id fields.