I have a two df and both dfs have some common columns which are not included in on
list. If I add makeunique
parameter it creates new column with suffix of _n where. Is there anyway I can pass prefix values such as ['_left', '_right'] to the result df?
In pandas I can pass some argument lsuffix
and rsuffix
.
Sample Input:
Df1:
│ Row │ ID │ Name │
│ │ Int64 │ String │
├─────┼───────┼─────────┤
│ 1 │ 1 │ Mohamed │
│ 2 │ 2 │ Thasin │
Df2:
│ Row │ ID │ Job │ Name │
│ │ Int64 │ String │ String │
├─────┼───────┼────────┼────────┤
│ 1 │ 1 │ Tech │ Md │
│ 2 │ 2 │ Tech │ Tn │
│ 3 │ 3 │ Assist │ Rj │
│ 4 │ 4 │ Test │ Mi │
inner join result:
innerjoin(people, jobs, on = :ID, makeunique=true)
│ Row │ ID │ Name │ Job │ Name_1 │
│ │ Int64 │ String │ String │ String │
├─────┼───────┼─────────┼────────┼─────────┤
│ 1 │ 1 │ Mohamed │ Tech │ Md │
│ 2 │ 2 │ Thasin │ Tech │ Tn │
Expected Output:
| Row │ ID │ Name_left│ Job │ Name_right │
│ │ Int64 │ String │ String │ String │
├─────┼───────┼─────────┼────────┼─────────┤
│ 1 │ 1 │ Mohamed │ Tech │ Md │
│ 2 │ 2 │ Thasin │ Tech │ Tn │
This is not implemented yet. You can expect that it will be added this year. See https://github.com/JuliaData/DataFrames.jl/issues/1333.
What you can do for the time being is:
innerjoin(rename!(s -> s == "ID" ? "ID" : s*"_left", DataFrame!(people)),
rename!(s -> s == "ID" ? "ID" : s*"_right", DataFrame!(jobs)),
on = :ID)
If you do not care about efficiency and want a bit shorter code use:
innerjoin(rename(s -> s == "ID" ? "ID" : s*"_left", people),
rename(s -> s == "ID" ? "ID" : s*"_right", jobs),
on = :ID)