
How to add suffix or prefix for duplicate columns in julia?


I have two DataFrames, and both have some common columns that are not included in the on list. If I add the makeunique parameter, the join creates new columns with a suffix _n, where n is a number (e.g. Name_1). Is there any way I can pass suffix values such as ["_left", "_right"] to the resulting DataFrame? In pandas I can pass the lsuffix and rsuffix arguments.

Sample Input:

Df1 (people):

│ Row │ ID    │ Name    │
│     │ Int64 │ String  │
├─────┼───────┼─────────┤
│ 1   │ 1     │ Mohamed │
│ 2   │ 2     │ Thasin  │

Df2 (jobs):

│ Row │ ID    │ Job    │ Name   │
│     │ Int64 │ String │ String │
├─────┼───────┼────────┼────────┤
│ 1   │ 1     │ Tech   │ Md     │
│ 2   │ 2     │ Tech   │ Tn     │
│ 3   │ 3     │ Assist │ Rj     │
│ 4   │ 4     │ Test   │ Mi     │

Inner join result:

innerjoin(people, jobs, on = :ID,  makeunique=true)
│ Row │ ID    │ Name    │ Job    │ Name_1  │
│     │ Int64 │ String  │ String │ String  │
├─────┼───────┼─────────┼────────┼─────────┤
│ 1   │ 1     │ Mohamed │ Tech   │ Md      │
│ 2   │ 2     │ Thasin  │ Tech   │ Tn      │
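For reference, the setup above can be reproduced with a short script like the following. This is a sketch assuming DataFrames.jl 1.x, where names returns a vector of strings; the variable names people and jobs are taken from the innerjoin call above.

```julia
using DataFrames

# Sample data from the question
people = DataFrame(ID = [1, 2], Name = ["Mohamed", "Thasin"])
jobs = DataFrame(ID = [1, 2, 3, 4],
                 Job = ["Tech", "Tech", "Assist", "Test"],
                 Name = ["Md", "Tn", "Rj", "Mi"])

# makeunique=true resolves the clash on `Name` by appending `_1`
# to the right-hand copy of the duplicated column
result = innerjoin(people, jobs, on = :ID, makeunique = true)
println(names(result))
```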

Expected Output:

│ Row │ ID    │ Name_left │ Job    │ Name_right │
│     │ Int64 │ String    │ String │ String     │
├─────┼───────┼───────────┼────────┼────────────┤
│ 1   │ 1     │ Mohamed   │ Tech   │ Md         │
│ 2   │ 2     │ Thasin    │ Tech   │ Tn         │

Solution

  • This is not implemented yet. You can expect that it will be added this year. See https://github.com/JuliaData/DataFrames.jl/issues/1333.

    What you can do for the time being is:

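    using DataFrames
    # DataFrame(df, copycols=false) reuses the columns, so `people` and `jobs` keep their names
    innerjoin(rename!(s -> s == "ID" ? "ID" : s*"_left", DataFrame(people, copycols=false)),
              rename!(s -> s == "ID" ? "ID" : s*"_right", DataFrame(jobs, copycols=false)),
              on = :ID)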
    

    If you do not care about efficiency and want slightly shorter code, use rename (without !), which copies the data frames:

    innerjoin(rename(s -> s == "ID" ? "ID" : s*"_left", people),
              rename(s -> s == "ID" ? "ID" : s*"_right", jobs),
              on = :ID)
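A minimal sketch of two alternatives, assuming a recent DataFrames.jl (1.x). Newer releases added a renamecols keyword to the joins, which appends a suffix to every non-key column of each side, so Job also becomes Job_right (the rename-based workaround above behaves the same way). To match the expected output exactly, where only the clashing Name column gets a suffix, the hypothetical helper suffix_join (not part of DataFrames.jl) renames only the non-key columns that both data frames share before joining.

```julia
using DataFrames

people = DataFrame(ID = [1, 2], Name = ["Mohamed", "Thasin"])
jobs = DataFrame(ID = [1, 2, 3, 4],
                 Job = ["Tech", "Tech", "Assist", "Test"],
                 Name = ["Md", "Tn", "Rj", "Mi"])

# Option 1: the renamecols keyword (recent DataFrames.jl releases).
# Each string is appended to every non-key column of its side;
# the `on` column keeps its original name.
joined_all = innerjoin(people, jobs, on = :ID, renamecols = "_left" => "_right")

# Option 2: hypothetical helper that suffixes only the non-key columns
# present in both inputs, leaving unique columns such as `Job` unchanged.
function suffix_join(left::AbstractDataFrame, right::AbstractDataFrame;
                     on::Symbol, lsuffix::AbstractString = "_left",
                     rsuffix::AbstractString = "_right")
    # Column names (as Strings) shared by both inputs, minus the join key
    shared = setdiff(intersect(names(left), names(right)), [String(on)])
    # rename (no `!`) returns renamed copies and leaves the inputs untouched
    l = rename(s -> s in shared ? s * lsuffix : s, left)
    r = rename(s -> s in shared ? s * rsuffix : s, right)
    return innerjoin(l, r, on = on)
end

joined_shared = suffix_join(people, jobs, on = :ID)
```

With this data, joined_all has the columns ID, Name_left, Job_right, Name_right, while joined_shared has ID, Name_left, Job, Name_right, the same layout as the expected output.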