Search code examples
pythonfunctionduplicatesdata-manipulationstring-concatenation

Using Python to Identify Multiple Rows That Share a Common Value


Below is a table of employee names by their title and department. I am trying to add a new column which should consist of the names of other employees who share the same department and title as the name in the row. I have used a groupby and count function to isolate rows that contain duplicate Department ID and Title combinations, but extracting the names of the employees who share the same Title and concatenating them into a single value has proven trickier with Python. Here is where I am currently:

Department Title Name Count
Sales Salesperson John 3
Sales Salesperson Mary 3
Sales Salesperson David 3
Sales Operations Michael 2
Sales Operations James 2

What I'm going for is below:

Department Title Name Count Duplicate Names
Sales Salesperson John 3 Mary, David
Sales Salesperson Mary 3 John, David
Sales Salesperson David 3 John, Mary
Sales Operations Michael 2 James
Sales Operations James 2 Michael

Essentially trying to create a new column consisting of the names of people who share the title and department with the name in the row. I've looked for a function to help with this but it's a fairly obscure case. It's simple enough to do in Excel, but it's a little cumbersome because certain Department/Title combos feature 20 or more times depending on the month. If you know of a Python functionality or sequence of functions I can use to accomplish this, or even just a point in the right direction, it would help tremendously.

Please let me know if I can provide any further detail. Thank you.


Solution

  • You can try using set.difference:

    m = df.groupby(['Department', 'Title'])['Name'].agg(set)
    
    df['Duplicate Names'] = df.apply(lambda x: ', '.join(m[(x['Department'], x['Title'])] - {x['Name']}), axis=1)
    print(df)
    

    Prints:

      Department        Title     Name  Count Duplicate Names
    0      Sales  Salesperson     John      3     David, Mary
    1      Sales  Salesperson     Mary      3     John, David
    2      Sales  Salesperson    David      3      John, Mary
    3      Sales   Operations  Michael      2           James
    4      Sales   Operations    James      2         Michael