Search code examples
pythonpandasdata-cleaning

Match rows by id value in python pandas


I need to match contacts in a database by how they were contacted by a unique ID number. I've created a very small mock dataframe below to help with a suggestion:

data = [
    ["email", "[email protected]", 1],
    ["phone", "555-555-1111", 1],
    ["slack", "secondpersonslack", 2],
    ["phone", "111-111-1111", 3],
]

data2 = [
    [1, "[email protected]", "555-555-1111", "slack1"],
    [2, "[email protected]", "555-555-2222", "secondpersonslack"],
    [3, "[email protected]", "111-111-1111", "tomasslack"],
]

stackdata = pd.DataFrame(
    data, columns=["contact method", "from:", "column that I dont know how to make"]
)
stackdata2 = pd.DataFrame(data2, columns=["id", "email", "phone", "slack"])

In my real dataset, what I want to fill out is the column 'column that I dont know how to make'.

So, take first contact method in stackdata in row one, which was an email address. This email lines up with ID 1 in stack2, so it would populate the column in the stackdata1 with "1".

I was thinking of some of kind of for loop like:

for i in stackdata['column that I dont know how to make']:
  if i matches one of the columns in any row, then populate column with id of that row

Thanks for any help and let me know if i can make this question easier to answer!


Solution

  • Try this:

    for i in range(len(stackdata)):
        stackdata['contact_id'][i] = str(stackdata2[(stackdata2['email'] == stackdata['from:'][i]) | (stackdata2['phone'] == stackdata['from:'][i]) | (stackdata2['slack'] == stackdata['from:'][i])]['id'].values).strip('[]')
    

    'contact_id' is the column 'column that I dont know how to make'