Search code examples
pythonpandasgroup-bycountpandas-groupby

Pandas count how many users connected to each computer


I have a dataset of logons. I want to count how many users connected to each computers using pandas builtin functions only. I need the result dataset to be the same size as the original, so for each time 1 computer appears in the original table it will apear in the result table with the same number of logons:

So if this is the original table:

Computer User
computer1 user1
computer1 user2
computer1 user3
computer2 user1
computer2 user1
computer3 user1
computer3 user2
computer3 user2

I want the result table to be like this:

Computer User_Count
computer1 3
computer1 3
computer1 3
computer2 1
computer2 1
computer3 2
computer3 2
computer3 2

With simple lists it works for me:

result = []
num_of_computers = {}
for user in set(user_and_computer):
    computers = []
    for logon in user_and_computers:
        if user == logon[0]:
            computer.append(logon[1])
        num_of_computers[user] = len(computers)
for user in user_and_computer:
    result.append(num_of_computers[user[0]]

Also, I have tried to count is with a condition on a third column (Fail or Success), to count only successful logons:

result = []
num_of_computers = {}
for user in set(user_and_computer):
    computers = []
    for logon in user_and_computers:
        if user == logon[0] and logon[2] == 'Success':
            computer.append(logon[1])
        num_of_computers[user] = len(computers)
for user in user_and_computer:
    result.append(num_of_computers[user[0]]

In this case the result table is still the same size as the original table, and it only count successful logons. If there is a computer that all the logons to it were failed the result table will show this computer with 0 for each time this computer appears in the original table.

And one more thing, I am new to pandas, dataframes and tables, and I would like to know how do you describe tasks like this without using examples, like, how should I name my question to make it more general.


Solution

  • Use GroupBy.transform with DataFrameGroupBy.nunique, for count only Success rows repalce not matched User to missing values by Series.where:

    print (df)
        Computer   User     Type
    0  computer1  user1     Fail
    1  computer1  user2  Success
    2  computer1  user3     Fail
    3  computer2  user1  Success
    4  computer2  user1     Fail
    5  computer3  user1  Success
    6  computer3  user2     Fail
    7  computer3  user2  Success
    
    
    df['User_Count'] = df.groupby('Computer')['User'].transform('nunique')
    
    df['User_Count_Success'] = (df['User'].where(df['Type'].eq('Success'))
                                          .groupby(df['Computer'])
                                          .transform('nunique'))
    print (df)
        Computer   User     Type  User_Count  User_Count_Success
    0  computer1  user1     Fail           3                   1
    1  computer1  user2  Success           3                   1
    2  computer1  user3     Fail           3                   1
    3  computer2  user1  Success           1                   1
    4  computer2  user1     Fail           1                   1
    5  computer3  user1  Success           2                   2
    6  computer3  user2     Fail           2                   2
    7  computer3  user2  Success           2                   2
    

    Details:

    print (df['User'].where(df['Type'].eq('Success')))
    0      NaN
    1    user2
    2      NaN
    3    user1
    4      NaN
    5    user1
    6      NaN
    7    user2
    Name: User, dtype: object