python pandas group-by count pandas-groupby

Pandas count how many users connected to each computer

I have a dataset of logons. I want to count how many users connected to each computers using pandas builtin functions only. I need the result dataset to be the same size as the original, so for each time 1 computer appears in the original table it will apear in the result table with the same number of logons:

So if this is the original table:

Computer	User
computer1	user1
computer1	user2
computer1	user3
computer2	user1
computer2	user1
computer3	user1
computer3	user2
computer3	user2

I want the result table to be like this:

Computer	User_Count
computer1	3
computer1	3
computer1	3
computer2	1
computer2	1
computer3	2
computer3	2
computer3	2

With simple lists it works for me:

result = []
num_of_computers = {}
for user in set(user_and_computer):
    computers = []
    for logon in user_and_computers:
        if user == logon[0]:
            computer.append(logon[1])
        num_of_computers[user] = len(computers)
for user in user_and_computer:
    result.append(num_of_computers[user[0]]

Also, I have tried to count is with a condition on a third column (Fail or Success), to count only successful logons:

result = []
num_of_computers = {}
for user in set(user_and_computer):
    computers = []
    for logon in user_and_computers:
        if user == logon[0] and logon[2] == 'Success':
            computer.append(logon[1])
        num_of_computers[user] = len(computers)
for user in user_and_computer:
    result.append(num_of_computers[user[0]]

In this case the result table is still the same size as the original table, and it only count successful logons. If there is a computer that all the logons to it were failed the result table will show this computer with 0 for each time this computer appears in the original table.

And one more thing, I am new to pandas, dataframes and tables, and I would like to know how do you describe tasks like this without using examples, like, how should I name my question to make it more general.

Solution

Use GroupBy.transform with DataFrameGroupBy.nunique, for count only Success rows repalce not matched User to missing values by Series.where:

print (df)
    Computer   User     Type
0  computer1  user1     Fail
1  computer1  user2  Success
2  computer1  user3     Fail
3  computer2  user1  Success
4  computer2  user1     Fail
5  computer3  user1  Success
6  computer3  user2     Fail
7  computer3  user2  Success


df['User_Count'] = df.groupby('Computer')['User'].transform('nunique')

df['User_Count_Success'] = (df['User'].where(df['Type'].eq('Success'))
                                      .groupby(df['Computer'])
                                      .transform('nunique'))
print (df)
    Computer   User     Type  User_Count  User_Count_Success
0  computer1  user1     Fail           3                   1
1  computer1  user2  Success           3                   1
2  computer1  user3     Fail           3                   1
3  computer2  user1  Success           1                   1
4  computer2  user1     Fail           1                   1
5  computer3  user1  Success           2                   2
6  computer3  user2     Fail           2                   2
7  computer3  user2  Success           2                   2

Details:

print (df['User'].where(df['Type'].eq('Success')))
0      NaN
1    user2
2      NaN
3    user1
4      NaN
5    user1
6      NaN
7    user2
Name: User, dtype: object