I have a dataset of logons. I want to count how many users connected to each computers using pandas builtin functions only. I need the result dataset to be the same size as the original, so for each time 1 computer appears in the original table it will apear in the result table with the same number of logons:
So if this is the original table:
Computer | User |
---|---|
computer1 | user1 |
computer1 | user2 |
computer1 | user3 |
computer2 | user1 |
computer2 | user1 |
computer3 | user1 |
computer3 | user2 |
computer3 | user2 |
I want the result table to be like this:
Computer | User_Count |
---|---|
computer1 | 3 |
computer1 | 3 |
computer1 | 3 |
computer2 | 1 |
computer2 | 1 |
computer3 | 2 |
computer3 | 2 |
computer3 | 2 |
With simple lists it works for me:
result = []
num_of_computers = {}
for user in set(user_and_computer):
computers = []
for logon in user_and_computers:
if user == logon[0]:
computer.append(logon[1])
num_of_computers[user] = len(computers)
for user in user_and_computer:
result.append(num_of_computers[user[0]]
Also, I have tried to count is with a condition on a third column (Fail or Success), to count only successful logons:
result = []
num_of_computers = {}
for user in set(user_and_computer):
computers = []
for logon in user_and_computers:
if user == logon[0] and logon[2] == 'Success':
computer.append(logon[1])
num_of_computers[user] = len(computers)
for user in user_and_computer:
result.append(num_of_computers[user[0]]
In this case the result table is still the same size as the original table, and it only count successful logons. If there is a computer that all the logons to it were failed the result table will show this computer with 0 for each time this computer appears in the original table.
And one more thing, I am new to pandas, dataframes and tables, and I would like to know how do you describe tasks like this without using examples, like, how should I name my question to make it more general.
Use GroupBy.transform
with DataFrameGroupBy.nunique
, for count only Success
rows repalce not matched User
to missing values by Series.where
:
print (df)
Computer User Type
0 computer1 user1 Fail
1 computer1 user2 Success
2 computer1 user3 Fail
3 computer2 user1 Success
4 computer2 user1 Fail
5 computer3 user1 Success
6 computer3 user2 Fail
7 computer3 user2 Success
df['User_Count'] = df.groupby('Computer')['User'].transform('nunique')
df['User_Count_Success'] = (df['User'].where(df['Type'].eq('Success'))
.groupby(df['Computer'])
.transform('nunique'))
print (df)
Computer User Type User_Count User_Count_Success
0 computer1 user1 Fail 3 1
1 computer1 user2 Success 3 1
2 computer1 user3 Fail 3 1
3 computer2 user1 Success 1 1
4 computer2 user1 Fail 1 1
5 computer3 user1 Success 2 2
6 computer3 user2 Fail 2 2
7 computer3 user2 Success 2 2
Details:
print (df['User'].where(df['Type'].eq('Success')))
0 NaN
1 user2
2 NaN
3 user1
4 NaN
5 user1
6 NaN
7 user2
Name: User, dtype: object