I have a dataframe with 7 columns, as follows:
Bank Name | Number | Firstname | Lastname | ID | Date1 | Date2
B1 | 1 | ABC | EFG | 12 | Somedate | Somedate
B2 | 2 | ABC | EFG | 12 | Somedate | Somedate
B1 | 1 | DEF | EFG | 12 | Somedate | Somedate
B3 | 3 | ABC | GHI | 13 | Somedate | Somedate
B4 | 4 | XYZ | FHJ | 13 | Somedate | Somedate
B5 | 5 | XYZ | DFI | 13 | Somedate | Somedate
I want to create a tuple with 4 elements for each ID
, such that each tuple element represents (Bank Name, Number, Firstname, Lastname)
for each ID
and the values of these tuple elements is the count of the distinct elements present in that respective column for that ID.
For eg: For ID = 12
, the tuple should be (2, 2, 2, 1)
and for ID=13,
the tuple should be (3, 3, 2, 3)
I'm able to get all rows for a particular ID by doing the following:
print(df.loc[df['ID'] == '12'])
But, I do not know how to do this for each value in the ID column (much like the group by clause in SQL, and also get the count instead of the actual values in the rows.
Please help.
Using apply
you could do
In [117]: cols = ['BankName', 'Number', 'Firstname', 'Lastname']
In [126]: df.groupby('ID')[cols].nunique().apply(tuple, axis=1)
Out[126]:
ID
12 (2, 2, 2, 1)
13 (3, 3, 2, 3)
dtype: object
or,
In [127]: df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
Out[127]:
ID
12 (2, 2, 2, 1)
13 (3, 3, 2, 3)
dtype: object
Or, if you want as dataframe instead of tuple
In [122]: df.groupby('ID').agg({c: 'nunique' for c in cols})
Out[122]:
Lastname Number Firstname BankName
ID
12 1 2 2 2
13 3 3 2 3
or,
In [123]: df.groupby('ID')[cols].nunique()
Out[123]:
BankName Number Firstname Lastname
ID
12 2 2 2 1
13 3 3 2 3