I am attempting to get the frequency of objects per version expressed as a percentage.
Input dfsof = pd.read_clipboard()
file version object
path1 1.0 name
path1 1.0 session
path1 1.0 sequence
path2 2.01 name
path2 2.01 session
path2 2.01 sequence
path3 2.01 name
path3 2.01 session
path3 2.01 earthworm
Using the following, I am able to get frequency of each file
.
dfsof.pivot_table(index=['object'], values=['file'], columns=['version'], aggfunc=len, fill_value=0, margins=True)
file
version 1.0 2.01 All
object
earthworm 0 1 1
name 1 2 3
sequence 1 1 2
session 1 2 3
All 3 6 9
I want to divide each count per object/version by the total number of distinct files for that version. Using the expected return table as an example, earthworms shows up in the input only once for version 2.01
, so I expect 0% for version 1.0
and 50% for version 2.01
since only one of the files has that value.
Using dfsof.groupby('version')['file'].nunique()
returns the frequency of files per version, which is the denominator for each of object/version in the table above. What I am struggling with is how to apply the denominator values to the pivot_table. I have seen examples of this using grand totals and subtotals but I can't seem to find nor figure out how to divide by the unique number of files per version. Any help would be greatly appreciated.
version
1.00 1
2.01 2
Expected return
path
version 1.0 2.01 All
object
earthworm 0% 50% 1
name 100% 100% 3
sequence 100% 50% 2
session 100% 100% 3
All 3 6 9
IIUC, you need to aggregate as 'nunique'
, then perform some recomputation:
# aggregate using nunique
out = dfsof.pivot_table(index=['object'], values=['file'], columns=['version'],
aggfunc='nunique', fill_value=0, margins=True)
# compute and save sum of nunique
All = out.iloc[:-1].sum()
# update values as percentage
out.update(out.iloc[:-1, :-1].div(out.iloc[-1, :-1]).mul(100))
# assign sum of nunique
out.loc['All'] = All
print(out)
Output:
file
version 1.0 2.01 All
object
earthworm 0.0 50.0 1
name 100.0 100.0 3
sequence 100.0 50.0 2
session 100.0 100.0 3
All 3.0 6.0 9