I have a data frame that looks like this:
         customerid    brand
0        A2222242BG84  A
1        A2222255LD3L  B
2        A2222255LD3L  A
3        A2222263537U  A
4        A2222265CE34  C
...      ...           ...
6679602  A9ZZ86K4VM97  B
6679603  A9ZZ9629MP6E  B
6679604  A9ZZ9629MP6E  C
6679605  A9ZZ9AB9RN5E  A
6679606  A9ZZ9C47PZ8G  C
where the brands are A, B and C. Many customers belong to one, two, or all three brands, and I want to draw a Venn diagram showing how customers are shared across the brands. I've managed to write code that shows the counts in thousands, but I'm struggling to make the Venn diagram also show what percentage of the entire customer base each count represents.
Here is my complete code, which should be reproducible:
import matplotlib.pyplot as plt
import matplotlib_venn as venn
def count_formatter(count, branch_counts):
    # Convert count to thousands
    count = count / 1000
    # Return the count as a string, followed by the percentage
    return f'{count:.1f}K ({100 * count / sum(branch_counts.values):.1f}%)'
# Get counts of each branch
branch_counts = df['brand'].value_counts()
# Convert counts to sets
branch_sets = [set(group_data['customerid']) for _, group_data in df.groupby('brand')]
plt.figure(figsize=(10, 10))
# Generate the Venn diagram
venn.venn3(
    subsets=branch_sets,
    set_labels=['A', 'B', 'C'],
    subset_label_formatter=lambda count, branch_counts=branch_counts: count_formatter(count, branch_counts)
)
# Show the plot
plt.show()
The generated figure shows 0.0% for every region, and I don't see why.
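It should work if you modify the count_formatter function slightly. The function overwrites count with count / 1000 and then reuses that scaled value in the percentage, so every percentage comes out 1000 times too small and rounds to 0.0%. Just multiply count by 1000 again before calculating the percentage...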
def count_formatter(count, branch_counts):
    # Convert count to thousands
    count = count / 1000
    # Return the count as a string, followed by the percentage
    return f'{count:.1f}K ({100 * count*1000 / sum(branch_counts.values):.1f}%)'
... or alternatively, convert count on the fly (without storing the new value):
def count_formatter(count, branch_counts):
    # Return the count as a string, followed by the percentage
    return f'{count/1000:.1f}K ({100 * count / sum(branch_counts.values):.1f}%)'
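One more thing that may be worth checking: sum(branch_counts.values) is the number of rows in the data frame, so a customer who appears under two or three brands is counted two or three times in the denominator. If the percentages are meant to be relative to unique customers, the total could be taken from the union of the sets instead. Below is a minimal sketch of that idea. The names make_count_formatter, brand_sets and total_customers are mine, not from the question, and the tiny sets are made-up stand-ins for branch_sets.

```python
def make_count_formatter(total_customers):
    """Return a one-argument formatter: count in thousands plus share of total_customers."""
    def formatter(count):
        # Use the raw count for the percentage, not the value scaled to thousands
        share = 100 * count / total_customers
        return f'{count / 1000:.1f}K ({share:.1f}%)'
    return formatter


# Hypothetical toy data standing in for branch_sets from the question
brand_sets = [
    {'c1', 'c2', 'c3'},  # brand A
    {'c2', 'c3', 'c4'},  # brand B
    {'c3', 'c5'},        # brand C
]

# Each customer is counted once, even if they appear under several brands
total_customers = len(set().union(*brand_sets))

# Assumed round numbers, only to show the label format
example_label = make_count_formatter(8000)(2000)
print(total_customers)  # 5
print(example_label)    # 2.0K (25.0%)
```

With the real data you would pass subset_label_formatter=make_count_formatter(total_customers) to venn.venn3, which also removes the need for the lambda with a default argument. On the original data frame, df['customerid'].nunique() should give the same total.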