There can be a lot of insignificant edge cases and noise in the data. I want a pie chart (using Bokeh or any other free, open-source plotting library) that lets me see data like this:
type size
S 1
V 2
T 200
Z 3333
I'd like it reduced to its core, with insignificant noise (types below 1% of the total size) lumped into a new "other" type.
1) Can pandas do this on its own, and if so, how? 2) Does any visualization library already come with such a feature built in?
Consider the pandas Series a with counts of values:
import pandas as pd
import numpy as np
from string import ascii_uppercase
types = np.random.permutation(list(ascii_uppercase))
r = np.arange(1, 27)
r = r / r.sum()
s = np.random.choice(types, 10000, p=r)
# count how often each type occurs
a = pd.Series(s).value_counts()
Now collapse every type whose share of the total is less than 3% into a single group called other:
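n = a / a.sum()  # share of the total for each type
f = n < .03  # True for types below the 3% threshold
# keep the large types, add the summed small ones as 'other', then plot
pd.concat([a[~f], pd.Series(a[f].sum(), index=['other'])]).plot.pie(colormap='jet')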
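The same idea can be wrapped in a small helper and applied to the data from the question with the 1% cutoff it asks for. This is only a sketch: the function name lump_small and its threshold and label parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def lump_small(counts, threshold=0.01, label="other"):
    """Collapse entries whose share of the total is below `threshold`.

    Hypothetical helper (not a pandas API): `counts` is a Series of sizes
    indexed by type; the small entries are summed into one `label` entry.
    """
    share = counts / counts.sum()      # fraction of the total per type
    small = share < threshold          # boolean mask of the "noise" types
    if not small.any():
        return counts                  # nothing to collapse
    other = pd.Series([counts[small].sum()], index=[label])
    return pd.concat([counts[~small], other])


# The sizes from the question
sizes = pd.Series({"S": 1, "V": 2, "T": 200, "Z": 3333})
reduced = lump_small(sizes)
print(reduced)
```

Here S and V together make up well under 1% of the total, so they end up in other while T and Z are kept. Calling reduced.plot.pie() on the result should then draw the chart, assuming matplotlib is installed.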