I have a DataFrame for car makes and types.
I have used this:
conditional_p = pd.crosstab(cars_selected_df.type, cars_selected_df.make, margins=True, normalize='columns')
which gives me the conditional probabilities I want. However, I am having trouble printing them as percentages after using pd.crosstab.
If I print conditional_p, it appears like this:
make      alfa-romero      audi  bmw  ...  volkswagen     volvo       All
type                                  ...
standard          1.0  0.714286  1.0  ...    0.833333  0.545455  0.819512
turbo             0.0  0.285714  0.0  ...    0.166667  0.454545  0.180488
I want my output to be printed to appear like this:
Prob(type=standard | make=alfa-romero) = 100 %
Prob(type=turbo | make=alfa-romero) = 0 %
Prob(type=standard | make=audi) = 71.43 %
Prob(type=turbo | make=audi) = 28.57 %
...
for all of the makes (there are 20) and types (there are 2) I have. I was thinking I could use a lambda function to do this, but how do I refer to the conditional probability values that the crosstab computed? Do I have to use df.stack() to turn the crosstab back into a DataFrame so that I can refer to it within my lambda function? I tried, but I am still not getting anywhere.
Here was my attempt at that:
y = conditional_p.stack()
cond_probabilities_df = pd.DataFrame({'car_type':cars_df['type'].unique(), 'make_name':cars_df['make'].unique(), 'cond_prob' : y})
print_cond_probability = lambda x: print('Prob(type='+x.car_type+') | make= '+x.make_name+'= '+x.cond_prob+'%')
and I got this error: ValueError: arrays must all be same length
Side note: I am kinda a novice and not using groupby, only pandas. Thanks for your help.
Correct me if I've completely misunderstood the question, but is something like this what you're looking for?
for make in conditional_p.columns:
    for typ in conditional_p.index:
        # .loc[row, column] reads one cell; multiply by 100 for a percentage
        print(f'Prob(type={typ} | make={make}) = {conditional_p.loc[typ, make] * 100:.2f} %')
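Since you asked about stack(): the crosstab result is already a DataFrame, so nothing has to be rebuilt with pd.DataFrame({...}). The ValueError in your attempt most likely comes from passing arrays of different lengths (2 unique types, about 20 unique makes, and one stacked value per cell). Below is a minimal sketch of the reshaping route. It uses unstack(), the column-first counterpart of stack(), so the results come out grouped by make as in your desired output. The small cars_selected_df here is made-up stand-in data, since the real one is not shown, and dropping the "All" margin column is my assumption about what you want printed.

```python
import pandas as pd

# Hypothetical stand-in for cars_selected_df; the real data is not shown in the question.
cars_selected_df = pd.DataFrame({
    "make": ["audi", "audi", "audi", "audi", "bmw", "bmw"],
    "type": ["standard", "standard", "standard", "turbo", "standard", "standard"],
})

conditional_p = pd.crosstab(
    cars_selected_df["type"], cars_selected_df["make"],
    margins=True, normalize="columns",
)

# Assumption: the "All" margin column is not a real make, so leave it out.
per_make = conditional_p.drop(columns="All")

# unstack() on a flat-indexed DataFrame returns a Series keyed by
# (make, type) pairs, ordered make by make.
flat = per_make.unstack()

# round(..., 2) with the 'g' format prints 100 and 71.43 rather than 100.00.
lines = [
    f"Prob(type={typ} | make={make}) = {round(prob * 100, 2):g} %"
    for (make, typ), prob in flat.items()
]
print("\n".join(lines))
```

If you would rather have every value shown with two decimals, swap the format for {prob * 100:.2f}. The nested loop above does the same job without reshaping; the flattened Series is mostly useful if you want to filter or sort the probabilities before printing.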