I have a sorted MultiIndex pandas data frame that I need to plot as a bar chart. Here is my data frame (linked).
Either I haven't found the solution yet or a simple one doesn't exist, but I need to plot a bar chart of this data with Content and Category on the x-axis and Installs as the bar height.
In simple terms, I need to show what each bar consists of, e.g. 20% of it would come from Everyone, 40% from Teen, etc. I'm not sure that is even possible: a mean of means would be wrong because the groups have different sample sizes. That is why I made an Uploads column to calculate the mean from totals, but I haven't gotten as far as plotting by mean.
I think plotting cumulative totals would give a misleading result, though.
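To make the sample-size concern concrete, here is a small sketch with made-up numbers (the values are assumptions, not taken from the Kaggle data). It compares the plain mean of the per-group means with the pooled mean computed as total Installs divided by total Uploads.

```python
import pandas as pd

# Made-up numbers, only to show the arithmetic.
toy = pd.DataFrame(
    {
        "Content": ["Everyone", "Teen"],
        "Installs": [1000, 5000],  # summed installs per Content group
        "Uploads": [10, 1],        # number of apps per Content group
    }
).set_index("Content")

group_means = toy["Installs"] / toy["Uploads"]  # 100.0 and 5000.0
mean_of_means = group_means.mean()              # treats both groups equally
pooled_mean = toy["Installs"].sum() / toy["Uploads"].sum()  # weights by app count

print(mean_of_means)  # 2550.0
print(pooled_mean)    # 6000 / 11, roughly 545.45
```

The single Teen app dominates the mean of means, while the pooled mean stays close to what a typical app in the category gets.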
I need to plot a bar chart whose x-ticks are the Category values (preferably just the first 10). Each x-tick should then have one bar per Content value (not always 3; it could be just "Everyone" and "Teen"), and the height of each bar should be Installs.
Ideally, it should look like the linked bar chart, but with each bar split into separate bars for the Content values of that specific Category.
I have tried flattening it with DataFrame.unstack(), but that ruins the sorting of the data frame, so I used Cat2 = Cat1.reset_index(level = [0,1]) instead. I still need help with the plotting.
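One possible way to get grouped bars without losing the sort order is to remember the Category order before unstacking and restore it afterwards with reindex. The sketch below uses a tiny made-up frame named cat1 (the numbers and the Agg backend line are assumptions for illustration); it is not the real data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the sorted MultiIndex frame (Tools deliberately first).
cat1 = pd.DataFrame(
    {"Installs": [500, 300, 50, 400, 100]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Tools", "Everyone"),
            ("Tools", "Teen"),
            ("Tools", "Mature"),
            ("Games", "Everyone"),
            ("Games", "Teen"),
        ],
        names=["Category", "Content"],
    ),
)

# Remember the existing Category order; unstack() sorts the index alphabetically.
category_order = cat1.index.get_level_values("Category").unique()

# One row per Category, one column per Content, original order restored.
wide = cat1["Installs"].unstack("Content").reindex(category_order).head(10)

# Each column becomes one bar inside every Category group;
# a missing Content value (NaN) simply draws no bar.
ax = wide.plot.bar()
ax.set_ylabel("Installs")
plt.tight_layout()
```

Passing stacked=True to plot.bar would stack the Content bars on top of each other instead of placing them side by side.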
So far I have:
Cat = Popular.groupby(["Category","Content"]).agg({"Installs": "sum", "Rating Count": "sum"})
Uploads = Popular[["Category","Content"]].value_counts().rename_axis(["Category","Content"]).reset_index(name = "Uploads")
Cat = pd.merge(Cat, Uploads, on = ["Category","Content"])
Cat = Cat.groupby(["Category","Content"]).agg({"Installs": "sum", "Rating Count": "sum", "Uploads": "sum"})
which gives this
Then I sort it like so:
Cat1 = Cat.unstack()
Cat1 = Cat1.sort_index(key = (Cat1["Installs"].sum(axis = 1)/Cat1["Uploads"].sum(axis = 1)).get, ascending = False).stack()
Thanks to one of those solutions
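For comparison, a similar ordering can probably be reached without the unstack/stack round trip: compute installs per upload for each Category, map that value back onto every row, and sort with a stable algorithm so the rows of one Category stay together. This is only a sketch on made-up numbers; the helper column name InstallsPerUpload is an assumption.

```python
import pandas as pd

# Made-up stand-in for Cat (Installs and Uploads already summed per group).
cat = pd.DataFrame(
    {"Installs": [1000, 5000, 9000, 3000], "Uploads": [10, 1, 3, 1]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Tools", "Everyone"),
            ("Tools", "Teen"),
            ("Games", "Everyone"),
            ("Games", "Teen"),
        ],
        names=["Category", "Content"],
    ),
)

# Pooled installs per upload for each Category.
totals = cat.groupby(level="Category")[["Installs", "Uploads"]].sum()
ratio = totals["Installs"] / totals["Uploads"]

# Attach the Category-level ratio to every row, then sort by it.
# mergesort is stable, so rows with the same ratio keep their relative order.
per_row = cat.index.get_level_values("Category").map(ratio).to_numpy()
cat_sorted = (
    cat.assign(InstallsPerUpload=per_row)
    .sort_values("InstallsPerUpload", ascending=False, kind="mergesort")
    .drop(columns="InstallsPerUpload")
)

print(cat_sorted)
```

Here Games (12000 / 4 = 3000 installs per upload) ends up ahead of Tools (6000 / 11, roughly 545).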
That's all I have.
The data set is from Kaggle and is over 600MB, so I don't expect anyone to download it; a simple pointer towards a solution would be enough.
P.S. This should also help me split each dot in the scatter plot below in the same way, but if not, that's fine.
P.P.S. I don't have enough reputation to post pictures, so apologies for the links.
ChatGPT has answered my question:
import pandas as pd
import matplotlib.pyplot as plt
# create a dictionary of data for the DataFrame
data = {
    'app_name': ['Google Maps', 'Uber', 'Waze', 'Spotify', 'Pandora'],
    'category': ['Navigation', 'Transportation', 'Navigation', 'Music', 'Music'],
    'rating': [4.5, 4.0, 4.5, 4.5, 4.0],
    'reviews': [1000000, 50000, 100000, 500000, 250000]
}
# create the DataFrame
df = pd.DataFrame(data)
# set the 'app_name' and 'category' columns as the index
df = df.set_index(['app_name', 'category'])
# add a new column called "content_rating" to the DataFrame, and assign a content rating to each app
df['content_rating'] = ['Everyone', 'Teen', 'Everyone', 'Everyone', 'Teen']
# Grouping the Data by category and content_rating and getting the mean of reviews
df_grouped = df.groupby(['category','content_rating']).agg({'reviews':'mean'})
# Reset the index to make it easier to plot
df_grouped = df_grouped.reset_index()
# Plotting the stacked bar chart
df_grouped.pivot(index='category', columns='content_rating', values='reviews').plot(kind='bar', stacked=True)
# show the figure when running as a script (not needed in a notebook)
plt.show()
The code above uses a sample data set.
For my own data, I added a Sum column to the pivoted table and sorted by that sum.
piv = qw1.reset_index()
piv = piv.pivot_table(index='Category', columns='Content', values='per')#.plot(kind='bar', stacked = True)
piv["Sum"] = piv.sum(axis=1)
piv_10 = piv.sort_values(by = "Sum", ascending = False)[["Adult", "Everyone", "Mature", "Teen"]].head(10)
where qw1 is the MultiIndex data frame.
Then all I had to do was plot it:
piv_10.plot.bar(stacked = True, logy = False)
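For anyone who wants to try this without the Kaggle file, here is a self-contained sketch of the same pivot, sort and stacked-plot steps. The qw1 frame and its per values are made up, and the Agg backend line is only there so the script runs without a display. Instead of hard-coding the Content column names, it sorts by the row sums without storing them, which also works when a Content value such as "Adult" is absent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for qw1; the real "per" values are not shown in the post.
qw1 = pd.DataFrame(
    {"per": [40.0, 10.0, 25.0, 5.0, 60.0]},
    index=pd.MultiIndex.from_tuples(
        [
            ("Games", "Everyone"),
            ("Games", "Teen"),
            ("Music", "Everyone"),
            ("Music", "Mature"),
            ("Tools", "Everyone"),
        ],
        names=["Category", "Content"],
    ),
)

# One row per Category, one column per Content.
piv = qw1.reset_index().pivot_table(index="Category", columns="Content", values="per")

# Order categories by their row total, keep the top 10, treat missing as 0.
order = piv.sum(axis=1).sort_values(ascending=False).index
piv_top = piv.loc[order].head(10).fillna(0)

# Optional: rescale each row to 100 so the segments read as percentages.
shares = piv_top.div(piv_top.sum(axis=1), axis=0) * 100

ax = shares.plot.bar(stacked=True)
ax.set_ylabel("Share of category (%)")
plt.tight_layout()
```

Plotting piv_top instead of shares gives the absolute version, as in the answer above.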