I have two DataFrames: the first contains all the recipes, the second the ingredients of each recipe.
Recipes
   RecipeId      Name
0       789     Pizza
1       790      Soup
2       791     Sushi
3       792      Cake
4       793  Sandwich
5       794     Bread
6       795   Sausage
Ingredients
     id  RecipeId         Name
0     1       789        Flour
1     2       789        Water
2   795       789      Sausage
3     3       789       Olives
4     4       789         Salt
5   794       793        Bread
6   795       793      Sausage
7     1       794        Flour
8     2       794        Water
9     4       794         Salt
10    5       795  Ground meat
11    4       795         Salt
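For reproducibility, the two tables above can be rebuilt as follows. This is only a minimal sketch; the variable names `recipes` and `ingredients` are assumptions, chosen to match the names used in the answer's code.

```python
import pandas as pd

# Recipes table: one row per recipe
recipes = pd.DataFrame({
    'RecipeId': [789, 790, 791, 792, 793, 794, 795],
    'Name': ['Pizza', 'Soup', 'Sushi', 'Cake', 'Sandwich', 'Bread', 'Sausage'],
})

# Ingredients table: one row per (recipe, ingredient) pair;
# an ingredient whose id equals a RecipeId is itself a recipe
ingredients = pd.DataFrame({
    'id': [1, 2, 795, 3, 4, 794, 795, 1, 2, 4, 5, 4],
    'RecipeId': [789, 789, 789, 789, 789, 793, 793, 794, 794, 794, 795, 795],
    'Name': ['Flour', 'Water', 'Sausage', 'Olives', 'Salt', 'Bread', 'Sausage',
             'Flour', 'Water', 'Salt', 'Ground meat', 'Salt'],
})
```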
1. How can I find the recipe with the most ingredients, counting the ingredients of nested recipes as well? For example, a sandwich has bread (flour, water, salt) and sausage (ground meat, salt) = 5 ingredients.
2. How can I find the recipe with the most nested recipes (not single ingredients)?
3. Which ingredient is used most often in recipes?
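You cannot solve this easily with pandas only. This is a graph problem.
Your data forms a directed graph, with an edge from each recipe to each of its ingredients (for example Sandwich -> Bread -> Flour).
You can use networkx to build the graph from your DataFrames, then count the number of terminal descendants (nodes with no outgoing edges) per node: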
import networkx as nx

# one row per (recipe, ingredient) pair
df = (recipes.merge(ingredients, on='RecipeId',
                    suffixes=('_recipe', '_ingredient'))
     )
# directed edge: recipe -> ingredient
G = nx.from_pandas_edgelist(df, source='Name_recipe', target='Name_ingredient',
                            create_using=nx.DiGraph)
# terminal ingredients are the descendants without outgoing edges
counts = {n: sum(1 for x in nx.descendants(G, n) if G.out_degree(x)==0)
          for n in G.nodes}
output:
{'Pizza': 5,
'Flour': 0,
'Water': 0,
'Sausage': 2,
'Olives': 0,
'Salt': 0,
'Sandwich': 4,
'Bread': 3,
'Ground meat': 0}
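Note that `nx.descendants` returns a set, so these are counts of distinct terminal ingredients: Salt is counted once for Sandwich, giving 4 rather than the 5 in the question. If a repeated ingredient should count each time it appears, one option is to count root-to-leaf paths recursively. The following is a minimal sketch, assuming the graph is acyclic (no recipe contains itself); the helper `n_leaves` and the name `with_repeats` are hypothetical, and the sample data is rebuilt so the block runs on its own.

```python
from functools import lru_cache

import networkx as nx
import pandas as pd

recipes = pd.DataFrame({
    'RecipeId': [789, 790, 791, 792, 793, 794, 795],
    'Name': ['Pizza', 'Soup', 'Sushi', 'Cake', 'Sandwich', 'Bread', 'Sausage'],
})
ingredients = pd.DataFrame({
    'id': [1, 2, 795, 3, 4, 794, 795, 1, 2, 4, 5, 4],
    'RecipeId': [789, 789, 789, 789, 789, 793, 793, 794, 794, 794, 795, 795],
    'Name': ['Flour', 'Water', 'Sausage', 'Olives', 'Salt', 'Bread', 'Sausage',
             'Flour', 'Water', 'Salt', 'Ground meat', 'Salt'],
})

df = recipes.merge(ingredients, on='RecipeId',
                   suffixes=('_recipe', '_ingredient'))
G = nx.from_pandas_edgelist(df, source='Name_recipe', target='Name_ingredient',
                            create_using=nx.DiGraph)

# the recursion below only terminates on an acyclic graph
assert nx.is_directed_acyclic_graph(G)

@lru_cache(maxsize=None)
def n_leaves(node):
    """Number of paths from `node` down to a terminal ingredient."""
    children = list(G.successors(node))
    if not children:
        return 1  # a terminal ingredient counts as itself
    return sum(n_leaves(child) for child in children)

# terminal ingredients get 0, like in the dictionary above
with_repeats = {n: n_leaves(n) if G.out_degree(n) else 0 for n in G.nodes}
```

On the sample data this gives 5 for Sandwich and 6 for Pizza, because Salt is reached both directly and through Sausage.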
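If you also want to include the recipes that have no listed ingredients (Soup, Sushi, Cake), you can use a left merge, fill the resulting NaN with a placeholder, and remove the placeholder node from the graph: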
import networkx as nx
import pandas as pd

df = (recipes.merge(ingredients, on='RecipeId',
                    suffixes=('_recipe', '_ingredient'),
                    how='left')
             .fillna('drop')  # placeholder target for recipes without ingredients
     )
G = nx.from_pandas_edgelist(df, source='Name_recipe', target='Name_ingredient',
                            create_using=nx.DiGraph)
# discard the placeholder; Soup/Sushi/Cake remain as isolated nodes
G.remove_node('drop')
out = {n: sum(1 for x in nx.descendants(G, n) if G.out_degree(x)==0)
       for n in G.nodes}
output:
{'Pizza': 5,
'Flour': 0,
'Water': 0,
'Sausage': 2,
'Olives': 0,
'Salt': 0,
'Soup': 0,
'Sushi': 0,
'Cake': 0,
'Sandwich': 4,
'Bread': 3,
'Ground meat': 0}
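An alternative that avoids the `'drop'` placeholder (which also fills the numeric `id` column with a string, and would clash with an ingredient literally named "drop") is to build the graph from the inner merge and then add every recipe name as a node. `add_nodes_from` leaves nodes that already exist untouched. A minimal sketch, with the sample data rebuilt so it runs on its own; `out_alt` is a name introduced here for illustration:

```python
import networkx as nx
import pandas as pd

recipes = pd.DataFrame({
    'RecipeId': [789, 790, 791, 792, 793, 794, 795],
    'Name': ['Pizza', 'Soup', 'Sushi', 'Cake', 'Sandwich', 'Bread', 'Sausage'],
})
ingredients = pd.DataFrame({
    'id': [1, 2, 795, 3, 4, 794, 795, 1, 2, 4, 5, 4],
    'RecipeId': [789, 789, 789, 789, 789, 793, 793, 794, 794, 794, 795, 795],
    'Name': ['Flour', 'Water', 'Sausage', 'Olives', 'Salt', 'Bread', 'Sausage',
             'Flour', 'Water', 'Salt', 'Ground meat', 'Salt'],
})

# inner merge: only recipes that have at least one ingredient
df = recipes.merge(ingredients, on='RecipeId',
                   suffixes=('_recipe', '_ingredient'))
G = nx.from_pandas_edgelist(df, source='Name_recipe', target='Name_ingredient',
                            create_using=nx.DiGraph)

# add the remaining recipes (Soup, Sushi, Cake) as isolated nodes
G.add_nodes_from(recipes['Name'])

out_alt = {n: sum(1 for x in nx.descendants(G, n) if G.out_degree(x) == 0)
           for n in G.nodes}
```

The resulting dictionary holds the same counts as the left-merge version; only the node order differs, since the isolated recipes are appended last.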
To keep only the recipes, in their original order, reindex the result by recipe name:
pd.Series(out).reindex(recipes['Name'])
output:
Name
Pizza 5
Soup 0
Sushi 0
Cake 0
Sandwich 4
Bread 3
Sausage 2
dtype: int64
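The same graph can also address the other two questions. The following is a minimal sketch under two assumptions: a "nested recipe" is a descendant that itself has ingredients (`out_degree > 0`), and "most often used" means used by the most recipes, either directly (`in_degree`) or at any depth (`nx.ancestors`). The variable names are introduced here for illustration, and the sample data is rebuilt so the block runs on its own.

```python
import networkx as nx
import pandas as pd

recipes = pd.DataFrame({
    'RecipeId': [789, 790, 791, 792, 793, 794, 795],
    'Name': ['Pizza', 'Soup', 'Sushi', 'Cake', 'Sandwich', 'Bread', 'Sausage'],
})
ingredients = pd.DataFrame({
    'id': [1, 2, 795, 3, 4, 794, 795, 1, 2, 4, 5, 4],
    'RecipeId': [789, 789, 789, 789, 789, 793, 793, 794, 794, 794, 795, 795],
    'Name': ['Flour', 'Water', 'Sausage', 'Olives', 'Salt', 'Bread', 'Sausage',
             'Flour', 'Water', 'Salt', 'Ground meat', 'Salt'],
})

df = recipes.merge(ingredients, on='RecipeId',
                   suffixes=('_recipe', '_ingredient'))
G = nx.from_pandas_edgelist(df, source='Name_recipe', target='Name_ingredient',
                            create_using=nx.DiGraph)

# Question 2: number of nested recipes (descendants that have ingredients)
nested = {n: sum(1 for x in nx.descendants(G, n) if G.out_degree(x) > 0)
          for n in G.nodes}
most_nested = max(nested, key=nested.get)

# Question 3a: number of recipes that list the ingredient directly
direct_use = dict(G.in_degree())
most_used_direct = max(direct_use, key=direct_use.get)

# Question 3b: number of recipes that contain the ingredient at any depth
any_depth_use = {n: len(nx.ancestors(G, n)) for n in G.nodes}
most_used_any_depth = max(any_depth_use, key=any_depth_use.get)
```

On the sample data, `most_nested` is Sandwich (it contains Bread and Sausage) and Salt is the most used ingredient under both definitions. Note that `max` returns the first key in case of a tie, so compare the values directly if ties matter.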