import pandas as pd
import re
from collections import defaultdict
d = defaultdict(list)
df = pd.read_csv('https://raw.githubusercontent.com/twittergithub/hello/main/category_app_id_text_1_month_march_2021%20(1).csv')
The resulting DataFrame looks like this:
                                             suggestions                                           category
0      ['jio tv', 'jio', 'jiosaavn', 'jiomart', 'jio ...  ['BOOKS_AND_REFERENCE', 'PRODUCTIVITY', 'MUSIC...
1      ['instagram', 'internet', 'instacart', 'instag...  ['SOCIAL', 'COMMUNICATION', 'FOOD_AND_DRINK', ...
2      ['instagram', 'instacart', 'instagram download...  ['SOCIAL', 'FOOD_AND_DRINK', 'VIDEO_PLAYERS', ...
3      ['vpn', 'vpn free', 'vpn master', 'vpn private...  ['TOOLS', 'TOOLS', 'TOOLS', 'TOOLS', 'TOOLS', ...
4      ['pubg', 'pubg mobile lite', 'pubg lite', 'pub...  ['GAME_ACTION', 'GAME_ACTION', 'TOOLS', 'GAME_...
...                                                  ...                                                ...
49610  ['inbuilt camera app', 'inbuilt screen recorde...  ['PHOTOGRAPHY', 'VIDEO_PLAYERS', 'TOOLS', 'PRO...
49611  ['mpsc science app in marathi', 'mpsc science ...  ['EDUCATION', 'EDUCATION', 'EDUCATION', 'EDUCA...
49612  ['ryerson', 'ryerson university', 'ryerson mob...  ['BOOKS_AND_REFERENCE', 'EDUCATION', 'EDUCATIO...
49613  ['eeze', 'eezee english', 'ezee tab', 'deezer'...  ['TRAVEL_AND_LOCAL', 'EDUCATION', 'BUSINESS', ...
49614  ['hindi love story books free download', 'hind...  ['BOOKS_AND_REFERENCE', 'BOOKS_AND_REFERENCE',...
I want to build a nested dictionary keyed by category. For every category in a row's category list, it should hold a dictionary of the suggestions from that row's suggestions list, and whenever a category or suggestion repeats, the matching counter inside the dictionary should simply be incremented. This is what I tried:
dictionary = defaultdict(list)
for i in range(df.shape[0]):
    categories = set(re.sub(r'[^\w\s]', '', df.loc[i, 'category']).split())
    for category in categories:
        suggestions = set(re.sub(r'[^\w\s]', '', df.loc[i, 'suggestions']).split())
        for suggestion in suggestions:
            if suggestion not in dictionary[category]:
                dictionary[category][suggestion] = 1
            else:
                dictionary[category][suggestion] += 1
But I end up with an empty list for each category inside the defaultdict instead of the nested counts. I hope the question is clear.
It's probably a bit easier and faster to do with pandas:
from ast import literal_eval
# create cartesian product of categories and suggestions for each record,
# and calculate value_counts
z = pd.merge(
df['category'].apply(literal_eval).explode(),
df['suggestions'].apply(literal_eval).explode(),
left_index=True,
right_index=True).value_counts()
# convert to nested dict
d = {l: z.xs(l).to_dict() for l in z.index.levels[0]}
d
Output:
{'ART_AND_DESIGN': {'flipaclip': 39,
'mehndi design': 28,
'ibis paint x': 22,
'u launcher lite': 21,
'poster maker': 20,
'poster maker design app free': 20,
'ibis paint': 18,
'mehndi design 2021': 18,
'mehandi ka design': 18,
'u launcher': 18,
...
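To make the cartesian-product step more concrete, here is the same pipeline run on a tiny made-up frame. The two rows and the names toy, pairs and nested are hypothetical, not taken from the CSV, and DataFrame.value_counts needs pandas 1.1 or later. Note that repeats inside a row are not collapsed: 'TOOLS' appears twice in the first row, so each suggestion in that row is counted twice for it.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical two-row frame, shaped like the CSV columns (lists stored as strings).
toy = pd.DataFrame({
    'suggestions': ["['vpn', 'vpn free']", "['vpn']"],
    'category': ["['TOOLS', 'TOOLS', 'SOCIAL']", "['TOOLS']"],
})

# explode() keeps the original row label on every element, so merging the two
# exploded Series on their index pairs each category with each suggestion
# from the same row.
pairs = pd.merge(
    toy['category'].apply(literal_eval).explode(),
    toy['suggestions'].apply(literal_eval).explode(),
    left_index=True,
    right_index=True)

# Count identical (category, suggestions) pairs, then nest by category.
z = pairs.value_counts()
nested = {cat: z.xs(cat).to_dict() for cat in z.index.levels[0]}
print(nested)
```

If a category repeated within one row should only count once, deduplicating each list before explode() (for example by wrapping the parsed list in set and back in list) would be one way to get that behaviour.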
Having said this, if you want to go with the original approach, all you need to fix is to declare the dictionary as defaultdict(dict) instead of defaultdict(list), because you're making a nested dictionary, not a dictionary of lists:
dictionary = defaultdict(dict)
for i in range(df.shape[0]):
    ...
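For reference, here is one way the whole loop might look once the container type is fixed. Treat it as a sketch under a few assumptions. It parses each cell with literal_eval rather than the regex from the question, because stripping punctuation and splitting on whitespace would break a multi-word suggestion such as 'jio tv' into 'jio' and 'tv'. It also swaps defaultdict(dict) for defaultdict(Counter), which does the same job without the if/else. The function name build_counts and the two sample rows are hypothetical.

```python
from ast import literal_eval
from collections import Counter, defaultdict


def build_counts(category_cells, suggestion_cells):
    """Count, per category, how many rows pair it with each suggestion."""
    counts = defaultdict(Counter)
    for cat_cell, sug_cell in zip(category_cells, suggestion_cells):
        # Each cell is the string form of a Python list, e.g. "['TOOLS', 'TOOLS']".
        # As in the question's loop, sets collapse repeats within a single row.
        categories = set(literal_eval(cat_cell))
        suggestions = set(literal_eval(sug_cell))
        for category in categories:
            # Counter.update adds 1 for every element of the iterable.
            counts[category].update(suggestions)
    return counts


# Hypothetical sample rows, shaped like the two CSV columns.
sample_categories = ["['TOOLS', 'TOOLS', 'SOCIAL']", "['TOOLS']"]
sample_suggestions = ["['vpn', 'vpn free']", "['vpn', 'vpn master']"]

result = build_counts(sample_categories, sample_suggestions)
for category, suggestion_counts in result.items():
    print(category, dict(suggestion_counts))
```

On the real data the call would be build_counts(df['category'], df['suggestions']). Because each row's lists are turned into sets, a category that repeats within one row is counted once for that row, whereas the pandas version counts every repetition, so the two approaches can give different totals.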