I currently have a list of captions:
print(valid_captions)
-> [' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino ', ' Chuck Grodin ', ' Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth ', ' Kelly Murro and Tom Murro ', ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ']
I want to create a function that iterates over each element of the list and builds an adjacency list for each person, so that I can get the unique names of everyone who appears with them anywhere in the data set. I want to represent this adjacency list as a Python dictionary, with each name as a key and the list of names they appear with as the value.
So the function would take a single caption and return a dictionary of the form {name: [other names in caption]} for each name, while removing any titles like Dr or Mayor.
As an example I would like this
[Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton]
to return
{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'],
'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'],
'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}
If someone appears in a caption by themselves, return {name: []}. So the caption 'Robb Stark' would return {'Robb Stark': []}.
I have a function to remove the titles, but I'm getting the adjacency list all wrong.
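    import re

    def remove_title(names):
        removed_list = []
        for name in names:
            # Splitting on a title leaves an empty string where the title was.
            altered_name = re.split('Dr |Mayor ', name)
            removed_list += altered_name
        # Drop the empty strings produced by the split.
        try:
            while True:
                removed_list.remove('')
        except ValueError:
            pass
        return removed_list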
The following is my solution: a function that takes a single caption and returns a dictionary of the form {name: [other names in caption]} for each name.
The function first splits the caption on titles such as 'Mayor' and 'Dr', on commas, and on the word 'and'. It then uses strip() to remove leading and trailing spaces from each piece and discards the empty strings left over from the split. Finally, for loops build the dictionary by giving every name a list of all the other names in the caption.
    import re

    def format_caption(caption):
        # Split on titles (with an optional period), commas, and the word "and".
        name_list = re.split(r'\b(?:Dr|Mayor)\b\s*\.?|,|\band\b', caption)
        # Strip whitespace and drop the empty strings the split leaves behind.
        name_list = [name.strip() for name in name_list if name.strip()]
        # Start every name with an empty adjacency list.
        name_dict = {name: [] for name in name_list}
        # Add every other name in the caption to each person's list.
        for key, others in name_dict.items():
            for name in name_list:
                if name != key:
                    others.append(name)
        return name_dict

The resulting function returns the dictionary in the format I was looking for. Note that it takes a single caption string, not a list:

    caption = 'Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton'
    print(format_caption(caption))
    >{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'],
    'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'],
    'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}
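The question also asks for the unique names each person appears with across the whole data set, which means merging the per-caption results. Below is a minimal sketch of one way to do that, not a definitive implementation. The names build_adjacency, names_in_caption, and TITLES are hypothetical and not part of the original code, and the sketch assumes 'Dr' and 'Mayor' are the only titles that occur. It uses sets so that a name is recorded only once even if two people share several captions.

```python
import re
from collections import defaultdict

# Assumption: these are the only titles in the data; extend the tuple as needed.
TITLES = ("Dr", "Mayor")
# Split on a title (optionally followed by a period), a comma, or the word "and".
_SPLIT = re.compile(r"\b(?:%s)\b\s*\.?|,|\band\b" % "|".join(TITLES))


def names_in_caption(caption):
    """Return the cleaned names found in one caption (hypothetical helper)."""
    return [part.strip() for part in _SPLIT.split(caption) if part.strip()]


def build_adjacency(captions):
    """Merge all captions into one {name: sorted list of co-appearing names}."""
    adjacency = defaultdict(set)
    for caption in captions:
        names = names_in_caption(caption)
        for name in names:
            # Accessing adjacency[name] creates the key, so people who only
            # ever appear alone still end up in the result with an empty list.
            adjacency[name].update(n for n in names if n != name)
    return {name: sorted(others) for name, others in adjacency.items()}


valid_captions = [
    ' Chuck Grodin ',
    ' Kelly Murro and Tom Murro ',
    ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ',
    ' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino ',
]
graph = build_adjacency(valid_captions)
print(graph['Chuck Grodin'])
print(graph['Ron Iervolino'])
```

With the sample captions above, the first print shows an empty list for Chuck Grodin, and the second shows the six distinct people Ron Iervolino appears with across both of his captions, in alphabetical order. Sorting is only there to make the output deterministic. Drop it if the order does not matter.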