I am trying to collect the names that appear in both a list (clients) and a dataframe (names). Unfortunately, I keep getting an error and am unsure what I am doing wrong. The error points to my line with the 'search' variable, but I am having trouble understanding why it says 'Lengths must match'. When I have created a variable like this in the past for similar purposes, I have not gotten this error. I also tried a couple of modifications based on my reading of the regex documentation and similar web results, but they did not work.
My code:
import pandas as pd
import re

## example data
clients = [['Example Name'], ['Example Name2']]
name_list = [['Example Name'], ['Example Name1'], ['Example Name2']]
names = pd.DataFrame(data=name_list, columns=['name'])

## code
matches = []
for client in clients:
    search = str(names.loc[names['name'] == client, 'name'].iloc[0])
    client_ = str(client)
    if re.search(client_, search, flags=re.IGNORECASE).group(0) == client_:
        matches.append(client)
    else:
        continue
print(matches)
Error output:
ValueError Traceback (most recent call last)
Cell In[29], line 7
5 matches = []
6 for client in clients:
----> 7 search = str(names.loc[names['name']==client,'name'].iloc[0])
8 client_ = str(client)
10 if re.search(client_,search,flags=re.IGNORECASE).group(0)== client_:
ValueError: ('Lengths must match to compare', (3,), (1,))
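A likely cause, sketched with the example data above: each element of clients is itself a one-element list, so names['name'] == client compares a 3-row Series against a length-1 list, and pandas refuses because the lengths differ. The sketch below unwraps each client with client[0]. It also swaps the re.search(...).group(0) check for re.fullmatch with re.escape, on the assumption that an exact, case-insensitive match is what is wanted; note that re.search returns None when nothing matches, so calling .group(0) on it would also fail.

```python
import re

import pandas as pd

clients = [['Example Name'], ['Example Name2']]
name_list = [['Example Name'], ['Example Name1'], ['Example Name2']]
names = pd.DataFrame(data=name_list, columns=['name'])

# Each client is a one-element list, so this compares a 3-row Series
# with a length-1 list, which pandas rejects with a ValueError.
try:
    names['name'] == clients[0]
except ValueError as exc:
    print("ValueError:", exc)

matches = []
for client in clients:
    client_name = client[0]  # unwrap ['Example Name'] -> 'Example Name'
    # re.escape makes the name a literal pattern; fullmatch returns None
    # (not a match object) when the whole string does not match.
    pattern = re.compile(re.escape(client_name), flags=re.IGNORECASE)
    if any(pattern.fullmatch(name) for name in names['name']):
        matches.append(client_name)

print(matches)  # ['Example Name', 'Example Name2']
```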
UPDATE FOR CLARIFICATION: Thank you for the help, Raphael.
In the original question I used example strings (e.g. ['Example Name'] in name_list) for the dataframe. Below is the dataframe I actually want to search.
search_df print-out:
   Assessment Year    County  defendants                                         plaintiffs
0             2020  Atlantic  ABSECON CITY                                       SSN ABSECON LLC
1             2022  Atlantic  ABSECON                                            RATAN AC LLC
2             2016  Atlantic  ATLANTIC CITY                                      MAC CORP.
3             2016  Atlantic  ATLANTIC CITY                                      GRAND PRIX ATLANTIC
4             2017  Atlantic  CITY OF ATLANTIC CITY, A MUNICIPAL CORPORATION...  MAC CORP., A CORPORATION OF THE STATE OF NEW J...
I changed my approach since the original question: instead of searching the dataframe, I now convert the column of interest to a list. I also switched from re.search() to the in operator, because re.search() cannot read the list objects, and it did not return a match when I concatenated the list into a single string either:
search_list = []
for plaintiff in search_df['plaintiffs']:
    search_list.append([plaintiff])
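This loop builds a list of one-element lists, not a list of strings. Python's in compares whole elements with ==, so a plain string can never equal ['SSN ABSECON LLC']. A minimal illustration, using three plaintiffs from the print-out above as stand-in data:

```python
import pandas as pd

search_df = pd.DataFrame({"plaintiffs": ["SSN ABSECON LLC", "RATAN AC LLC", "MAC CORP."]})

# Same construction as above: every plaintiff is wrapped in its own list.
search_list = []
for plaintiff in search_df['plaintiffs']:
    search_list.append([plaintiff])

# `in` tests whole elements, so a bare string never equals a one-element list.
print('SSN ABSECON LLC' in search_list)    # False
print(['SSN ABSECON LLC'] in search_list)  # True

# A flat list of strings makes plain string membership work.
flat_list = search_df['plaintiffs'].tolist()
print('SSN ABSECON LLC' in flat_list)      # True
```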
The client_list has more names, but for now note that it contains this value:
SSN ABSECON, LLC
So, for example, when I run this:
for client in client_list:
    client_ = str(client).upper()
    print(client_)
    print(client_ in search_list)
print(search_list)
I receive this output:
['SSN ABSECON LLC']
False
... # I removed the other Falses for brevity
...
[['SSN ABSECON LLC'],...etc.],
This confuses me, because I made the appropriate case, spacing, and character changes so that the client_list entry matches its counterpart in search_list, and it still fails to return True. Let me know if you see which step I am missing, or if there is a better way.
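Two things seem to keep this False. First, client is still a one-element list, so str(client) produces the text of the list, brackets and quotes included, which is why the printed value is ['SSN ABSECON LLC'] rather than SSN ABSECON LLC. Second, search_list holds lists rather than strings. The sketch below unwraps both sides and compares them through a hypothetical normalize() helper; which characters to strip (here only commas and repeated whitespace) is an assumption to adjust to the real data.

```python
import re

client_list = [['SSN ABSECON, LLC']]  # still wrapped, as in the question
search_list = [['SSN ABSECON LLC'], ['RATAN AC LLC'], ['MAC CORP.']]

# str() on a list keeps the brackets and quotes, so this is not the bare name.
print(str(client_list[0]).upper())  # ['SSN ABSECON, LLC']


def normalize(name: str) -> str:
    """Hypothetical helper: upper-case, drop commas, collapse whitespace."""
    name = name.upper().replace(",", "")
    return re.sub(r"\s+", " ", name).strip()


# Unwrap each one-element list with [0] before normalizing.
plaintiffs = {normalize(row[0]) for row in search_list}
matches = [c[0] for c in client_list if normalize(c[0]) in plaintiffs]
print(matches)  # ['SSN ABSECON, LLC']
```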
So I had some time to play around, and I think I got it working.
I changed clients from clients = [['SSN ABSECON LLC'], ["MAC CORP."]] to clients = ['SSN ABSECON LLC', "MAC CORP."] (a flat list of strings instead of a list of one-element lists) and converted the dataframe column to a list.
import pandas as pd
import re

names = {
    "Assessment Year": [2020, 2022, 2016, 2016, 2017],
    "County": ["Atlantic", "Atlantic", "Atlantic", "Atlantic", "Atlantic"],
    "defendants": ["ABSECON CITY", "ABSECON", "ATLANTIC CITY", "ATLANTIC CITY",
                   "CITY OF ATLANTIC CITY, A MUNICIPAL CORPORATION "],
    "plaintiffs": ["SSN ABSECON LLC", "RATAN AC LLC", "MAC CORP.", "GRAND PRIX ATLANTIC",
                   "MAC CORP., A CORPORATION OF THE STATE OF NEW J..."],
}
clients = ['SSN ABSECON LLC', "MAC CORP."]
names = pd.DataFrame.from_dict(names)

## code
matches = []
for client in clients:
    client_ = str(client).upper()
    print(client_)
    print(client_ in list(names["plaintiffs"]))
This prints:
SSN ABSECON LLC
True
MAC CORP.
True
I hope this does what you wanted.
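For larger client lists, a vectorized alternative may be simpler than looping. The sketch below assumes clients is already a flat list of strings and reproduces only the plaintiffs column. Series.isin finds whole-string matches in one step, while Series.str.contains(..., case=False, regex=False) finds case-insensitive substring matches, which is closer to the original re.search idea and also picks up the longer MAC CORP., A CORPORATION OF THE STATE OF NEW J... row. Which of the two is appropriate depends on whether partial matches should count.

```python
import pandas as pd

names = pd.DataFrame({
    "plaintiffs": ["SSN ABSECON LLC", "RATAN AC LLC", "MAC CORP.",
                   "GRAND PRIX ATLANTIC",
                   "MAC CORP., A CORPORATION OF THE STATE OF NEW J..."],
})
clients = ['SSN ABSECON LLC', "MAC CORP."]

# Exact, whole-string matches in one vectorized step.
exact = names.loc[names["plaintiffs"].isin(clients), "plaintiffs"].tolist()
print(exact)  # ['SSN ABSECON LLC', 'MAC CORP.']

# Case-insensitive substring matches per client.
# regex=False keeps the '.' in 'MAC CORP.' from acting as a regex wildcard.
partial = {
    client: names.loc[
        names["plaintiffs"].str.contains(client, case=False, regex=False),
        "plaintiffs",
    ].tolist()
    for client in clients
}
print(partial["MAC CORP."])
```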