Tags: python, pandas, python-re

For loop to match names from lists: Why won't my values match?


I am trying to collect the names that appear in both a list (clients) and a dataframe (names). Unfortunately, I keep getting an error and am unsure what I am doing wrong. The traceback points to the line that assigns the 'search' variable, but I don't understand why it says 'Lengths must match to compare'. When I have created a variable like this in the past for similar purposes, I did not get this error. I also tried a couple of modifications based on my reading of the regex documentation and similar web results, but they did not work.

My code:

## example data
clients = [['Example Name'],['Example Name2']]
name_list = [['Example Name'],['Example Name1'],['Example Name2']]
names = pd.DataFrame(data=name_list,columns=['name'])

## code
matches = []
for client in clients:
    search = str(names.loc[names['name']==client,'name'].iloc[0])
    client_ = str(client)
    
    if re.search(client_,search,flags=re.IGNORECASE).group(0)== client_:
        matches.append(client)
    else:
        continue
print(matches)

Error Outputs:

ValueError                                Traceback (most recent call last)
Cell In[29], line 7
      5 matches = []
      6 for client in clients:
----> 7     search = str(names.loc[names['name']==client,'name'].iloc[0])
      8     client_ = str(client)
     10     if re.search(client_,search,flags=re.IGNORECASE).group(0)== client_:

ValueError: ('Lengths must match to compare', (3,), (1,))

UPDATE FOR CLARIFICATION: Thank you for the help, Raphael.

In the original question I used example strings (e.g. ['Example Name'] in name_list) for the dataframe. Below is the dataframe I actually want to search.

search_df print-out:

       Assessment Year      County  \
0                 2020  Atlantic     
1                 2022  Atlantic     
2                 2016  Atlantic     
3                 2016  Atlantic     
4                 2017  Atlantic     
      

                                              defendants  \
0                                           ABSECON CITY   
1                                                ABSECON   
2                                          ATLANTIC CITY   
3                                          ATLANTIC CITY   
4      CITY OF ATLANTIC CITY, A MUNICIPAL CORPORATION...   
   

                                              plaintiffs  
0                                        SSN ABSECON LLC  
1                                           RATAN AC LLC  
2                                              MAC CORP.  
3                                    GRAND PRIX ATLANTIC  
4      MAC CORP., A CORPORATION OF THE STATE OF NEW J...  
 

Since the original question, I have changed my approach: instead of searching the dataframe, I convert the column of interest to a list. I also switched from re.search() to the in operator, because re.search() cannot take list objects, and it did not return a match when I concatenated the list into a single string:

search_list = []
for plaintiff in search_df['plaintiffs']:
    search_list.append([plaintiff])

The client_list has more names, but for now know that this value is in it: SSN ABSECON, LLC

So, for example, when I run this:

for client in client_list:
    client_ = str(client).upper()
    print(client_)
    print(client_ in search_list)
print(search_list)

I receive this output:

['SSN ABSECON LLC']
False
... # I removed the other False results for brevity
...
[['SSN ABSECON LLC'],...etc.],

This confuses me because I applied the appropriate case, spacing, and character changes to the client_list entry so that it should match the one in search_list, yet the check still fails to return True. Let me know if you see which steps I am missing or if there is a better way.


Solution

  • So I had some time to play around, and I think I got it working.
    I changed clients from clients = [['SSN ABSECON LLC'], ["MAC CORP."]]
    to clients = ['SSN ABSECON LLC', "MAC CORP."] and converted the dataframe column to a list.

    import pandas as pd
    import re
    
    names = {"Assessment Year":[2020,2022,2016,2016,2017],"County":["Atlantic","Atlantic","Atlantic","Atlantic","Atlantic"],
            "defendants":["ABSECON CITY","ABSECON",  "ATLANTIC CITY","ATLANTIC CITY","CITY OF ATLANTIC CITY, A MUNICIPAL CORPORATION "],
            "plaintiffs":["SSN ABSECON LLC","RATAN AC LLC","MAC CORP.","GRAND PRIX ATLANTIC","MAC CORP., A CORPORATION OF THE STATE OF NEW J..."]
    }
    clients = ['SSN ABSECON LLC', "MAC CORP."]
    names = pd.DataFrame.from_dict(names)
    
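    ## code
    # Convert the column to a plain list once, outside the loop
    plaintiffs = list(names["plaintiffs"])
    matches = []
    for client in clients:
        client_ = str(client).upper()
        found = client_ in plaintiffs
        print(client_)
        print(found)
        # Keep the clients that were found, as the question intended
        if found:
            matches.append(client)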
    

    This prints:

    SSN ABSECON LLC
    True
    MAC CORP.
    True
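
For context on why the flat list helps, here is a minimal sketch of what seems to go wrong with the nested lists in the question. It reuses the question's example names; the variable names `error_message`, `found_as_text` and `found_as_list` are made up for illustration. As far as I can tell, two things are at play: comparing a 3-row column with a 1-element list is attempted element by element, and str() on a list returns its printed form, brackets and quotes included.

```python
import pandas as pd

names = pd.DataFrame({"name": ["Example Name", "Example Name1", "Example Name2"]})
client = ["Example Name"]  # a one-element list, as in the question

# Series == list is an element-wise comparison, so 3 rows vs. 1 item fails.
try:
    names["name"] == client
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Comparing against the string inside the list gives a boolean mask instead.
mask = names["name"] == client[0]

# str() of a list is its repr, so the result is not the bare name.
client_as_text = str(client).upper()

# A string never equals a list, so this membership test is False ...
search_list = [["EXAMPLE NAME"]]
found_as_text = client_as_text in search_list

# ... while a list compared with a list of lists can match.
found_as_list = [client[0].upper()] in search_list
```

So either both sides should be plain strings (as in the code above) or both sides should be one-element lists; mixing the two is what returns False.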
    

    I hope this does what you wanted.
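
If the goal is just the list of matching names, a loop may not be needed. Below is a rough sketch using pandas' `Series.isin`. Note the assumptions: the `normalize` helper is hypothetical (it uppercases, drops commas and collapses whitespace, which is a guess at the clean-up described in the question), and the client names are made-up inputs, not the real client_list.

```python
import re
import pandas as pd

search_df = pd.DataFrame({
    "plaintiffs": ["SSN ABSECON LLC", "RATAN AC LLC", "MAC CORP.", "GRAND PRIX ATLANTIC"]
})
# Made-up client names with different case and punctuation
client_list = ["SSN Absecon, LLC", "Mac Corp.", "Unknown Client Inc"]

def normalize(name: str) -> str:
    """Hypothetical helper: uppercase, drop commas, collapse whitespace."""
    return re.sub(r"\s+", " ", name.upper().replace(",", " ")).strip()

# Normalize both sides the same way, then test membership in one vectorized call.
wanted = {normalize(c) for c in client_list}
mask = search_df["plaintiffs"].map(normalize).isin(wanted)
matches = search_df.loc[mask, "plaintiffs"].tolist()
print(matches)
```

This only finds exact matches after normalization. Truncated or longer variants such as "MAC CORP., A CORPORATION OF THE STATE OF NEW J..." would need a substring or fuzzy comparison instead.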