Tags: python, scikit-learn, nlp, similarity

TypeError trying to compare texts using for loop


I am trying to compare texts scraped from different websites with each other. The texts are in a list taken from a column of a dataframe. To compare the texts in this list, I tried using similarity measures (I do not know whether there is another way to do this). This is the code:

from difflib import SequenceMatcher

titles = filtered_dataset['Titles'].tolist()

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

def get_jaccard_sim(str1, str2): 
    a = set(str1.split()) 
    b = set(str2.split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

similarities=[]
j_similarities=[]
for title in titles:
    similarity=similar(title, title+1)
    jacc_similarity=get_jaccard_sim(title,  title+1) # I would like to compare the first text to the others; then the second one, and so on... 

I have got the following error:

TypeError: can only concatenate str (not "int") to str

because of

similarity=similar(title, title+1)
jacc_similarity=get_jaccard_sim(title,  title+1)

Could you please help me to fix the error to compare the texts?


Solution

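  • In your loop, title is a string (one element of titles), not a position in the list, so title+1 tries to add the integer 1 to a string. Python does not allow this, hence the TypeError. A string can only be concatenated with another string, so an integer has to be converted first with str(): "sampleString"+str(1) gives "sampleString1", because str(1) is the string '1'.

    Converting with str(1) makes the error go away:

    similarity=similar(title, title+str(1))
    jacc_similarity=get_jaccard_sim(title,  title+str(1))

    Note, however, that this compares each title with a copy of itself that has "1" appended, not with the next title in the list. To compare each text with the others, loop over positions in the list (or over pairs of titles) and index into titles instead of adding to the string.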
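If the goal is to compare every title with every other title, one possible approach (not part of the original answer) is to iterate over pairs of list positions with itertools.combinations. The sketch below reuses the two functions from the question; the sample titles are hypothetical stand-ins for filtered_dataset['Titles'].tolist(), and the zero-division guard in get_jaccard_sim is an addition made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b):
    # Character-based similarity ratio between 0 and 1, as in the question.
    return SequenceMatcher(None, a, b).ratio()


def get_jaccard_sim(str1, str2):
    # Word-level Jaccard similarity, as in the question.
    a = set(str1.split())
    b = set(str2.split())
    c = a.intersection(b)
    union_size = len(a) + len(b) - len(c)
    # Guard added for this sketch: two empty strings would divide by zero.
    return len(c) / union_size if union_size else 0.0


# Hypothetical sample data standing in for filtered_dataset['Titles'].tolist()
titles = ["the cat sat", "the cat ran", "a dog barked"]

similarities = []
j_similarities = []
# combinations(..., 2) yields each unordered pair of positions exactly once,
# so title 0 is compared with 1 and 2, then title 1 with 2, and so on.
for i, j in combinations(range(len(titles)), 2):
    similarities.append((i, j, similar(titles[i], titles[j])))
    j_similarities.append((i, j, get_jaccard_sim(titles[i], titles[j])))
```

Each entry in similarities and j_similarities is a tuple (i, j, score), so the scores can be traced back to the two titles that produced them. For n titles this makes n*(n-1)/2 comparisons, which may become slow for long lists.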