I would like to know how to detect repeated text within the same sentence. I have a list of sentences like these:
my_list=["do you want pizza for dinner? Do you want pizza for dinner?", "I like pizza", "I have no money I have no money"]
I would like to create a pandas DataFrame with a column that is 1 if a sentence is repeated within the same string, and 0 otherwise.
Something like this:
Text | Repeated?
do you want pizza for dinner? Do you want pizza for dinner? | 1
I like pizza | 0
I have no money I have no money | 1
I was thinking of something like this:
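from collections import Counter

for sentence in my_list:
    # A list has no .split(), so count the words of each sentence separately
    word_counts = Counter(sentence.split())
    for word in sorted(word_counts):
        print('"' + word + '" is repeated ' + str(word_counts[word]) + ' time(s).')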
Then I would compare the total number of words in each sentence with the number of unique words. But I don't think this is a good approach. Is there a better way to get the expected result?
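For comparison, the idea of comparing total and unique word counts can be sketched as follows (a minimal sketch; the helper name `has_repeated_word` is hypothetical). It shows why the idea is weak: it flags any sentence that reuses a single word, not only sentences made of a repeated phrase.

```python
my_list = ["do you want pizza for dinner? Do you want pizza for dinner?",
           "I like pizza",
           "I have no money I have no money"]

def has_repeated_word(sentence):
    # Hypothetical helper: lowercase so "do" and "Do" count as the same word
    words = sentence.lower().split()
    # Fewer unique words than total words means at least one word occurs twice
    return int(len(set(words)) < len(words))

print([has_repeated_word(s) for s in my_list])   # [1, 0, 1]
print(has_repeated_word("I think I like pizza"))  # 1, although no phrase is repeated
```

The second call is the false positive: "I" appears twice, so the heuristic reports a repetition even though the sentence is not a repeated phrase.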
You can use a regular expression for this task (regex101):
import re
import pandas as pd
my_list=["do you want pizza for dinner? Do you want pizza for dinner?", "I like pizza", "I have no money I have no money"]
df = pd.DataFrame({'Text': my_list})
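# (.+) captures a chunk of text, \s* allows optional whitespace, \1 requires the same chunk again
# and $ anchors at the end; re.I makes the backreference case-insensitive ("do" matches "Do")
r = re.compile(r'(.+)\s*\1$', flags=re.I)
# r.match anchors at the start, so the whole string must be the chunk followed by its repeat
df['Repeated'] = df['Text'].apply(lambda x: bool(r.match(x))).astype(int)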
print(df)
Prints:
Text Repeated
0 do you want pizza for dinner? Do you want pizz... 1
1 I like pizza 0
2 I have no money I have no money 1
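If you prefer to avoid regular expressions, a word-based check is another option. This is a minimal sketch, not part of the answer above, and the helper name `is_repeated` is hypothetical. It lowercases the text, splits it on whitespace and tests whether the word list is a shorter block of words repeated two or more times; punctuation stays attached to words, so "dinner?" must match "dinner?" exactly.

```python
import pandas as pd

my_list = ["do you want pizza for dinner? Do you want pizza for dinner?",
           "I like pizza",
           "I have no money I have no money"]

def is_repeated(text):
    # Hypothetical helper: compare lowercase word tokens so "do" matches "Do"
    words = text.lower().split()
    n = len(words)
    # Try every block length that divides the word count evenly (at least 2 copies)
    for k in range(1, n // 2 + 1):
        if n % k == 0 and words[:k] * (n // k) == words:
            return 1
    return 0

df = pd.DataFrame({'Text': my_list})
df['Repeated'] = df['Text'].apply(is_repeated)
print(df['Repeated'].tolist())  # [1, 0, 1]
```

The regex only matches when the string splits into two matching halves, so a phrase repeated three times is not flagged; this version also catches three or more copies, at the cost of comparing whole words instead of arbitrary characters.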