
Matching words from a pool of sentences using Python


I have two files. The file "Sentence" contains a pool of sentences (snapshot below):

[Image: Sentence file snapshot]

The file "Word" contains a pool of words (snapshot below):

[Image: Word file snapshot]

I want to match the words from the Word file against the sentences in the Sentence file. Whenever a sentence contains one or more of those words, I want the output to show the sentence together with the words it matched.

For example:

Sentence                         Matched words
Linux and open stack is great    Linux, Open stack

My code is below. When I try to write the result to a CSV file, I get the error shown after the code.

import pandas as pd
import csv

sentence_xlsx = pd.ExcelFile('C:\Python\Seema\Sentence.xlsx')
sentence_all = sentence_xlsx.parse('Sheet1')
#print(sentence_all)
word_xlsx = pd.ExcelFile('C:\Python\Seema\Word.xlsx')
word_all = word_xlsx.parse('Sheet1')            


for sentence in sentence_all['Article']:
    sentences = sentence.lower()

    for word in sentences.split():
        if word in ('linux','openstack'):
            result = word,sentence

results = open('C:\Python\Seema\result.csv', 'wb')
writer = csv.writer(results, dialect='excel')
writer.writerows(result)
results.close()

Traceback (most recent call last):
  File "Word_Finder2.py", line 25, in <module>
    results = open('C:\Python\Seema\result.csv', 'wb')
IOError: [Errno 22] invalid mode ('wb') or filename: 'C:\\Python\\Seema\result.csv'
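The error can be reproduced without touching the file system. Notice that the traceback's repr shows "\\" before "Python" and "Seema" but a single "\r" before "esult.csv", meaning that backslash was consumed by an escape sequence. A minimal sketch (Python 3 syntax, path shortened to "Seema\result.csv" purely for illustration) comparing an ordinary string literal with a raw one:

```python
# In an ordinary string literal "\r" is an escape sequence (carriage return),
# so the backslash in front of "result.csv" is not kept as a character.
plain = "Seema\result.csv"
# A raw string literal keeps every backslash as an ordinary character.
raw = r"Seema\result.csv"

print(repr(plain))            # 'Seema\result.csv'  (\r here is one control character)
print(repr(raw))              # 'Seema\\result.csv' (a real backslash)
print("\r" in plain)          # True
print(len(plain), len(raw))   # 15 16

# Doubling every backslash produces the same string as the raw literal.
print("C:\\Python\\Seema\\result.csv" == r"C:\Python\Seema\result.csv")  # True
```

Doubling the backslashes or, on Windows, writing the path with forward slashes avoids the problem just as well as the raw-string prefix.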

Solution

  • In the '\result.csv' part of your path, '\r' is read as a carriage-return escape sequence, so open() receives an invalid filename. To fix this, prefix the string literal with r to make it a raw string (credit @georg).


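    Second, your loop reassigns result on every match, so only the last (word, sentence) tuple survives, and if nothing matches, result is never defined at all. Passing that single tuple to writerows() would also write each string as its own row, one character per cell. Accumulate every match in a list of rows instead: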

    result = []  # one [sentence, word] row per match, across all sentences
    for sentence in sentence_all['Article']:
        lowered = sentence.lower()

        for word in lowered.split():
            if word in ('linux', 'openstack'):
                # append this match instead of overwriting the previous one
                result.append([sentence, word])
                # result is now a list of rows, the shape writerows() expects

    # The r prefix stops "\r" in "\result.csv" from being read as a carriage return.
    # 'wb' is correct for the csv module on Python 2; on Python 3 use 'w', newline=''.
    with open(r'C:\Python\Seema\result.csv', 'wb') as results:
        writer = csv.writer(results, dialect='excel')
        writer.writerows(result)
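The answer keeps the question's hard-coded ('linux', 'openstack') tuple, so it never reads the Word file and cannot match a two-word entry such as "open stack". A minimal sketch of the more general mapping, under these assumptions: Python 3, the words arrive as a plain list (for example from a hypothetical 'Word' column of word_all), matching is case-insensitive on whole words, and each sentence gets one row listing all of its matches. It swaps the split()-based loop for a single compiled regular expression built with re.escape and \b word boundaries; the helper names build_pattern, match_sentences and write_rows are illustrative only.

```python
import csv
import io
import re


def build_pattern(words):
    """Compile one case-insensitive regex matching any word or phrase as whole words."""
    # Longest phrases first, so an entry like "open stack" is tried before
    # any shorter entry that overlaps it.
    phrases = sorted({w.strip() for w in words if w.strip()},
                     key=lambda p: (-len(p), p.lower()))
    if not phrases:
        return re.compile(r"(?!)")  # empty word list: a pattern that never matches
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def match_sentences(sentences, words):
    """Return one [sentence, "word1, word2"] row per sentence with at least one match."""
    pattern = build_pattern(words)
    # Report matches in the spelling used in the word list, not the sentence.
    canonical = {w.strip().lower(): w.strip() for w in words if w.strip()}
    rows = []
    for sentence in sentences:
        found = [canonical.get(m.lower(), m) for m in pattern.findall(sentence)]
        if found:
            unique = list(dict.fromkeys(found))  # drop repeats, keep first-seen order
            rows.append([sentence, ", ".join(unique)])
    return rows


def write_rows(rows, handle):
    """Write a header plus the matched rows to an open text-mode file object."""
    writer = csv.writer(handle, dialect="excel")
    writer.writerow(["Sentence", "Matched words"])
    writer.writerows(rows)


if __name__ == "__main__":
    # With the question's data this might be (the 'Word' column name is hypothetical):
    #   sentences = sentence_all['Article'].dropna().astype(str)
    #   words = word_all['Word'].dropna().astype(str)
    sentences = ["Linux and open stack is great", "Nothing relevant here"]
    words = ["Linux", "Open stack"]
    rows = match_sentences(sentences, words)
    print(rows)  # [['Linux and open stack is great', 'Linux, Open stack']]

    # On Python 3 a real file would be opened as:
    #   with open(r"C:\Python\Seema\result.csv", "w", newline="") as f: write_rows(rows, f)
    buffer = io.StringIO()
    write_rows(rows, buffer)
    print(buffer.getvalue())
```

Because of the \b word boundaries, "Linux" does not match inside "Linuxes", just as with the split()-based loop; unlike that loop, it still matches a word followed by punctuation, such as "Linux," or "stack.".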