Tags: python, python-3.x, string, nlp, export-to-csv

Writing out a list of phrases to a csv file


Following on from an earlier post, I have written some Python code to calculate how often certain phrases occur in a large number of text files. The phrases are held in the "word_list" variable (three examples are listed, but there will be many more). The code below takes each element of the list and assigns it to a string for comparison against each text file. However, the current code only writes the frequencies for the last phrase in the list to the spreadsheet, rather than writing all of them to the relevant columns. Is this just an indentation issue (the writerow call not being in the correct position), or is there a logic flaw in my code? Also, is there any way to avoid the list-to-string assignment when comparing the phrases to those in the text files?

word_list = ['in the event of', 'frankly speaking', 'on the other hand']
S = {}
p = 0
k = 0

with open(file_path, 'w+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Fohone-K"] + word_list)

    for filename in glob.glob(os.path.join(path, '*.txt')):
        if filename.endswith('.txt'):
            f = open(filename)
            Fohone-K = filename[8:]
            data = f.read()
            # new code section from scratch file
            l = len(word_list)
            for s in range(l):
                phrase = word_list[s]
                S = data.count((phrase))
                if S:
                    #k = k + 1
                    print("'{}' match".format(Fohone-K), S)
                else:
                    print("'{} no match".format(Fohone-K))
                    print("\n")

                    # for m in word_list:
        if S >= 0:
            print([Fohone-K] + [S])
        writer.writerow([Fohone-K] + [S])

The output currently looks like this:

[screenshot of the current output, not reproduced]

It needs to look like this:

[screenshot of the desired output, not reproduced]


Solution

  • You probably were going for something like this:

    import csv, glob, os
    
    word_list = ['in the event of', 'frankly speaking', 'on the other hand']
    file_path = 'out.csv'
    path = '.'
    
    with open(file_path, 'w+', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Fohone-K"] + word_list)
    
        for filename in glob.glob(os.path.join(path, '*.txt')):
            if filename.endswith('.txt'):
                with open(filename) as f:
                    postfix = filename[8:]
                    content = f.read()
                    matches = [content.count(phrase) for phrase in word_list]
                    print(f"'{filename}' {'no ' if all(n == 0 for n in matches) else ''}match")
                    writer.writerow([postfix] + matches)
    

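    The key problem was that you were writing S on each row, and S only ever holds a single count. The inner loop overwrites S on every pass, so by the time writerow runs it contains the count for the last phrase only. That's fixed here by building a full list of matches, one count per phrase, and writing that list as the row. Iterating over word_list directly in the list comprehension also removes the need to index into the list and assign each phrase to a separate string first.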
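
As a rough sketch of the same idea in reusable form, the counting and the writing can be split into two small functions. The function names `count_phrases` and `write_phrase_counts` are made up for this illustration and do not come from the original post. The sketch also swaps the hard-coded `filename[8:]` slice for `os.path.basename`, on the assumption that the slice was only there to strip the directory prefix; if the slice had another purpose, keep it instead.

```python
import csv
import glob
import os


def count_phrases(text, phrases):
    """Return one count per phrase, in the same order as `phrases`."""
    # str.count is case-sensitive and counts non-overlapping occurrences.
    return [text.count(phrase) for phrase in phrases]


def write_phrase_counts(txt_dir, csv_path, phrases):
    """Write one CSV row per .txt file in txt_dir: file name, then counts."""
    with open(csv_path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        # Header: the file-name column followed by one column per phrase.
        writer.writerow(['Fohone-K'] + list(phrases))
        # sorted() only makes the row order predictable.
        for filename in sorted(glob.glob(os.path.join(txt_dir, '*.txt'))):
            with open(filename) as f:
                counts = count_phrases(f.read(), phrases)
            # Assumption: basename stands in for the original filename[8:].
            writer.writerow([os.path.basename(filename)] + counts)
```

Calling `write_phrase_counts('.', 'out.csv', word_list)` should produce the same layout as the answer above: a header row, then one row per text file with a count in every phrase column. Because `str.count` is case-sensitive, a phrase such as "On the other hand" at the start of a sentence is not counted unless the text is lowercased first.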