Tags: python, python-3.x, nltk, corpus

How to read multiple NLTK corpus files and write them to a single text file in Python


I have written the following code:

    import nltk

Then:

    file1 = nltk.corpus.gutenberg.words('shakespeare-caesar.txt')
    file2 = nltk.corpus.gutenberg.words('shakespeare-hamlet.txt')
    file3 = nltk.corpus.gutenberg.words('shakespeare-macbeth.txt')

This is the part where I try to write the contents to a single file:

    filenames = [file1, file2, file3]
    with open('result.txt', 'w') as outfile:  # want to store the contents of the 3 files in result.txt
        for fname in filenames:
            with open(fname) as infile:
                for line in infile:
                    outfile.write(line)

For this, I get the following error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-9-917545c3c1ce> in <module>()
          2 with open('result.txt', 'w') as outfile:
          3     for fname in filenames:
    ----> 4         with open(fname) as infile:
          5             for line in infile:
          6                 outfile.write(line)

    TypeError: invalid file: ['[', 'The', 'Tragedie', 'of', 'Julius', 'Caesar', ...]
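
The error can be reproduced without NLTK. In the sketch below, a plain list of tokens stands in for what gutenberg.words() returns (an assumption for illustration; the real return value is a lazy, list-like corpus view), and try_open is a hypothetical helper name. open() accepts a path (str, bytes or os.PathLike) or an integer file descriptor, so a sequence of words is rejected with a TypeError. The exact message depends on the Python version: the traceback above says "invalid file: ...", while recent Python 3 releases report something like "expected str, bytes or os.PathLike object, not list".

```python
# Stand-in for the value of nltk.corpus.gutenberg.words(...): a list of tokens,
# not a file path (assumption: a plain list is enough to trigger the same error).
tokens = ['[', 'The', 'Tragedie', 'of', 'Julius', 'Caesar']


def try_open(path_like):
    """Hypothetical helper: return the TypeError message open() raises, or None."""
    try:
        with open(path_like) as f:
            f.read()
    except TypeError as exc:
        return str(exc)
    return None


message = try_open(tokens)
print(message)  # wording varies by Python version
```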

Solution

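  • As the last line of the error message shows, file1, file2 and file3 are not filenames but lists of words: words() returns the tokens of a corpus file, while open() expects a path. Instead of using the words function, you can read the full text of each file with raw() and combine the files into one like this: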

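    import nltk

    # Pass the corpus file IDs themselves, not the word lists returned by words()
    filenames = [
        "shakespeare-caesar.txt",
        "shakespeare-hamlet.txt",
        "shakespeare-macbeth.txt"
    ]
    with open("result.txt", "w", encoding="utf-8") as f:
        for filename in filenames:
            # raw() returns the whole file as a single string
            f.write(nltk.corpus.gutenberg.raw(filename))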
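
If you need to repeat this for other file IDs or corpora, the loop can be wrapped in a small function. The sketch below is a hypothetical helper (concatenate_corpus_files and its separator parameter are illustrative names, not part of NLTK). It takes the reading function as an argument, so it should work with nltk.corpus.gutenberg.raw or any other function that maps a file ID to a string, and it writes a newline between files so the end of one play does not run into the start of the next.

```python
from typing import Callable, Iterable


def concatenate_corpus_files(
    fileids: Iterable[str],
    read_raw: Callable[[str], str],
    out_path: str,
    separator: str = "\n",
) -> int:
    """Write the text of each file ID to out_path, one after another.

    read_raw maps a file ID to its full text (e.g. nltk.corpus.gutenberg.raw).
    Returns the number of characters written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for i, fileid in enumerate(fileids):
            if i > 0:
                # Separator between files (an assumption; the answer above writes none)
                written += out.write(separator)
            # Text-mode write() returns the number of characters written
            written += out.write(read_raw(fileid))
    return written


# Usage with NLTK (requires the Gutenberg data, e.g. nltk.download('gutenberg')):
# import nltk
# concatenate_corpus_files(
#     ["shakespeare-caesar.txt", "shakespeare-hamlet.txt", "shakespeare-macbeth.txt"],
#     nltk.corpus.gutenberg.raw,
#     "result.txt",
# )
```

Passing the reader as a parameter keeps the helper independent of any particular corpus and makes it easy to try with a plain dictionary lookup before running it against the real data.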