Tags: python, directory, nltk, tokenize

How do I read files from one folder and save the results in another folder in Python?


This code works, but I have to process the files one at a time. I want to point it at the folder that contains the files and save the results in another folder instead. I can't figure out how to do this :( Can anybody help? I'm new to Python. Thank you, I appreciate it :)

import re
import string
import sys

frequency = {}
sys.stdin = open('C:/Users/Desktop/app/data/sources/books/test.txt', 'r')
sys.stdout =open('C:/Users/Desktop/app/data/fre/news/test.txt', 'w')

text_string = sys.stdin.read()

match_pattern = re.findall(r'([-][\w]+)', text_string)

for word in match_pattern:
    count = frequency.get(word,0)
    frequency[word] = count + 1

frequency_list = frequency.keys()

for word in frequency_list:
    print (word, frequency[word])

Solution

  • Maybe something like this?

    
    import glob
    import os
    import re
    
    books = glob.glob("C:/Users/Desktop/app/data/sources/books/*.txt")
    
    # now you have a list of all .txt files in that directory.
    
    def writer(text_string, output_file):
    
        """A function to write out items from an input text string"""
    
        frequency = {}
    
        match_pattern = re.findall(r'([-][\w]+)', text_string)
    
        for word in match_pattern:
            count = frequency.get(word,0)
            frequency[word] = count + 1
    
        frequency_list = frequency.keys()
    
        # open the output file once; "w" overwrites results from a previous run
        with open(output_file, "w") as out:
            for word in frequency_list:
                print(word, frequency[word], file=out)
    
    # now you have a function that essentially does the procedure you already know works
    
    for book in books:
    
        book_name = os.path.split(book)[-1]  # get <filename>.txt from the path
    
        # context manager will close the stream when you're done
        with open(book, "r") as file:  
            
            text_string = file.read()
    
            output_file = "C:/Users/Desktop/app/data/fre/news/" + book_name
    
            writer(text_string, output_file)
    
    

    This code will iterate through the .txt files in the directory you were reading from.

    I wrapped your working code in a function, slightly reformatted for clarity (you can tell print where to write with its file argument). As you iterate through the files, you read each one in and pass its text to that function.
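
For comparison, the same read-a-folder, write-a-folder loop can be sketched with pathlib and collections.Counter. This is only a minimal sketch, not the answer's original code: the names PATTERN, count_matches and process_folder are made up for illustration, UTF-8 text files are an assumption, and temporary folders stand in for the real source and output paths.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

# Same pattern as in the question: a hyphen followed by word characters.
PATTERN = re.compile(r"-\w+")


def count_matches(text):
    """Return a Counter mapping each matched token to its number of occurrences."""
    return Counter(PATTERN.findall(text))


def process_folder(src_dir, dst_dir):
    """Write one frequency file to dst_dir for every .txt file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for book in sorted(src.glob("*.txt")):
        # Assumption: the input files are UTF-8 encoded.
        counts = count_matches(book.read_text(encoding="utf-8"))
        out_path = dst / book.name  # keep the input file name
        with out_path.open("w", encoding="utf-8") as out:
            for word, n in counts.items():
                print(word, n, file=out)
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Throwaway folders stand in for the real source and output paths.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "books"
        src.mkdir()
        (src / "test.txt").write_text("a -foo b -bar c -foo", encoding="utf-8")
        for path in process_folder(src, Path(tmp) / "fre"):
            print(path.name, path.read_text(encoding="utf-8").split())
```

To use it on real data, call process_folder with your own source and output folders. Each output file is opened once in "w" mode, so running the script again replaces the old results instead of appending to them.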