Tags: python, pandas, file, directory, python-os

Appending the results of a for loop with an if statement to a Pandas DataFrame in Python


I'm writing a Python script that searches a set of .txt files in a selected folder for a given term (a word, a few words, or a sentence) and prints the names of the files that contain it. It currently works well using the os module:

import os

dirname = '/Users/User/Documents/test/reports'

search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]

for f in os.listdir(dirname):
    with open(os.path.join(dirname,f), "r", encoding="latin-1") as infile:
        text =  infile.read()

    if all(term in text for term in search_terms):
        print (f)

The output will be something like this:

3003.txt
3002.txt
3006.txt
3008.txt

I would like to store these results as a string column in a Pandas DataFrame, but the following attempt raises an error:

lst = []

    if all(term in text for term in search_terms):
        lst.append(f)
        df = pd.DataFrame(lst)
        print (f)

How can this be done?


Solution

  • In the code below, the lines added to the question's code are marked with '* * *'.

    Code from question

    import os
    import pandas as pd  # new line * * *
    
    dirname = '/Users/User/Documents/test/reports'
    
    search_terms = ['Pressure']
    search_terms = [x.lower() for x in search_terms]
    
    # Create an empty list to collect the matching file names # new line * * *
    matches = []  # new line * * *
    
    for f in os.listdir(dirname):
        with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
            text = infile.read()
    
        if all(term in text for term in search_terms):
            print(f)
            # Store the value 'f' for the dataframe column # new line * * *
            matches.append(f)  # new line * * *
    
    # Build the dataframe once, after the loop (DataFrame.append was removed in pandas 2.0) # new line * * *
    df = pd.DataFrame({'file_names': matches})  # new line * * *
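
The same idea can be sketched as a reusable function that collects the matching names in a plain list and builds the DataFrame once at the end. This is only a sketch under a few assumptions that are not in the question: the function name `find_matching_files` is hypothetical, the file text is lower-cased so the match is case-insensitive, sub-directories are skipped, and the demo files are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def find_matching_files(dirname, search_terms, encoding="latin-1"):
    """Return a DataFrame listing the files in dirname that contain every search term."""
    terms = [term.lower() for term in search_terms]
    matches = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        if not os.path.isfile(path):
            continue  # assumption: only regular files are searched
        with open(path, "r", encoding=encoding) as infile:
            text = infile.read().lower()  # assumption: case-insensitive match
        if all(term in text for term in terms):
            matches.append(name)
    # Build the DataFrame once instead of growing it inside the loop
    return pd.DataFrame({"file_names": matches})


# Demo on a temporary folder with made-up file contents
with tempfile.TemporaryDirectory() as demo_dir:
    samples = {
        "3002.txt": "Pressure was normal.",
        "3003.txt": "high pressure reading",
        "3004.txt": "nothing relevant here",
    }
    for name, content in samples.items():
        with open(os.path.join(demo_dir, name), "w", encoding="latin-1") as out:
            out.write(content)
    demo_df = find_matching_files(demo_dir, ["Pressure"])

print(demo_df)
```

Building the frame from a list avoids creating a new DataFrame on every iteration, which matters when the folder holds many files.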
    

    Sample code

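    import pandas as pd
    
    f = ['3003.txt', '3002.txt', '3006.txt', '3008.txt']
    df = pd.DataFrame({'file_names': f})
    # DataFrame.append was removed in pandas 2.0; pd.concat adds the new row instead
    df = pd.concat([df, pd.DataFrame({'file_names': ['new_file.txt']})], ignore_index=True)
    df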
    
