I'm writing a Python script that searches for a selected term (a word, a few words, or a sentence) in a set of .txt files in a selected folder and prints the names of the .txt files that contain the term. It currently works fine using the os module:
import os
dirname = '/Users/User/Documents/test/reports'
search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]
for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read()
    if all(term in text for term in search_terms):
        print(f)
The output will be something like this:
3003.txt
3002.txt
3006.txt
3008.txt
I would like to store these results as a string column in a pandas DataFrame, but when I try to do so I receive an error message:
lst = []
if all(term in text for term in search_terms):
    lst.append(f)
    df = pd.DataFrame(lst)
    print(f)
How can this be done?
In the code below, the new lines are marked with '* * *'.
Code from question
import os
import pandas as pd # new line * * *
dirname = '/Users/User/Documents/test/reports'
search_terms = ['Pressure']
search_terms = [x.lower() for x in search_terms]
# Create empty dataframe to store file names # new line * * *
df = pd.DataFrame() # new line * * *
for f in os.listdir(dirname):
    with open(os.path.join(dirname, f), "r", encoding="latin-1") as infile:
        text = infile.read()
    if all(term in text for term in search_terms):
        print(f)
        # Store value 'f' inside a dataframe column # new line * * *
        # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
        df = pd.concat([df, pd.DataFrame({'file_names': [f]})], ignore_index=True) # new line * * *
Sample code
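import pandas as pd
f = ['3003.txt', '3002.txt', '3006.txt', '3008.txt']
df = pd.DataFrame({'file_names': f})
# Add one more row; pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df = pd.concat([df, pd.DataFrame({'file_names': ['new_file.txt']})], ignore_index=True)
print(df)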
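An alternative worth considering is to collect the matching names in a plain list and build the dataframe once at the end, which avoids creating a new dataframe on every match. The following is only a sketch under a few assumptions of mine: the function name find_matching_files is made up for illustration, non-.txt entries are skipped, and the file text is lower-cased so that it can match the lower-cased search terms.

```python
import os
import pandas as pd


def find_matching_files(dirname, search_terms, encoding="latin-1"):
    """Return a DataFrame of .txt file names in dirname that contain every search term.

    Hypothetical helper: the name and the .txt filter are illustrative assumptions.
    """
    # Compare case-insensitively: lower-case both the terms and the file text.
    terms = [term.lower() for term in search_terms]
    matches = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        # Skip sub-directories and anything that is not a .txt file.
        if not os.path.isfile(path) or not name.lower().endswith(".txt"):
            continue
        with open(path, "r", encoding=encoding) as infile:
            text = infile.read().lower()
        if all(term in text for term in terms):
            matches.append(name)
    # Build the dataframe once, with a single string column.
    return pd.DataFrame({"file_names": matches})
```

With the folder from the question this could be called as df = find_matching_files('/Users/User/Documents/test/reports', ['Pressure']).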