Tags: python, pandas, dataframe, text, glob

Is there a way to read multiple plain text files into a dataframe?


I have multiple plain text files, and I want to store each one as a row in a data frame with two columns: the file name and the text. The code below does not raise an error, but it creates a data frame that uses the file contents as column names, all placed in the header row.

Working code (revised following the suggestions from @Code different):

from pathlib import Path
import pandas as pd

df = []
for file in Path("/content/").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
df
print(df)
  

The output:

Empty DataFrame
Columns: [                The Forgotten Tropical Ecosystem 
Index: []

[0 rows x 9712 columns]

How could the code be improved so that each file's text is placed in its own row under the column heading 'text'?


Solution

  • Here is one possible answer to my question, which collects the file names and texts in a dictionary and builds the data frame from it. A friend helped me with this, and it works. I am not sure why the suggested answer did not work in my environment, but thanks anyway!

    Code:

    import os

    import pandas as pd

    # Collect the file names and their texts in two parallel lists
    file_names = []
    file_texts = []
    for file_name in os.listdir('.'):
        # Only pick up files whose name ends in .txt
        if file_name.endswith('.txt'):
            # Read the whole file as one string; "with" closes it afterwards
            with open(file_name, "r") as f:
                text = f.read()

            file_names.append(file_name)
            file_texts.append(text)

    # Table format: one column of file names, one column of texts
    dictionary = {}
    dictionary["file_names"] = file_names
    dictionary["file_texts"] = file_texts

    pandas_dataframe = pd.DataFrame.from_dict(dictionary)

    print(pandas_dataframe)
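
A likely reason the code in the question ends up with the text in the column names: `pd.read_table` parses each file as a delimited table and, by default, treats the first line as the header, so free text is read as column labels rather than as one cell. As a minimal sketch of the same idea as the answer above, here is a `pathlib` version that reads each file as a single string and uses the column headings asked for in the question (`FileName` and `text`). The helper name `texts_to_frame` and the UTF-8 encoding are assumptions for illustration, not something from the original post.

```python
from pathlib import Path

import pandas as pd


def texts_to_frame(folder, pattern="*.txt"):
    """Return one row per matching file: its name and its full text.

    Hypothetical helper; assumes the files are UTF-8 encoded.
    """
    rows = [
        {"FileName": path.name, "text": path.read_text(encoding="utf-8")}
        for path in sorted(Path(folder).glob(pattern))
    ]
    # Naming the columns explicitly keeps both headings even if no file matches
    return pd.DataFrame(rows, columns=["FileName", "text"])
```

With the folder from the question, this would be called as `df = texts_to_frame("/content/")`. If the files use a different encoding, the `encoding` argument would need to change accordingly.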