Search code examples
python-2.7csvtextpandasdataframe

Read multiple *.txt files into Pandas Dataframe with filename as column header


I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. The size of all the *.txt files is the same
  5. The columns of the DataFrame must have the name of file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?


Solution

  • You can read them into multiple dataframes and concat them together afterwards. Suppose you have two of those files, containing the data shown.

    In [6]:
    filelist = ['val1.txt', 'val2.txt']
    print pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)
        val1  val2
    0     16    16
    1     54    54
    2   -314  -314
    3      1     1
    4     15    15
    5      4     4
    6    153   153
    7     86    86
    8      4     4
    9     64    64
    10   373   373
    11     3     3
    12   434   434
    13    31    31
    14    93    93
    15    53    53
    16   873   873
    17    43    43
    18    11    11
    19   533   533
    20    46    46