
How to prevent the first line from becoming a header when importing multiple txt files in Python and assigning each to a separate data frame?


I have 4 txt files that I have been able to import, turn into data frames, and store in a list. The files have no header row, and I cannot add one to the files themselves. When I run the code, the first line of each file becomes the column headings. How do I modify this code so the first line is not treated as headings? filenames is a list containing the file names, and all_dfs is the list of data frames.

import glob
import pandas as pd

filenames = glob.glob("U32_*.txt")
all_dfs = [pd.read_csv(filename) for filename in filenames]
for dataframe, filename in zip(all_dfs, filenames):
    dataframe['filename'] = filename

Solution

  • From the pandas.read_csv documentation:

    header : int, list of int, default ‘infer’

    Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

    names : array-like, optional

    List of column names to use. If file contains no header row, then you should explicitly pass header=None. Duplicates in this list are not allowed.

    Since your files have no headers, the default behavior of inferring the column names won't work for you, and you'll need to specify header=None in your call to read_csv to override that default behavior. You'll likely also want to supply an array of names to provide the column names.

    So it would look something like:

    pd.read_csv(filename, names=['firstColName', 'secondColName'], header=None)
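
Applied to the loop from the question, the change could look roughly like the sketch below. The helper name load_headerless_files, the column names "first" and "second", and the two throwaway demo files are all made up for illustration, and the sketch assumes the files are comma-delimited with two columns; adjust names (and sep, if needed) to match the real data.

```python
import glob
import os
import tempfile

import pandas as pd


def load_headerless_files(pattern, column_names):
    """Read each file matching `pattern` into its own DataFrame.

    Hypothetical helper: the first line of every file is kept as data.
    """
    frames = []
    for filename in sorted(glob.glob(pattern)):
        # header=None stops pandas from promoting the first row to column names;
        # names= supplies the column labels instead.
        df = pd.read_csv(filename, header=None, names=column_names)
        df["filename"] = os.path.basename(filename)
        frames.append(df)
    return frames


# Demo data: two small header-less files (assumed comma-delimited, two columns).
tmp_dir = tempfile.mkdtemp()
for name, rows in [("U32_a.txt", "1,2\n3,4\n"), ("U32_b.txt", "5,6\n7,8\n")]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write(rows)

all_dfs = load_headerless_files(os.path.join(tmp_dir, "U32_*.txt"), ["first", "second"])
print(all_dfs[0])
```

Each data frame keeps every line of its file as data, and the filename column is added in the same pass rather than in a second loop.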