
How to stop text being used as a column heading when importing with pandas


I am importing a web log text file into Python using pandas. Pandas reads the header line, but it uses the text "Fields:" as a column heading and then adds an extra column of blanks (NaNs) at the end. How can I stop this text from being used as a column heading?

Here is my code:

arr = pd.read_table("path", skiprows=3, delim_whitespace=True, na_values=True)

Here is the start of the file:

Software: Microsoft Internet Information Services 7.5
Version: 1.0
Date: 2014-08-01 00:00:25
Fields: date time
2014-08-01 00:00:25...

The result is that 'Fields:' is used as a column heading, and a column full of NaN values is created for 'time'.
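A minimal reproduction, assuming the header lines sit on consecutive lines with no blank lines between them and that the data row holds just a date and a time. An in-memory io.StringIO stands in for the real file path, sep=r"\s+" stands in for delim_whitespace=True (which pandas 2.2 deprecated), and na_values is left out because it plays no part in the header problem.

```python
import io

import pandas as pd

# Start of the log as shown in the question (assumed consecutive lines).
LOG = """Software: Microsoft Internet Information Services 7.5
Version: 1.0
Date: 2014-08-01 00:00:25
Fields: date time
2014-08-01 00:00:25
"""

# skiprows=3 drops the Software/Version/Date lines, so the Fields line
# becomes the header and is split on whitespace into three names.
arr = pd.read_table(io.StringIO(LOG), skiprows=3, sep=r"\s+")

# Each data row has only two values, so they land under 'Fields:' and
# 'date', and the trailing 'time' column is filled with NaN.
print(arr.columns.tolist())  # ['Fields:', 'date', 'time']
print(arr)
```

The header line has one more token than each data row, which is exactly where the extra NaN column comes from.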


Solution

  • You can do it by calling read_table twice: once to get the column names from the Fields line, and once to read the data.

    import pandas as pd

    # Read the fourth line (the Fields line) into a 1x1 DataFrame holding one string,
    # then split it on whitespace and drop the leading "Fields:" token:
    col_names = pd.read_table('path', skiprows=3, nrows=1, header=None).iloc[0, 0].split()[1:]
    # Read the actual data, supplying the column names explicitly:
    df = pd.read_table('path', sep=' ', skiprows=4, names=col_names)
    

    If you already know the column names (e.g. date and time), it's even simpler:

    df = pd.read_table('path', sep=' ', skiprows=4, names=['date', 'time'])
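To check the two-pass approach without a real log file, here is a sketch that runs both reads against an in-memory copy of the header from the question. io.StringIO stands in for 'path', a fresh buffer is created for each call because the first read consumes the stream, and the data row is assumed to contain just a date and a time.

```python
import io

import pandas as pd

# In-memory stand-in for the log file (assumed consecutive lines).
LOG = """Software: Microsoft Internet Information Services 7.5
Version: 1.0
Date: 2014-08-01 00:00:25
Fields: date time
2014-08-01 00:00:25
"""

# First pass: read only the Fields line. read_table's default tab separator
# keeps the whole line in one cell, which is then split on whitespace and
# stripped of its leading "Fields:" token.
fields_line = pd.read_table(io.StringIO(LOG), skiprows=3, nrows=1, header=None).iloc[0, 0]
col_names = fields_line.split()[1:]

# Second pass: skip all four header lines and pass the names explicitly,
# so no line of the file is treated as a header.
df = pd.read_table(io.StringIO(LOG), sep=" ", skiprows=4, names=col_names)

print(col_names)  # ['date', 'time']
print(df)
```

With the real file, pass the file path instead of io.StringIO(LOG) in both calls.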