I am importing a web log text file in Python using Pandas. Pandas reads the header row, but it treats the text "Fields:" as a column name and then adds an extra column of blanks (NaNs) at the end. How can I stop this text from being used as a column heading?
Here is my code:
arr = pd.read_table("path", skiprows=3, delim_whitespace=True, na_values=True)
Here is the start of the file:
Software: Microsoft Internet Information Services 7.5
Version: 1.0
Date: 2014-08-01 00:00:25
Fields: date time
2014-08-01 00:00:25...
The result is that 'Fields' is used as a column heading, the real headings shift one position to the right, and the 'time' column ends up full of NaN values.
You can do it by calling read_table twice.
# reads the fourth line into a 1x1 DataFrame holding a single string,
# then splits it on whitespace and drops the first token ("Fields:"):
col_names = pd.read_table('path', skiprows=3, nrows=1, header=None).iloc[0,0].split()[1:]
# reads the actual data:
df = pd.read_table('path', sep=' ', skiprows=4, names=col_names)
If you already know the names of the columns (e.g. date and time), then it's even simpler:
df = pd.read_table('path', sep=' ', skiprows=4, names=['date', 'time'])
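For anyone who wants to try the idea without a real log file, here is a minimal self-contained sketch. It assumes the header layout shown in the question; the two data rows are made up, and `read_log` and its `header_line` parameter are hypothetical names used only for illustration. Instead of calling read_table twice, it reads the text once, takes the column names from the "Fields:" line, and passes only the remaining lines to pandas.

```python
import io

import pandas as pd

# Sample text mirroring the header from the question.
# The two data rows are invented for illustration only.
sample = (
    "Software: Microsoft Internet Information Services 7.5\n"
    "Version: 1.0\n"
    "Date: 2014-08-01 00:00:25\n"
    "Fields: date time\n"
    "2014-08-01 00:00:25\n"
    "2014-08-01 00:00:31\n"
)


def read_log(buf, header_line=3):
    """Parse a log whose column names sit on the 'Fields:' line.

    header_line is the 0-based index of that line (an assumption
    matching the four-line header shown in the question).
    """
    lines = buf.read().splitlines()
    # Drop the leading "Fields:" token and keep the real column names.
    col_names = lines[header_line].split()[1:]
    # Everything after the header line is data.
    body = "\n".join(lines[header_line + 1:])
    return pd.read_csv(
        io.StringIO(body), sep=r"\s+", names=col_names, header=None
    )


df = read_log(io.StringIO(sample))
print(df)
```

With a real file you could pass an open file handle instead of the `io.StringIO` wrapper, e.g. `with open('path') as f: df = read_log(f)`. Note that this sketch loads the whole file into memory, so the two-call version above may suit very large logs better.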