Search code examples
pythonpandasdataframetext-filesleading-zero

Python: Keeping leading zeros when open txt with pandas


I have this txt file:

Tu 11:44:00 119 52913161 DETECTOR STATE 0001
Tu 11:44:00 119 52913161 DETECTOR STATE 1100
Tu 11:44:02 119 52913161 DETECTOR STATE 0000
Tu 11:44:02 119 52913161 DETECTOR STATE 1110
Tu 11:44:04 119 52913161 DETECTOR STATE 0000
Tu 11:44:04 119 52913161 DETECTOR STATE 0011

Which I open in Python (Jupyter Notebook) using this code:

import pandas as pd
data= pd.read_csv('EXPORT20171205114501_1.txt', sep=" ", header=None)
data.columns = ["day", "time", "street", "sensor", "type", "state", "bits"]

I get this output:

    day time    street  sensor  type    state   bits
0   Tu  11:44:00    119 52913161    DETECTOR    STATE      1
1   Tu  11:44:00    119 52913161    DETECTOR    STATE   1100
2   Tu  11:44:02    119 52913161    DETECTOR    STATE      0
3   Tu  11:44:02    119 52913161    DETECTOR    STATE   1110
4   Tu  11:44:04    119 52913161    DETECTOR    STATE      0
5   Tu  11:44:04    119 52913161    DETECTOR    STATE     11

How can keep the leading zeros?

I've tried the following options:

 data = pd.read_csv('EXPORT20171205114501_1.txt', sep=" ", header=None, dtype={'column': object}) 


 data = pd.read_csv('EXPORT20171205114501_1.txt', sep=" ", header=None).astype(str)


 data = pd.read_csv('EXPORT20171205114501_1.txt', sep=" ", header=None, converters={'ColName': str})

None of the above options work. I've updated pandas to the latest version. Any idea?


Solution

  • Use parameter names for columns names by list and then is possible define column for parse to string by dtype parameter with dictionary:

    names = ["day", "time", "street", "sensor", "type", "state", "bits"]
    df = pd.read_csv('EXPORT20171205114501_1.txt', sep="\s+", names=names, dtype={'bits':str})
    print (df)
      day      time  street    sensor      type  state  bits
    0  Tu  11:44:00     119  52913161  DETECTOR  STATE  0001
    1  Tu  11:44:00     119  52913161  DETECTOR  STATE  1100
    2  Tu  11:44:02     119  52913161  DETECTOR  STATE  0000
    3  Tu  11:44:02     119  52913161  DETECTOR  STATE  1110
    4  Tu  11:44:04     119  52913161  DETECTOR  STATE  0000
    5  Tu  11:44:04     119  52913161  DETECTOR  STATE  0011
    

    And if need all columns as strings:

    names = ["day", "time", "street", "sensor", "type", "state", "bits"]
    df = pd.read_csv('EXPORT20171205114501_1.txt', sep="\s+", names=names, dtype=str)
    print (df)
      day      time  street    sensor      type  state  bits
    0  Tu  11:44:00     119  52913161  DETECTOR  STATE  0001
    1  Tu  11:44:00     119  52913161  DETECTOR  STATE  1100
    2  Tu  11:44:02     119  52913161  DETECTOR  STATE  0000
    3  Tu  11:44:02     119  52913161  DETECTOR  STATE  1110
    4  Tu  11:44:04     119  52913161  DETECTOR  STATE  0000
    5  Tu  11:44:04     119  52913161  DETECTOR  STATE  0011