My data is having a repetitive pattern:
2021-11-17 10:59:10.880
SysState: 4, Events: 161403, EMS: 4, VDB: 2, TubeState: 0x02
BDR Mode: 1, BMS Ext: 2, BMS Int: 0, BdrStat: 00
CPU(%):16, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3330mV 3333mV 3332mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3325mV 3328mV 0 0
3325mV 3321mV 3328mV 3328mV 3327mV 0 0
3329mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53241, BPC:0, PLC:0
AMBI:421, CONN:278, FETS:282, BMSC:274, BPA1:259, BPA2:237, BPA3:255
2021-11-17 10:59:13.80
SysState: 4, Events: 161407, EMS: 4, VDB: 3, TubeState: 0x08
BDR Mode: 4, BMS Ext: 3, BMS Int: 1, BdrStat: 00
CPU(%):12, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3332mV 3331mV 3332mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3326mV 3328mV 0 0
3324mV 3321mV 3328mV 3328mV 3327mV 0 0
3329mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53277, BPC:23, PLC:0
AMBI:421, CONN:278, FETS:282, BMSC:276, BPA1:259, BPA2:237, BPA3:255
2021-11-17 10:59:15.280
SysState: 4, Events: 161407, EMS: 4, VDB: 3, TubeState: 0x08
BDR Mode: 4, BMS Ext: 3, BMS Int: 1, BdrStat: 00
CPU(%):11, CPUmax(%):47, task idx:3, CPUmaxIRQ(%):0
SOC:9973, SOH:100, LV?:0, HV?:0
3331mV 3332mV 3331mV 3332mV 3331mV 0 0
3331mV 3324mV 3325mV 3325mV 3328mV 0 0
3324mV 3322mV 3328mV 3328mV 3327mV 0 0
3331mV 0mV 0mV 0mV 0mV 0 0
BPV:53288, PLV:53259, BPC:47, PLC:47
AMBI:421, CONN:278, FETS:282, BMSC:276, BPA1:259, BPA2:237, BPA3:255
What I want to do is separate every value and make it a column starting from '2021-11-17 10:59:10.880' to 'BPA3:255'
Index | Another header | Another header | Another header |
---|---|---|---|
0 | 2021-11-17 10:59:10.880 | SysState: 4 | Events: 161403 |
1 | 2021-11-17 10:59:13.80 | SysState: 4 | Events: 1161407 |
so on and so forth..
what is have done so far:
The file was a .txt file and I converted it into csv first and then:
df = pd.read_csv('data.csv', sep=',' )
but it gives me ParserError: Error tokenizing data. Anybody knows how to solve this problem? with the sep= ';' or changing the text file to csv gives me the following output:
Is there a way to resolve this while parsing text file and not converting it into csv?
Assuming you want only the date, the SysState and Events, an easy way is to extract the info using a regex.
I also assumed the file is not huge so I am loading all in memory, if this is not the case then you'll have to parse line by line.
with open('filename.csv') as f:
lines = f.read()
import re
regex = re.compile('(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\nSysState: (\d+),\s+Events: (\d+).+?')
df = pd.DataFrame(regex.findall(lines), columns=['datetime', 'SysState', 'Events'])
NB. I extracted only the numbers from the fields, but if you really want to have SysState: 4
, etc. it is easy to add it in the capturing group
output:
datetime SysState Events
0 2021-11-17 10:59:10.880 4 161403
1 2021-11-17 10:59:13.80 4 161407
2 2021-11-17 10:59:15.280 4 161407