I'd like to clean the trails of '--' in between column headers and the column data. so that i need to create a dataframe table out of it.
Please find the data below -
IP TRACER ID ID cId No Loop Element Name Freq STATUS Severity Error Message Source
-------------------- -------------------- ------------- ---- ---- ------------------------------ ---- ------------- -------------- --------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------
2323Z-IH0SLX 20212800032 1 Denied Error IEHP_DOSOlderTh Date is older than 12-months
2325611-2SU 202210201377 0 837/002A1/2300/HI/01/02 1 R valid 0x08C8F Value of element is incorrect.
-------------------- ---------------- ---- -------------- --------------------------------------- --------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
232561-EZBCD 2022112800195 0 837/00522A1/2300/HI/01/02 1 R valid 0xC8F Value of element is incorrect.
ive tried to use the command - data.replace("--", "")
, but since there are '-' in the column data too , its been removed too!
I want the output to be like below.
IP TRACER ID ID cId No Loop Element Name Freq STATUS Severity Error Message Source
2323Z-IH0SLX 20212800032 1 Denied Error IEHP_DOSOlderTh Date is older than 12-months
IEHP
2325611-2SU 202210201377 0 837/002A1/2300/HI/01/02 1 R valid 0x08C8F Value of element is incorrect.
232561-EZBCD 2022112800195 0 837/00522A1/2300/HI/01/02 1 R valid 0xC8F Value of element is incorrect.
Note - Removing the second row and last but one row might not be the solution, because data might not be consistent. Even if second row could be, but, last but one couldnt by using row number.
Please guide me!
Using regex to check if the line only contains
and -
seems to work.
We can use .join
and .split
to replace lines that march this criteria with empty lines.
data = r'''IP TRACER ID ID cId No Loop Element Name Freq STATUS Severity Error Message Source
-------------------- -------------------- ------------- ---- ---- ------------------------------ ---- ------------- -------------- --------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------
2323Z-IH0SLX 20212800032 1 Denied Error IEHP_DOSOlderTh Date is older than 12-months
2325611-2SU 202210201377 0 837/002A1/2300/HI/01/02 1 R valid 0x08C8F Value of element is incorrect.
-------------------- ---------------- ---- -------------- --------------------------------------- --------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
232561-EZBCD 2022112800195 0 837/00522A1/2300/HI/01/02 1 R valid 0xC8F Value of element is incorrect. '''
import re
processed_data = "\n".join("" if re.fullmatch("[ -]+", line) else line for line in data.split("\n"))
print(processed_data)
results in
IP TRACER ID ID cId No Loop Element Name Freq STATUS Severity Error Message Source
2323Z-IH0SLX 20212800032 1 Denied Error IEHP_DOSOlderTh Date is older than 12-months
2325611-2SU 202210201377 0 837/002A1/2300/HI/01/02 1 R valid 0x08C8F Value of element is incorrect.
232561-EZBCD 2022112800195 0 837/00522A1/2300/HI/01/02 1 R valid 0xC8F Value of element is incorrect.