Search code examples
pythondataframereplace

python code to remove special character '-' from certain strings only


I'd like to clean the trails of '--' in between column headers and the column data. so that i need to create a dataframe table out of it.

Please find the data below -

IP TRACER ID        ID              cId      No    Loop  Element Name                    Freq  STATUS         Severity  Error         Message                                                                                                                                                                                                                                                                                                       Source
--------------------  --------------------  -------------  ----  ----  ------------------------------  ----  -------------  --------------  ---------------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  -------------
2323Z-IH0SLX       20212800032                                                             1     Denied         Error           IEHP_DOSOlderTh  Date is older than 12-months 

2325611-2SU       202210201377                 0           837/002A1/2300/HI/01/02  1     R       valid         0x08C8F       Value of element is incorrect.            
-------------------- ----------------     ---- -------------- --------------------------------------- --------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------         
232561-EZBCD       2022112800195                 0           837/00522A1/2300/HI/01/02  1     R       valid         0xC8F       Value of element is incorrect. 

ive tried to use the command - data.replace("--", ""), but since there are '-' in the column data too , its been removed too!

I want the output to be like below.

IP TRACER ID        ID              cId      No    Loop  Element Name                    Freq  STATUS         Severity  Error         Message                                                                                                                                                                                                                                                                                                       Source

2323Z-IH0SLX       20212800032                                                             1     Denied         Error           IEHP_DOSOlderTh  Date is older than 12-months 
                                                                    IEHP           
2325611-2SU       202210201377                 0           837/002A1/2300/HI/01/02  1     R       valid         0x08C8F       Value of element is incorrect.            
        
232561-EZBCD       2022112800195                 0           837/00522A1/2300/HI/01/02  1     R       valid         0xC8F       Value of element is incorrect.

Note - Removing the second row and last but one row might not be the solution, because data might not be consistent. Even if second row could be, but, last but one couldnt by using row number.

Please guide me!


Solution

  • Using regex to check if the line only contains and - seems to work. We can use .join and .split to replace lines that march this criteria with empty lines.

    data = r'''IP TRACER ID        ID              cId      No    Loop  Element Name                    Freq  STATUS         Severity  Error         Message                                                                                                                                                                                                                                                                                                       Source
    --------------------  --------------------  -------------  ----  ----  ------------------------------  ----  -------------  --------------  ---------------  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  -------------
    2323Z-IH0SLX       20212800032                                                             1     Denied         Error           IEHP_DOSOlderTh  Date is older than 12-months 
    
    2325611-2SU       202210201377                 0           837/002A1/2300/HI/01/02  1     R       valid         0x08C8F       Value of element is incorrect.            
    -------------------- ----------------     ---- -------------- --------------------------------------- --------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------         
    232561-EZBCD       2022112800195                 0           837/00522A1/2300/HI/01/02  1     R       valid         0xC8F       Value of element is incorrect. '''
    import re
    processed_data = "\n".join("" if re.fullmatch("[ -]+", line) else line for line in data.split("\n"))
    print(processed_data)
    

    results in

    IP TRACER ID        ID              cId      No    Loop  Element Name                    Freq  STATUS         Severity  Error         Message                                                                                                                                                                                                                                                                                                       Source
    
    2323Z-IH0SLX       20212800032                                                             1     Denied         Error           IEHP_DOSOlderTh  Date is older than 12-months 
    
    2325611-2SU       202210201377                 0           837/002A1/2300/HI/01/02  1     R       valid         0x08C8F       Value of element is incorrect.            
    
    232561-EZBCD       2022112800195                 0           837/00522A1/2300/HI/01/02  1     R       valid         0xC8F       Value of element is incorrect.