
Python not finding carriage returns (^M or \r) visible to nano and vi


I am writing a bash script that writes the output of the functions it runs to a log file. When the log is viewed in Notepad, the progress text from a single function takes up the bulk of the file. Ideally I want to drop that bulk and keep only the last piece of progress text, which shows that the process completed successfully.

output_log_18_01_2024.log - Notepad
+---------------------------------------------------------------------------+
 Reading flash ............... 0x00000000 (0%) 
 Reading flash ............... 0x00001000 (0%) 
 Reading flash ............... 0x00002000 (0%) 
 Reading flash ............... 0x00003000 (0%) 
 Reading flash ............... 0x00004000 (0%) 
 Reading flash ............... 0x00005000 (0%) 
 Reading flash ............... 0x00006000 (0%) 
 Reading flash ............... 0x00007000 (0%) 
 Reading flash ............... 0x00008000 (0%) 
 Reading flash ............... 0x00009000 (0%) 
 Reading flash ............... 0x0000A000 (0%) 
 Reading flash ............... 0x0000B000 (0%) 
 Reading flash ............... 0x0000C000 (0%) 
 Reading flash ............... 0x0000D000 (0%) 
 Reading flash ............... 0x0000E000 (0%) 
 Reading flash ............... 0x0000F000 (0%) 
 Reading flash ............... 0x00010000 (0%) 
 Reading flash ............... 0x00011000 (0%) 
 Reading flash ............... 0x00012000 (0%) 
 Reading flash ............... 0x00013000 (0%)
 ...

When viewed in nano or vi on the command line, this process text appears as one line separated by carriage returns ^M.

GNU nano 5.6.1                           output_log_18_01_2024.log
+---------------------------------------------------------------------------+
Reading flash ............... 0x00000000 (0%) ^M Reading flash ............... 0x00001000 (0%) ^M Reading flash ............... 0x00002000 (0%) ^M Reading flash ............... 0x00003000 (0%) ^M Reading flash ............... 0x00004000 (0%) etc...

As such, I am trying to use a Python script to detect these carriage returns and process them accordingly. Here is a script I have tried:

import sys, re

f = open(sys.argv[1], "r")
data = f.read()

counter = 0
for char in data:
    print(char, end="")
    if char == r'\r':
        counter += 1
        
print(counter)
# always returns 0


rx = r'\r'
rr = re.search(rx, data)

print(rr)
# always returns None

f.close()

Why is Python unable to see these carriage returns? Is it just interpreting them as newlines, so that there's not much I can do? I am open to any approach in Python or bash, basically anything that will let me remove the bulk of this log and retain the important text.

Edit: Removed images of errors, replaced with text copies.
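
A likely explanation, assuming the script runs under Python 3: `open()` in text mode uses universal newlines by default, so every lone `\r` is translated to `\n` before the script ever sees it. Passing `newline=""` turns that translation off. Separately, `r'\r'` in the comparison is a two-character string (a backslash followed by `r`), so `char == r'\r'` can never be true for a single character; the regex `r'\r'` is fine, because the `re` module interprets that escape itself. The sketch below demonstrates both points on a small made-up sample; the file contents and the helper name `count_carriage_returns` are illustrative, not taken from the real log.

```python
import os
import tempfile


def count_carriage_returns(path, newline=None):
    # Count carriage-return characters seen when reading `path` in text mode.
    # newline=None is Python's default (universal newlines);
    # newline="" disables newline translation.
    with open(path, "r", newline=newline) as f:
        return f.read().count("\r")


# Hypothetical sample modelled on the log in the question.
sample = b"Reading flash 0x0000 (0%) \rReading flash 0x1000 (50%) \rdone\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(sample)
    sample_path = tmp.name

translated = count_carriage_returns(sample_path)              # default mode
preserved = count_carriage_returns(sample_path, newline="")   # no translation
os.remove(sample_path)

print(translated, preserved)      # 0 2
print(len(r"\r"), len("\r"))      # 2 1
```

Opening the file in binary mode (`"rb"`) and searching for `b"\r"` would also leave the carriage returns untouched.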


Solution

  • Here's an answer that ended up working for me. There's probably a far more efficient way of doing this, but the script this is part of is already very, very slow.

    # formatter.py
    
    #!/usr/bin/python3
    
    import sys
    
    ### Split at newlines.
    testlist = repr(sys.argv[1]).split(r'\n')
    
    for i in range(0, len(testlist)):
    
        line = testlist[i]
        if line[1] == " ":
            line = line.replace(" ","",1)
        ### If a carriage return is found in the line
        if (line.find(r'\r') > -1):
            old_idx = 0
            curr_idx = line.find(r'\r', old_idx)
            ### Find the last occurrence of \r
            while curr_idx != -1:
                old_idx = curr_idx
                curr_idx = line.find(r'\r', (old_idx+1))
            
            ### Print the text that follows the last carriage return found above.
            print(line[old_idx+2:].replace(r'\r',""))
        ### If no carriage returns are found, print the line as is.
        else:
            print(line)
    
    # echoline.sh
    
    #!/bin/bash
    
    echoLine()
    {
        outfile="test.txt"
    
        # get input
        set -f # stops * being expanded
        if test ! -t 0 # pipe
        then
            indata=$(</dev/stdin)
        elif test -n "$1"
        then
            indata="$@"
        fi
    
        indata="${indata//^M/'\r'}"
        IFS='$'; arrIN=($indata); unset IFS;
    
        for (( x=0; x<${#arrIN[@]}; x++))
        do
            outdata=$(python3 formatter.py "${arrIN[x]}" 2>/dev/null || python3 formatter.py "${arrIN[x]: -300}")
            outdata="${outdata//"'"/""} \n"
            echo -ne "$outdata" >> $outfile
            echo -ne "$outdata"
        done
    }
    
    echoLine -s $(cat -e output_log_18_01_2024_old.log)
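
As a possibly simpler alternative to the two scripts above, here is a minimal Python-only sketch. It assumes the log is read with newline translation disabled and that only the text after the final carriage return on each line should be kept. The function name `last_progress_segments` and the sample text are hypothetical, and the sketch has not been run against the real log.

```python
def last_progress_segments(text):
    """Keep, for each line, only the text after its final carriage return."""
    result = []
    for line in text.split("\n"):
        # Drop a trailing CR first, so a line ending in CR+LF still
        # yields its last progress message rather than an empty string.
        line = line.rstrip("\r")
        # rsplit with maxsplit=1 isolates the segment after the last CR;
        # strip() also removes the padding spaces around each message.
        result.append(line.rsplit("\r", 1)[-1].strip())
    return "\n".join(result)


# Made-up sample: one progress line overwritten twice, then a normal line.
sample = (
    "header\n"
    " Reading flash 0x0000 (0%) \r Reading flash 0x1000 (50%) \r"
    " Reading flash 0x2000 (100%) \r\n"
    "Verify OK\n"
)
cleaned = last_progress_segments(sample)
print(cleaned, end="")
```

To apply it to a file, read the contents with `open(path, "r", newline="")` so the carriage returns survive, then write the returned text wherever it is needed. Note that `strip()` also removes leading indentation from ordinary lines, which may or may not be wanted.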