python regex cross-platform eol

Easiest way to count cross-platform newline patterns


What's the easiest way to count the newlines in a string whose line breaks follow the cross-platform newline pattern '\r\n?|\n'?

Say we're skipping whitespace, or whitespace plus some other characters, in a buffer, and meanwhile we would like to increment the line count. I'm doing something like:

import re

nlinePat = re.compile(r'\r\n?|\n')

wsPat = re.compile(r'[ \t\r\n]+')  # skip (specific) white space chars
commaPat = re.compile(r'[ \t\r\n]*,[ \t\r\n]*')  # skip comma and surrounding white space
#...

m1 = wsPat.match(buffer)
bufferPos += len(m1.group(0))

# count the line breaks inside the whitespace that was just skipped
m2 = nlinePat.findall(m1.group(0))
nlineCounter += len(m2)

(For example, can the above be done with a single regex operation? It feels like overhead to skip the whitespace first and then run a second regex to count the newlines in it.)
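One way to drop the second regex is to count the line breaks in the matched whitespace with plain `str.count` calls. This is a minimal sketch, not something from the original thread: `count_newlines` and `skip_ws` are hypothetical helper names, and it assumes the same `wsPat` as above. Every `'\r\n'` is counted once by `count('\n')` and once by `count('\r')`, so subtracting `count('\r\n')` gives exactly the number of matches of `\r\n?|\n`. It still uses one regex to do the skipping; whether it is actually faster than `findall` would need measuring.

```python
import re

# Same whitespace pattern as in the question.
wsPat = re.compile(r'[ \t\r\n]+')


def count_newlines(text):
    # '\n' catches lone '\n' and the '\n' half of '\r\n';
    # '\r' catches lone '\r' and the '\r' half of '\r\n';
    # subtracting '\r\n' removes the double count for CRLF pairs.
    return text.count('\n') + text.count('\r') - text.count('\r\n')


def skip_ws(buffer, pos):
    """Skip whitespace starting at pos; return (new_pos, newlines_skipped)."""
    m = wsPat.match(buffer, pos)  # Pattern.match accepts a start position
    if m is None:
        return pos, 0
    return m.end(), count_newlines(m.group(0))


pos, nl = skip_ws('  \r\n\t\r \n\n  x', 0)
print(pos, nl)  # prints: 11 4
```

Passing `pos` to `wsPat.match` also avoids slicing the buffer, since the match starts at that index and `m.end()` gives the new absolute position.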


Solution

  • What you're doing is pretty good. Another way to do it is to split the buffer on nlinePat and process each line, adding 1 to nlineCounter each time you process a line (the split yields one more piece than there are newlines, so subtract 1 at the end if you want the newline count itself). The drawback is that you won't be keeping track of the character position, because each split consumes one or two characters and you don't know how many whitespace characters strip() removes.

    I think you will have a hard time finding a way to do this "in Python": you need to do two things at once (count newlines and count characters), so you may have to parse the buffer character by character yourself.

    My example:

    #!/usr/bin/env python3
    
    import re
    
    buffer = '''
    \tNow is the time\t
    for all good men\r\tto come to the aid\t\r
    of their party.
    '''
    
    nlinePat = re.compile(r'\r\n?|\n')
    
    nlineCounter = 0
    
    # Split on every line break: '\r\n', a lone '\r' or a lone '\n'.
    bl = nlinePat.split(buffer)
    
    for line in bl:
        print(line.strip())
        nlineCounter += 1
    
    # nlineCounter is the number of pieces, one more than the number of line breaks.
    print(nlineCounter)
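The character-by-character approach suggested in the answer, where the position and the line count advance together in a single pass, could look roughly like this. It is a minimal sketch under assumptions: the function name `skip_ws_counting` is hypothetical, and the whitespace set is hard-coded to the same characters as the question's `wsPat`. A pure-Python loop like this is usually slower than a regex on long runs of whitespace, so it trades speed for doing both jobs at once.

```python
def skip_ws_counting(buffer, pos):
    """Advance pos past ' ', '\\t', '\\r' and '\\n', counting line breaks.

    A line break is '\\r\\n', a lone '\\r' or a lone '\\n', the same
    convention as the regex r'\\r\\n?|\\n'. Returns (new_pos, newlines).
    """
    newlines = 0
    n = len(buffer)
    while pos < n and buffer[pos] in ' \t\r\n':
        ch = buffer[pos]
        if ch == '\r':
            newlines += 1
            # Treat '\r\n' as one line break by also consuming the '\n'.
            if pos + 1 < n and buffer[pos + 1] == '\n':
                pos += 1
        elif ch == '\n':
            newlines += 1
        pos += 1
    return pos, newlines


pos, nl = skip_ws_counting('a \r\n  b', 1)
print(pos, nl)  # prints: 6 1
```

The returned position can be fed straight back in as the start of the next token, which is what the split-based approach cannot provide.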