How to split a filename on the last occurrence of a repeated delimiter? (Python)


How can I split a filename so that, when a delimiter is repeated, only the last occurrence in the run acts as the split point? For example:

Example File List:

abc_123
abc_123_d4
abc__123  (2 underscores)
abc_123__d4  (2 underscores)
abc____123  (4 underscores)

Expected Outcome:

abc, 123
abc, 123, d4
abc_, 123 (1 underscore)
abc, 123_, d4 (1 underscore)
abc___, 123 (3 underscores)

Using:

filename.split("_")

would output (each extra underscore produces an empty string, shown as ''):

abc, 123
abc, 123, d4
abc, '', 123
abc, 123, '', d4
abc, '', '', '', 123

Solution

  • Using re.split

    import re
    
    pattern = re.compile(r'_(?!_)')
    
    pattern.split('abc_123')  # ['abc', '123']
    pattern.split('abc_123_d4')  # ['abc', '123', 'd4']
    pattern.split('abc__123')  # ['abc_', '123']
    pattern.split('abc_123__d4')  # ['abc', '123_', 'd4']
    pattern.split('abc____123')  # ['abc___', '123']
    

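    The regex _(?!_) uses a negative lookahead: it matches an underscore that is not immediately followed by another underscore. In a run of consecutive underscores, only the last one is a split point, so the earlier ones stay attached to the preceding part.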
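
The same idea can be wrapped in a small helper that works for delimiters other than the underscore. This is only a sketch: the function name split_on_last_delimiter is hypothetical (not from the answer above), and it assumes a single-character delimiter.

```python
import re


def split_on_last_delimiter(name, delimiter="_"):
    """Split `name` on `delimiter`, treating only the last delimiter
    of each consecutive run as a split point.

    Hypothetical helper; assumes `delimiter` is a single character.
    """
    escaped = re.escape(delimiter)
    # Match a delimiter that is NOT immediately followed by another one.
    pattern = re.compile(f"{escaped}(?!{escaped})")
    return pattern.split(name)


for filename in ["abc_123", "abc_123_d4", "abc__123", "abc_123__d4", "abc____123"]:
    print(filename, "->", split_on_last_delimiter(filename))
```

Note that a filename ending in the delimiter (for example abc_) still splits at that final underscore and yields an empty last element, so filter empty strings afterwards if that matters for your file list.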