Tags: python, regex, regex-group, quantifiers

Regex Python / group quantifiers


I want to match a list of variables which look like directories, e.g.:

Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34

The number of "subdirectories" varies but has an upper bound (9 in the examples above). I want to capture every subdirectory in its own group, except the first one, which I named "Same" above.

The best I could come up with is:

^(?:([^/]+)/){4,8}([^/]+)=(.*)

It already looks for 4-8 subdirectories but only groups the last one. Why's that? Is there a better solution using group quantifiers?

Edit: Solved. Will use split() instead.
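
On the "Why's that?" part: in Python's re module, a capturing group inside a quantifier is overwritten on each repetition, so only the last repetition remains in the match. The sketch below runs the pattern from the question on the first example line and, for comparison, shows one way the split() route might look. The variable names are illustrative only.

```python
import re

line = 'Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'

# Group 1 sits inside {4,8}: each repetition overwrites the previous
# capture, so only the last one ('Temperature') survives.
m = re.match(r'^(?:([^/]+)/){4,8}([^/]+)=(.*)', line)
repeated = m.groups()
print(repeated)  # ('Temperature', 'Value', '4.123')

# Splitting instead yields one list element per component.
path, value = line.rsplit('=', 1)
parts = path.split('/')[1:]  # drop the first component ("Same")
print(parts, value)
```

A fixed number of groups cannot grow with the number of repetitions, which is why findall() or split() is the usual way to collect a variable number of components.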


Solution

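  • import re

    # Match each path component: start right after the string start or a '/',
    # and stop right before the next '/' or the end of the string.
    regx = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')

    for ss in ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
               'Same/Same2/Battery/Name=SomeString',
               'Same/Same2/Home/Land/Some/More/Stuff=0.34'):

        print(ss)
        print(regx.findall(ss))
        print()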
    

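To make the effect of the lookaround pattern concrete, here is a small sketch of what findall() returns for one of the inputs. It also checks that, for input of this shape, a plain split('/') gives the same list; that equivalence is only claimed for strings like these examples.

```python
import re

regx = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')
ss = 'Same/Same2/Battery/Name=SomeString'

found = regx.findall(ss)
print(found)  # ['Same', 'Same2', 'Battery', 'Name=SomeString']

# For this input, splitting on '/' produces the same components.
print(found == ss.split('/'))  # True
```
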
    Edit 1

    Now that you have given more information on what you want to obtain (_"Same/Same2/Battery/Name=SomeString becoming SAME2_BATTERY_NAME=SomeString"_), better solutions can be proposed: either a regex, or split() plus replace().

    import re
    from os import sep

    # Escape the separator for use inside the regex (a backslash on Windows).
    sep2 = r'\\' if sep == '\\' else '/'

    # Skip the first component and its separator; capture the rest.
    pat = '^(?:.+?%s)(.+$)' % sep2
    print('pat==%s\n' % pat)

    ragx = re.compile(pat)

    # Raw strings keep the backslashes literal.
    for ss in (r'Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123',
               r'Same\Same2\Battery\Name=SomeString',
               r'Same\Same2\Home\Land\Some\More\Stuff=0.34'):

        print(ss)
        print(ragx.match(ss).group(1).replace(sep, '_'))
        print(ss.split(sep, 1)[1].replace(sep, '_'))
        print()
    

    result, on a Windows platform

    pat==^(?:.+?\\)(.+$)
    
    Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123
    Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
    Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
    
    Same\Same2\Battery\Name=SomeString
    Same2_Battery_Name=SomeString
    Same2_Battery_Name=SomeString
    
    Same\Same2\Home\Land\Some\More\Stuff=0.34
    Same2_Home_Land_Some_More_Stuff=0.34
    Same2_Home_Land_Some_More_Stuff=0.34
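
The code above depends on os.sep, so it only handles backslash-separated input on Windows. A minimal sketch of a platform-independent variant follows, assuming the separator is passed in explicitly and escaped with re.escape(). The function name strip_first_component is hypothetical and not part of the original answer.

```python
import re

def strip_first_component(line, sep):
    """Drop the first component and join the rest with underscores.

    `sep` is passed in explicitly (an assumption of this sketch) instead
    of being read from os.sep, so the result does not depend on the OS.
    """
    # re.escape() makes the separator safe to embed in the pattern.
    pattern = re.compile('^.+?%s(.+)$' % re.escape(sep))
    return pattern.match(line).group(1).replace(sep, '_')

print(strip_first_component(r'Same\Same2\Battery\Name=SomeString', '\\'))
print(strip_first_component('Same/Same2/Battery/Name=SomeString', '/'))
```

Both calls print Same2_Battery_Name=SomeString. A line that contains no separator makes match() return None, so real input may need a check before calling group().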
    

    Edit 2

    Re-reading your comment, I realized that I hadn't taken into account that you want to uppercase the part of the string before the '=' sign, but not the part after it.

    Hence this new code, which shows three methods that meet this requirement. Choose whichever you prefer:

    import re
    from os import sep

    # Escape the separator for use inside the regexes (a backslash on Windows).
    sep2 = r'\\' if sep == '\\' else '/'

    # pot: skip the first component, then capture the key and the value
    # on either side of the last '='.
    pot = '^(?:.+?%s)(.+?)=([^=]*$)' % sep2
    print('pot==%s\n' % pot)
    rogx = re.compile(pot)

    # pet: same, but only the key is matched; the lookahead leaves '=value' untouched.
    pet = '^(?:.+?%s)(.+?(?==[^=]*$))' % sep2
    print('pet==%s\n' % pet)
    regx = re.compile(pet)

    # Raw strings keep the backslashes literal.
    for ss in (r'Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123',
               r'Same\Same2\Battery\Name=SomeString',
               r'Same\Same2\Ocean\Atlantic\North=',
               r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh'):
        print(ss + '\n' + len(ss) * '-')

        print('rogx groups  '.rjust(32), rogx.match(ss).groups())

        a, b = ss.split(sep, 1)[1].rsplit('=', 1)
        print('split split  '.rjust(32), (a, b))
        print('split split join upper replace   %s=%s' % (a.replace(sep, '_').upper(), b))

        print('regx split group  '.rjust(32), regx.match(ss.split(sep, 1)[1]).group())
        print('regx split sub  '.rjust(32),
              regx.sub(lambda x: x.group(1).replace(sep, '_').upper(), ss))
        print()
    

    result, on a Windows platform

    pot==^(?:.+?\\)(.+?)=([^=]*$)
    
    pet==^(?:.+?\\)(.+?(?==[^=]*$))
    
    Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123
    ----------------------------------------------
                       rogx groups   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
                       split split   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
    split split join upper replace   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
                  regx split group   Same2\Foot\Ankle\Joint\Sensor\Value
                    regx split sub   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
    
    Same\Same2\Battery\Name=SomeString
    ----------------------------------
                       rogx groups   ('Same2\\Battery\\Name', 'SomeString')
                       split split   ('Same2\\Battery\\Name', 'SomeString')
    split split join upper replace   SAME2_BATTERY_NAME=SomeString
                  regx split group   Same2\Battery\Name
                    regx split sub   SAME2_BATTERY_NAME=SomeString
    
    Same\Same2\Ocean\Atlantic\North=
    --------------------------------
                       rogx groups   ('Same2\\Ocean\\Atlantic\\North', '')
                       split split   ('Same2\\Ocean\\Atlantic\\North', '')
    split split join upper replace   SAME2_OCEAN_ATLANTIC_NORTH=
                  regx split group   Same2\Ocean\Atlantic\North
                    regx split sub   SAME2_OCEAN_ATLANTIC_NORTH=
    
    Same\Same2\Maths\Addition\2+2=4\Simple=ohoh
    -------------------------------------------
                       rogx groups   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
                       split split   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
    split split join upper replace   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
                  regx split group   Same2\Maths\Addition\2+2=4\Simple
                    regx split sub   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
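
The split-based method can also be wrapped in a small function. This is only a sketch: the name to_flat_name and the explicit sep parameter are assumptions made for illustration, not part of the original answer.

```python
def to_flat_name(line, sep='/'):
    """Hypothetical helper: 'Same/Same2/Battery/Name=X' -> 'SAME2_BATTERY_NAME=X'."""
    # Split on the last '=' so that an '=' inside the path is preserved.
    key, value = line.rsplit('=', 1)
    # Drop the first component, then flatten and uppercase the key only.
    rest = key.split(sep, 1)[1]
    return '%s=%s' % (rest.replace(sep, '_').upper(), value)

print(to_flat_name('Same/Same2/Battery/Name=SomeString'))
# SAME2_BATTERY_NAME=SomeString
print(to_flat_name(r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh', sep='\\'))
# SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
```

As with the methods above, a value that itself contains '=' would be split in the wrong place, because the last '=' is taken as the boundary between key and value.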