Tags: python, regex, string, split, attributeerror

Split on multiple characters in a string


I have a list of filenames that I need to sort based on a section within each string. However, it only works if I make the file extension part of my sorting list. I want this to work whether the file is a .jpg or a .png, so I am trying to split on both the '_' and the '.' characters.

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS', 'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

filelist = ['3006345_2234661_ENG_PRODUCT.jpg', '3006345_2234661_ENG_FRONT.jpg', '3006345_2234661_ENG_LEFT.jpg', '3006345_2234661_ENG_RIGHT.jpg', '3006345_2234661_ENG_BACK.jpg', '3006345_2234661_ENG_INGREDIENTS.jpg', '3006345_2234661_ENG_NUTRITION.jpg', '3006345_2234661_ENG_INSTRUCTIONS.jpg', '3006345_2234661_ENG_INFO.jpg']

sort = sorted(filelist, key = lambda x : sorting.index(x.re.split('_|.')[3]))

print(sort)

This raises the error "AttributeError: 'str' object has no attribute 're'".

How do I split on both _ and . when extracting the part of each string to sort by? I only need the split for the sort key; I don't need to re-form the strings afterwards.


Solution

  • Here's the fixed code:

    import re
    sorted_output = sorted(filelist, key=lambda x: sorting.index(re.split(r'_|\.', x)[3]))
    

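    re.split() is a function in the re module, not a method of str, so x.re.split(...) looks for an attribute called re on the string and raises the AttributeError. Import re and call it as re.split(pattern, string): the regular expression is the first argument and the string to split is the second. Your pattern was almost right; it only needs one change.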
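A quick check of the call form, sketched on one filename from the question: the try/except reproduces the original error, and the module-level call returns all five fields, with the label at index 3.

```python
import re

name = '3006345_2234661_ENG_FRONT.jpg'

# Looking up .re on a str fails: re is a module, not a string attribute.
error = None
try:
    name.re.split(r'_|\.')
except AttributeError as exc:
    error = str(exc)
print(error)  # 'str' object has no attribute 're'

# Correct form: pattern first, string second.
parts = re.split(r'_|\.', name)
print(parts)     # ['3006345', '2234661', 'ENG', 'FRONT', 'jpg']
print(parts[3])  # FRONT
```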

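    Also, you need to escape the . with a backslash, because a bare . is a special character in regular expressions that matches any single character (except a newline). With the unescaped pattern '_|.' every character of the filename becomes a separator, so index [3] is an empty string and sorting.index() raises a ValueError. Writing the pattern as a raw string (r'...') keeps Python from interpreting the backslash itself.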
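To see why the escape matters, here is a small comparison on one filename from the question. The character class [_.] is included only as an equivalent alternative (inside square brackets a . is literal); it is not the pattern used in the answer.

```python
import re

name = '3006345_2234661_ENG_FRONT.jpg'

# Unescaped: '.' matches any character, so every character is a separator
# and the result is nothing but empty strings.
unescaped = re.split('_|.', name)
print(set(unescaped))                   # {''}
print(len(unescaped) == len(name) + 1)  # True

# Escaped dot, or a character class: both split only on '_' and a literal '.'.
escaped = re.split(r'_|\.', name)
char_class = re.split(r'[_.]', name)
print(escaped)                # ['3006345', '2234661', 'ENG', 'FRONT', 'jpg']
print(escaped == char_class)  # True
```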

    Output:

    In [13]: sorted(filelist,key=lambda x: sorting.index(re.split(r'_|\.',x)[3]))                       
    Out[13]: 
    ['3006345_2234661_ENG_FRONT.jpg',
     '3006345_2234661_ENG_BACK.jpg',
     '3006345_2234661_ENG_LEFT.jpg',
     '3006345_2234661_ENG_RIGHT.jpg',
     '3006345_2234661_ENG_INGREDIENTS.jpg',
     '3006345_2234661_ENG_INSTRUCTIONS.jpg',
     '3006345_2234661_ENG_INFO.jpg',
     '3006345_2234661_ENG_NUTRITION.jpg',
     '3006345_2234661_ENG_PRODUCT.jpg']
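The question's goal was for the sort to work whether a file is .jpg or .png. Below is a sketch with a hypothetical mixed list (these filenames are made up for illustration), plus an alternative that avoids regex by stripping the extension with os.path.splitext and then splitting on '_'. Both assume the label is always the fourth '_'-separated field.

```python
import os
import re

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS',
           'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

# Hypothetical mix of .jpg and .png files, for illustration only.
filelist = ['3006345_2234661_ENG_BACK.png',
            '3006345_2234661_ENG_FRONT.jpg',
            '3006345_2234661_ENG_INFO.png',
            '3006345_2234661_ENG_LEFT.jpg']

def label_regex(filename):
    # Split on '_' or a literal '.', then take the fourth field.
    return re.split(r'_|\.', filename)[3]

def label_splitext(filename):
    # No regex: drop the extension first, then split on '_' only.
    stem, _ext = os.path.splitext(filename)
    return stem.split('_')[3]

by_regex = sorted(filelist, key=lambda f: sorting.index(label_regex(f)))
by_splitext = sorted(filelist, key=lambda f: sorting.index(label_splitext(f)))
print(by_regex)
# ['3006345_2234661_ENG_FRONT.jpg', '3006345_2234661_ENG_BACK.png',
#  '3006345_2234661_ENG_LEFT.jpg', '3006345_2234661_ENG_INFO.png']
print(by_regex == by_splitext)  # True
```

The extension never reaches the lookup in either version, so the sorting list does not need to mention .jpg or .png.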
    

    Edit: as @Todd mentions in the comments, if you also want files that share a label to be ordered by their filename (and therefore by the leading numeric IDs), add the filename as a secondary sort key:

    sorted(filelist, key=lambda x: (sorting.index(re.split(r'_|\.', x)[3]), x))
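A sketch of the tie-breaker on a hypothetical list covering two product IDs (the second ID, 3006346_2234662, is invented for illustration). The key returns a tuple, so the label position is compared first and the filename only when the labels are equal.

```python
import re

sorting = ['FRONT', 'BACK', 'LEFT', 'RIGHT', 'INGREDIENTS',
           'INSTRUCTIONS', 'INFO', 'NUTRITION', 'PRODUCT']

# Hypothetical files from two different products.
filelist = ['3006346_2234662_ENG_FRONT.jpg',
            '3006345_2234661_ENG_BACK.jpg',
            '3006345_2234661_ENG_FRONT.jpg',
            '3006346_2234662_ENG_BACK.jpg']

def sort_key(filename):
    label = re.split(r'_|\.', filename)[3]
    # Primary key: position of the label in `sorting`.
    # Secondary key: the whole filename, compared as a string.
    return (sorting.index(label), filename)

result = sorted(filelist, key=sort_key)
print(result)
# ['3006345_2234661_ENG_FRONT.jpg', '3006346_2234662_ENG_FRONT.jpg',
#  '3006345_2234661_ENG_BACK.jpg', '3006346_2234662_ENG_BACK.jpg']
```

Python's sort is stable, so without the secondary key files with the same label would simply keep their input order. Because the filename is compared as text, IDs of different lengths would not sort numerically ('999' sorts after '1000'); in that case int(filename.split('_')[0]) could serve as the secondary key instead.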