Most Pythonic way to eliminate duplicate entries in a delimited string (not a list) and return the sorted result


I need to process many thousands of strings (each string is an element in a list, imported from records in a SQL table).

Each string comprises a number of phrases separated by a consistent delimiter. I need to 1) eliminate duplicate phrases in the string, and 2) sort the remaining phrases and return them as a delimited string.

This is what I've conjured:

def dedupe_and_sort(list_element, delimiter):

    list_element = delimiter.join(set(list_element.split(f'{delimiter}')))
    return( delimiter.join(sorted(list_element.split(f'{delimiter}'))) )

string_input = 'e\\\\a\\\\c\\\\b\\\\a\\\\b\\\\c\\\\a\\\\b\\\\d'
string_delimiter = "\\\\"

output = dedupe_and_sort(string_input, string_delimiter)

print(f"Input: {string_input}")
print(f"Output: {output}")

Output is as follows:

Input: e\\a\\c\\b\\a\\b\\c\\a\\b\\d
Output: a\\b\\c\\d\\e

Is this the most efficient approach or is there an alternative, more efficient method?


Solution

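  • You can avoid splitting twice by not joining in the first step: sorted() accepts any iterable, so the set can be passed to it directly. There is also no need for an f-string when passing delimiter to split(), since delimiter is already a string.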

    def dedupe_and_sort(list_element, delimiter):
    
        distinct_elements = set(list_element.split(delimiter))
        return delimiter.join(sorted(distinct_elements))
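
As a quick check, the sketch below runs the function above on the input from the question and then applies it to a whole list with a list comprehension. The `records` list and its second entry are made up for illustration; in the question's setting they would be the strings imported from the SQL table.

```python
def dedupe_and_sort(list_element, delimiter):
    # Split once, drop duplicates with a set, then sort and re-join.
    distinct_elements = set(list_element.split(delimiter))
    return delimiter.join(sorted(distinct_elements))


string_delimiter = "\\\\"  # two literal backslash characters
string_input = 'e\\\\a\\\\c\\\\b\\\\a\\\\b\\\\c\\\\a\\\\b\\\\d'

print(dedupe_and_sort(string_input, string_delimiter))  # prints a\\b\\c\\d\\e

# Hypothetical list of records, standing in for the rows read from SQL.
records = [string_input, "pear\\\\apple\\\\pear"]
cleaned = [dedupe_and_sort(record, string_delimiter) for record in records]
```

Note that sorted() compares strings by code point, so uppercase phrases sort before lowercase ones, and phrases are compared exactly as split (surrounding whitespace is not stripped). If that matters for the data, a key such as str.casefold can be passed to sorted().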