I need to process many thousands of strings (each string is an element in a list, imported from records in a SQL table).
Each string comprises a number of phrases separated by a consistent delimiter. I need to 1) eliminate duplicate phrases in the string; and 2) sort the remaining phrases, returning the deduplicated, sorted phrases as a delimited string.
This is what I've conjured:
def dedupe_and_sort(list_element, delimiter):
    list_element = delimiter.join(set(list_element.split(f'{delimiter}')))
    return( delimiter.join(sorted(list_element.split(f'{delimiter}'))) )

string_input = 'e\\\\a\\\\c\\\\b\\\\a\\\\b\\\\c\\\\a\\\\b\\\\d'
string_delimiter = "\\\\"
output = dedupe_and_sort(string_input, string_delimiter)
print(f"Input: {string_input}")
print(f"Output: {output}")
Output is as follows:
Input: e\\a\\c\\b\\a\\b\\c\\a\\b\\d
Output: a\\b\\c\\d\\e
Is this the most efficient approach or is there an alternative, more efficient method?
You can avoid splitting twice (just don't join in the first step), and there is no need to use an f-string when passing delimiter to split().
def dedupe_and_sort(list_element, delimiter):
    distinct_elements = set(list_element.split(delimiter))
    return delimiter.join(sorted(distinct_elements))
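Since the question involves a whole list of strings, here is a minimal sketch of applying the function to every element with a list comprehension. The names rows and cleaned are hypothetical, and the sample values only stand in for whatever was read from the SQL table.

```python
def dedupe_and_sort(list_element, delimiter):
    # Split once, drop duplicates with a set, then sort and re-join.
    distinct_elements = set(list_element.split(delimiter))
    return delimiter.join(sorted(distinct_elements))


# Same delimiter as in the question.
delimiter = "\\\\"

# Hypothetical sample data standing in for the strings imported from SQL.
rows = [
    "e\\\\a\\\\c\\\\b\\\\a\\\\b\\\\c\\\\a\\\\b\\\\d",
    "pear\\\\apple\\\\pear",
]

# Process every string in the list.
cleaned = [dedupe_and_sort(row, delimiter) for row in rows]

for row in cleaned:
    print(row)
```

The per-string cost is dominated by the sort of the distinct phrases, so for short strings the remaining overhead is mostly the Python-level loop over the list.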