I am trying to write a small script that groups strings with similar patterns together. The following snippet runs, but the grouping is a little inaccurate:
lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14", "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]
final = []
for pat in lst:
    pat = pat[:len(pat) // 2]
    ils = []
    for pat2 in lst:
        if pat2.startswith(pat):
            ils.append(pat2)
    final.append(tuple(ils))
finalls = list(set(final))
for f in finalls:
    print(f)
I also want the exact pattern that each group shares. For example, from the list ["rep-10-01", "rep-10-02", "rep-11-06"] I want "rep-" as the pattern.
Are there any improvements I should make, or any libraries/modules that could help with both the first and the second problem?
Thanks in advance.
Does this work as you expected?
from collections import defaultdict
res = defaultdict(str)
lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14",
       "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]
#ll = ['rep-10-01', 'rep-10-02', 'rep-11-06']
for pat in lst:
    pattern = pat.split('-')
    #print(pattern[0]) # real pattern - eg. report, sales, analytics
    res[pattern[0]] += pat + ', '
print(res)
Output:
defaultdict(<class 'str'>, {'report': 'report-2020.10.13, report-2020.12.12, ', 'analytics': 'analytics-2020.12.14, analytics-2020.11.21, ', 'sales': 'sales-cda87, sales-vu7sa, '})
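If you would rather have each group as a list (no trailing ", ") and also get the shared pattern such as "rep-", here is a minimal sketch. It assumes the part before the first "-" is what identifies a group, which holds for your sample data but may not hold in general. The helper names group_by_prefix and common_pattern are made up for this example; os.path.commonprefix is a standard-library function that compares strings character by character, so its result is cut back to the last separator.

```python
from collections import defaultdict
from os.path import commonprefix


def group_by_prefix(strings, sep="-"):
    """Group strings by the text before the first separator (assumed key)."""
    groups = defaultdict(list)
    for s in strings:
        key, _, _ = s.partition(sep)
        groups[key].append(s)
    return dict(groups)


def common_pattern(strings, sep="-"):
    """Return the prefix shared by all strings, cut back to the last separator."""
    prefix = commonprefix(strings)  # character-wise, e.g. "rep-1"
    cut = prefix.rfind(sep)
    # If the shared prefix contains no separator, return it unchanged.
    return prefix[:cut + 1] if cut != -1 else prefix


lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14",
       "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]

groups = group_by_prefix(lst)
for key, members in groups.items():
    print(common_pattern(members), members)

print(common_pattern(["rep-10-01", "rep-10-02", "rep-11-06"]))  # rep-
```

Cutting at the separator matters because the raw common prefix of your "rep-..." example is "rep-1", not "rep-". If your strings do not share a delimiter, the grouping key would need a different rule.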