python-3.x string text text-processing

Python: automatic pattern matching within a list


I am trying to write a small script that groups strings with similar patterns together. Below is my program snippet. It runs, but the grouping is somewhat inaccurate.

lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14", "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]

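final = []
for pat in lst:
    # Use the first half of each string as its search prefix
    pat = pat[:len(pat) // 2]
    ils = []
    # Collect every string in the list that starts with that prefix
    for pat2 in lst:
        if pat2.startswith(pat):
            ils.append(pat2)
    final.append(tuple(ils))

# Drop duplicate groups (tuples are hashable, so they can go into a set)
finalls = list(set(final))
for f in finalls:
    print(f)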
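One way to see where the half-length heuristic can go wrong: strings in the same family but of different lengths get different prefixes, so a single family can yield several overlapping groups. The input below is hypothetical (not the list from the question), chosen to trigger that case:

```python
# Hypothetical input: two strings from the same "report-" family
# with very different lengths.
lst = ["report-a", "report-abcdefghijkl"]

final = []
for pat in lst:
    prefix = pat[:len(pat) // 2]  # same half-length heuristic as above
    group = tuple(s for s in lst if s.startswith(prefix))
    print(repr(prefix), group)
    final.append(group)

# "repo" matches both strings, but "report-ab" matches only the longer one,
# so one family ends up split into two overlapping groups.
print(len(set(final)))  # 2
```

On the question's own list the heuristic happens to produce the right three groups, which is why the problem only shows up on some inputs.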

I also want to extract the exact string pattern that defines each group. For example, from the list ["rep-10-01", "rep-10-02", "rep-11-06"] I want "rep-" as the pattern. What improvements does my code need? Are there any libraries or modules that can help with either problem? Thanks in advance.
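For the second problem, the standard library's `os.path.commonprefix` returns the longest character-by-character prefix shared by a list of strings (despite the module name, it works on arbitrary strings). On the example list it returns "rep-1", not "rep-", because every item also shares the "1" after the hyphen. A minimal sketch, assuming "-" is the separator, trims the result back to the last separator; `common_pattern` is a hypothetical helper, not a library function:

```python
import os.path

def common_pattern(strings, sep="-"):
    """Hypothetical helper: longest common prefix of `strings`, cut back so
    it ends at the last `sep` inside that prefix (assumes `sep` marks where
    the variable part of each string begins)."""
    prefix = os.path.commonprefix(strings)
    cut = prefix.rfind(sep)
    # If the separator never appears in the shared prefix, return it unchanged.
    return prefix[:cut + len(sep)] if cut != -1 else prefix

items = ["rep-10-01", "rep-10-02", "rep-11-06"]
print(os.path.commonprefix(items))  # rep-1
print(common_pattern(items))        # rep-
```

Note that trimming stops at the last separator in the shared prefix, so a list like ["rep-10-01", "rep-10-02"] would give "rep-10-".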


Solution

  • Does this work as you expect? It groups the strings by the part before the first '-':


        from collections import defaultdict

        lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14",
               "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]

        # Map each prefix (the text before the first "-", e.g. "report",
        # "sales", "analytics") to the list of strings that share it.
        # The same code works for ['rep-10-01', 'rep-10-02', 'rep-11-06'].
        res = defaultdict(list)

        for pat in lst:
            prefix = pat.split('-', 1)[0]
            res[prefix].append(pat)

        print(dict(res))

    Output:

        {'report': ['report-2020.10.13', 'report-2020.12.12'], 'analytics': ['analytics-2020.12.14', 'analytics-2020.11.21'], 'sales': ['sales-cda87', 'sales-vu7sa']}
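The answer above covers the first problem. A minimal sketch that also covers the second, assuming each family name always ends at the first "-", keys each group by its pattern including the separator ("report-", "rep-", ...). It uses `str.partition` instead of `split`; `group_by_pattern` is a hypothetical helper name:

```python
from collections import defaultdict

lst = ["report-2020.10.13", "report-2020.12.12", "analytics-2020.12.14",
       "sales-cda87", "analytics-2020.11.21", "sales-vu7sa"]

def group_by_pattern(strings, sep="-"):
    """Hypothetical helper: group strings by the text up to and including
    the first `sep`. Strings without `sep` are keyed by the whole string."""
    groups = defaultdict(list)
    for s in strings:
        # partition returns (head, sep, tail), or (s, "", "") if sep is absent
        head, found, _ = s.partition(sep)
        groups[head + found].append(s)
    return dict(groups)  # keys keep first-seen order (Python 3.7+)

for pattern, members in group_by_pattern(lst).items():
    print(pattern, members)
```

For the question's second example, `group_by_pattern(["rep-10-01", "rep-10-02", "rep-11-06"])` puts all three items under the single key "rep-".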