Tags: python, python-3.x, regex, python-re, findall

How to show full results, rather than just the matched text, from regex searches in Python


I am creating a script that searches for a file based on a keyword. The output should be the whole observation (the full file name) rather than just the matched text, but I'm finding that .group doesn't work for this.

import os
import re

import pandas as pd

pers_info = pd.read_csv(r".....StateWorkforceMailingList_2-7-19a.csv", encoding='utf-8')
# pers_info['State'] holds state names: Texas, Florida, etc.

files = os.listdir(r"....\State Files")
# files is a list of names: WORKFORCE_2017_ALABAMA_FILE.xlsx, ..., n

# files is a list, so the pattern is applied to one file name at a time
for name in files:
    matches = re.findall(pers_info.State[4], name.replace("_", " "), re.I)
    print(matches)

My intended output is WORKFORCE_2017_ALABAMA_FILE.xlsx. Instead, I get 'Alabama'.

Should I try a boolean mask?


Solution

  • Use

    >>> import re
    >>> import pandas as pd
    >>> Pers_info = pd.DataFrame({'State':['Texas', 'Alabama', 'Florida']})
    >>> Files = ['WORKFORCE_2017_ALABAMA_FILE.xlsx', 'WORKFORCE_2017_FILE.xlsx']
    >>> pattern = re.compile(rf'(?<![^\W_])(?:{"|".join(Pers_info["State"].to_list())})(?![^\W_])', re.I)
    >>> list(filter(pattern.search, Files))
    ['WORKFORCE_2017_ALABAMA_FILE.xlsx']
    


    EXPLANATION

    --------------------------------------------------------------------------------
      (?<!                     look behind to see if there is not:
    --------------------------------------------------------------------------------
        [^\W_]                   any character except: non-word
                                 characters (all but a-z, A-Z, 0-9, _),
                                 '_'
    --------------------------------------------------------------------------------
      )                        end of look-behind
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture:
    --------------------------------------------------------------------------------
        Texas                    'Texas'
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        Alabama                  'Alabama'
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        Florida                  'Florida'
    --------------------------------------------------------------------------------
      )                        end of grouping
    --------------------------------------------------------------------------------
      (?!                      look ahead to see if there is not:
    --------------------------------------------------------------------------------
        [^\W_]                   any character except: non-word
                                 characters (all but a-z, A-Z, 0-9, _),
                                 '_'
    --------------------------------------------------------------------------------
      )                        end of look-ahead
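
To make the role of the two lookarounds concrete, here is a minimal sketch comparing them with a plain \b word boundary. In Python's re module the underscore counts as a word character, so \b finds no boundary between "_" and "ALABAMA"; the [^\W_] lookarounds only forbid an adjacent letter or digit. The sample lists, the extra Texas file name, and the helper files_by_state are hypothetical and used only for illustration. Wrapping each name in re.escape is an added precaution, not part of the answer above.

```python
import re

# Hypothetical sample data, mirroring the names used in the answer above.
states = ["Texas", "Alabama", "Florida"]
files = [
    "WORKFORCE_2017_ALABAMA_FILE.xlsx",
    "WORKFORCE_2017_FILE.xlsx",
    "WORKFORCE_2018_TEXAS_FILE.xlsx",
]

# re.escape guards against state names that contain regex metacharacters.
alternation = "|".join(re.escape(s) for s in states)

# \b treats "_" as a word character, so there is no boundary inside "_ALABAMA_".
word_boundary = re.compile(rf"\b(?:{alternation})\b", re.I)

# The lookarounds from the answer only reject an adjacent letter or digit.
custom_boundary = re.compile(rf"(?<![^\W_])(?:{alternation})(?![^\W_])", re.I)


def files_by_state(pattern, names):
    """Map each matched state (title-cased) to the full file names containing it."""
    result = {}
    for name in names:
        m = pattern.search(name)
        if m:
            result.setdefault(m.group(0).title(), []).append(name)
    return result


print([f for f in files if word_boundary.search(f)])
print(files_by_state(custom_boundary, files))
```

The first print shows which file names the \b version keeps, the second shows each matched state with its full file names. Filtering the list with pattern.search, rather than collecting re.findall results, is what returns whole file names instead of the matched text.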