Python: regex lookbehind to get the word after single or double quotes


I have a file with contents like the one below. I am trying to extract the word that follows "-x" in the file, and in the end I need only the unique values. As part of that I tried the regex below, but the output contains only single and double quote characters. When I use a regex that handles only double quotes, I get the words as expected.

File Content

00 04 * * 2-6   testuser   /get_results.sh -q -x 'igp_srm_m' -s 'yesterday' -e 'yesterday' -m '2048' -b >>'/var/log/process/srm-console.log' 2>&1
00 10 * * 2-6   testuser   /get_results.sh -q -x 'igp_srm_m' -s 'yesterday' -e 'yesterday' -m '2048' -w '720' >>'/var/log/process/srm-console.log' 2>&1

00 08 * * 1-5   testuser   /get_results.sh -q -x "igp_france" -s "today" -e "today" -m "90000" -b -z partA >>"/var/log/process/france-partA-console.log" 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igp_france" -s "yesterday" -e "yesterday" -m "90000" -w "900" -z partA >>"/var/log/process/france-partA-console.log" 2>&1

00 08 * * 1-5   testuser   /get_results.sh -q -x "igp_france" -s "today" -e "today" -m "90000" -b -z partB >>"/var/log/process/france-partB-console.log" 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igp_france" -s "yesterday" -e "yesterday" -m "90000" -w "900" -z partB >>"/var/log/process/france-partB-console.log" 2>&1

00 12 * * 2-6   testuser   JAVA_OPTS='-server -Xmx512m' /merge.sh "yesterday" "igp_france" "partA,partB" >>"/var/log/process/france-console.log" 2>&1
00 08 * * 1-5   testuser   /get_results.sh -q -x "igpswitz_france" -s "today" -e "today" -m "15000" -b >>'/var/log/process/igpswitz_france-console.log' 2>&1
00 12 * * 2-6   testuser   /get_results.sh -q -x "igpswitz_france" -s "yesterday" -e "yesterday" -m "15000" -Dapc.maxalerts=8000 -w "900" >>'/var/log/process/igpswitz_france-console.log' 2>&1

30 07 * * 2-6   testuser   /get_results.sh -q -x "igp_franced" -s 'yesterday' -e 'yesterday' -m "105000" -b >>"/var/log/process/franced-console.log" 2>&1
15 12 * * 2-6   testuser   /get_results.sh -q -x "igp_franced" -s 'yesterday' -e 'yesterday' -m "105000" -w "960" >>"/var/log/process/franced-console.log" 2>&1

Code I tried

import re
with open("test2") as file:
    for line in file:
        try:
            m = re.search(r'(?<=\-x (\"|\'))(\w+)', line)
            print(m.group(1))
        except:
            m = None

Expected output

igp_srm_m
igp_france
igpswitz_france
igp_franced

Received Output

'
'
"
"
"
"
"
"
"
"

I'm not sure what is going wrong, because the same approach works correctly when I match only double quotes.

Working script only for double quotes

import re
with open("test2") as file:
    for line in file:
        try:
            m = re.search(r'(?<=\-x \")(\w*)', line)
            print(m.group(1))
        except:
            m = None

Received Output - Search for double quotes only

igp_france
igp_france
igp_france
igp_france
igpswitz_france
igpswitz_france
igp_franced
igp_franced
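A quick way to see the difference between the two scripts is to print both groups of the first pattern. The (\"|') alternation inside the lookbehind is itself a capturing group, so it is numbered 1 and holds the quote, while the word ends up in group 2. The double-quote-only pattern has no group inside the lookbehind, which is why group(1) is the word there. This is a minimal sketch on one shortened line from the file above; the variable names quote, word and whole are only for illustration. It also shows that a character class keeps the lookbehind at a fixed width without adding a group.

```python
import re

# One shortened line from the cron file above
line = "00 04 * * 2-6   testuser   /get_results.sh -q -x 'igp_srm_m' -s 'yesterday'"

# The question's pattern: (\"|') inside the lookbehind is capture group 1
m = re.search(r"(?<=-x (\"|'))(\w+)", line)
quote = m.group(1)  # the quote character, which is what the loop printed
word = m.group(2)   # the word that was actually wanted
print(quote, word)

# A character class is still fixed width (one character) and captures nothing,
# so the whole match (group 0) is just the word
whole = re.search(r"(?<=-x [\"'])\w+", line).group()
print(whole)
```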

Solution

  • You can use a set to get the unique values.

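    In your pattern, the (\"|\') alternation inside the lookbehind is itself a capturing group, so group 1 holds the quote and the word you want is in group 2. The pattern can also be simplified: put the single and double quote in a character class (["']) and capture it in group 1, then use the backreference \1 so that the closing quote has to be the same as the opening one: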

    -x (["'])(\w+)\1
    


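    import re
    
    result = set()
    
    with open("test2") as file:
        for line in file:
            # Quote in group 1, word in group 2; \1 requires the same closing quote
            m = re.search(r"-x ([\"'])(\w+)\1", line)
            if m:  # None for lines without a quoted -x value (blank lines, merge.sh)
                result.add(m.group(2))
    
    print(result)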
    

    Output (a set is unordered, so the order shown can differ between runs)

    {'igp_france', 'igp_srm_m', 'igp_franced', 'igpswitz_france'}
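The backreference is what pairs the quotes: \1 must match the same character that group 1 captured, so a value that opens with one kind of quote and closes with the other is skipped. The sketch below runs the answer's pattern over a few shortened lines from the file instead of reading test2; the last line, with mismatched quotes, is hypothetical and only there to exercise the backreference. It uses if m: instead of try/except, and swaps the set for dict.fromkeys, which also drops duplicates but keeps first-seen order.

```python
import re

# Answer's pattern: quote in group 1, word in group 2, \1 must repeat the same quote
PATTERN = re.compile(r"-x ([\"'])(\w+)\1")

# Shortened lines from the file, plus one hypothetical line with mismatched quotes
lines = [
    "/get_results.sh -q -x 'igp_srm_m' -s 'yesterday' -e 'yesterday'",
    '/get_results.sh -q -x "igp_france" -s "today" -e "today"',
    '/get_results.sh -q -x "igp_france" -s "yesterday" -e "yesterday"',
    "JAVA_OPTS='-server -Xmx512m' /merge.sh \"yesterday\" \"igp_france\"",
    "/get_results.sh -q -x \"igp_mismatch' -s 'today'",  # hypothetical
]

values = []
for line in lines:
    m = PATTERN.search(line)
    if m:  # None for lines without a matching quoted -x value
        values.append(m.group(2))

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_values = list(dict.fromkeys(values))
print(unique_values)  # ['igp_srm_m', 'igp_france']

# Without the backreference, the mismatched line would also be accepted
loose = re.search(r"-x [\"'](\w+)[\"']", lines[-1])
print(loose.group(1))  # igp_mismatch
```

If the order of the results does not matter, the answer's set works just as well; sorted(result) is another way to get a stable order.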