python regex python-2.7 single-quotes

Regex: find content between single quotes, but only if it contains a certain word


I want to get the content between single quotes, but only if it contains a certain word (e.g. 'sample_2'). It should also not match quoted items that contain whitespace.

Input example: (The following should match and return only: ../sample_2/file and sample_2/file)

['asdf', '../sample_2/file', 'sample_2/file', 'example with space', sample_2, sample]

Right now I only have a pattern that matches the first 3 items in the list:

'(.\S*?)' 

I can't seem to find the right regex that would return only the items containing the word 'sample_2'.


Solution

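  • If you want to require specific words or characters, you need to put them in the regular expression literally rather than rely on \S alone. For plain (non-Unicode) patterns, \S is equivalent to [^ \t\n\r\f\v], i.e. "any non-whitespace character", so by itself it cannot single out 'sample_2'.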

    import re
    
    teststr = "['asdf', '../sample_2/file', 'sample_2/file', 'sample_2 with spaces','example with space', sample_2, sample]"
    matches = re.findall(r"'([^\s']*sample_2[^\s]*?)',", teststr)
    # ['../sample_2/file', 'sample_2/file']
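
    One thing to be aware of: the pattern above expects a comma after the closing quote, so a quoted item at the very end of the list would be skipped. A minimal sketch of a variant that drops the comma and instead excludes the apostrophe from both character classes; teststr_last is a made-up input for illustration, not part of the question:

```python
import re

# Hypothetical input: the last matching item is the final element, so no comma follows it.
teststr_last = "['asdf', '../sample_2/file', 'example with space', 'sample_2/file']"

# Pattern from the answer: requires "'," after the captured text.
with_comma = re.findall(r"'([^\s']*sample_2[^\s]*?)',", teststr_last)

# Variant: no trailing comma; neither side of the word may contain whitespace or a quote.
without_comma = re.findall(r"'([^\s']*sample_2[^\s']*)'", teststr_last)

print(with_comma)     # ['../sample_2/file']
print(without_comma)  # ['../sample_2/file', 'sample_2/file']
```

    Whether the comma should be required depends on how strictly the input follows the list format, so treat the variant as an option rather than a replacement.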
    

    Based on your wording, it sounds like the desired word can change. In that case, I would recommend building the pattern string dynamically and passing it to re.compile(). Wrapping the word in re.escape() keeps any regex metacharacters it contains from being interpreted as part of the pattern.

    import re
    word = 'sample_2'
    teststr = "['asdf', '../sample_2/file', 'sample_2/file', ' sample_2 with spaces','example with space', sample_2, sample]"
    
    # Build the pattern around the escaped word, then compile it once
    regex = re.compile(r"'([^'\s]*" + re.escape(word) + r"[^\s]*?)',")
    matches = regex.findall(teststr)
    # ['../sample_2/file', 'sample_2/file']
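
    If the word can be arbitrary, it may contain characters that mean something special in a regex (a '.', for instance). A small sketch of a reusable helper that escapes the word with re.escape() first; the function name find_quoted_containing and the sample text are assumptions made up for this example, and the pattern here does not require a trailing comma:

```python
import re

def find_quoted_containing(word, text):
    """Return single-quoted, whitespace-free items in text that contain word."""
    # re.escape makes regex metacharacters in word (such as '.') match literally.
    pattern = re.compile(r"'([^'\s]*" + re.escape(word) + r"[^'\s]*)'")
    return pattern.findall(text)

# Hypothetical input where the search word contains '.', a regex metacharacter.
text = "['v1.2/file', 'v1x2/file', 'v1.2 with space', v1.2]"
print(find_quoted_containing("v1.2", text))  # ['v1.2/file']
```

    Without the escape, the '.' would match any character and 'v1x2/file' would be returned as well.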
    

    Also, if you haven't heard of this tool yet, check out regex101.com. I always build my regular expressions there to make sure I get them right. It gives you a reference, an explanation of what each part of the pattern does, and lets you test it right there in the browser.

    Explanation of regex

    regex = r"'([^\s']*sample_2[^\s]*?)',"
    

    Find the opening apostrophe and start the capture group. Capture any characters that are neither whitespace nor an apostrophe, then require the literal text "sample_2", then lazily accept further non-whitespace characters. The capture group ends at the first apostrophe that is immediately followed by a comma.
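
    The same breakdown can be written directly into the pattern using the re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. This is only a sketch of the pattern above with annotations; it reuses the test string from the first example:

```python
import re

pattern = re.compile(r"""
    '            # opening apostrophe
    (            # start of the capture group
      [^\s']*    # any run of characters that are neither whitespace nor an apostrophe
      sample_2   # the required word, matched literally
      [^\s]*?    # as few non-whitespace characters as possible
    )            # end of the capture group
    ',           # closing apostrophe followed by a comma
""", re.VERBOSE)

teststr = "['asdf', '../sample_2/file', 'sample_2/file', 'sample_2 with spaces','example with space', sample_2, sample]"
print(pattern.findall(teststr))  # ['../sample_2/file', 'sample_2/file']
```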

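    Note: In Python, a string literal (" or ') prefixed with the character 'r' is a raw string: backslashes are kept exactly as written instead of starting an escape sequence. The prefix does not compile anything as a regular expression; it just means the pattern does not need double-escaped '\\' characters.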
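
    A quick way to see what the 'r' prefix does, as a minimal sketch (the variable names are just for illustration):

```python
import re

raw = r"\s"        # two characters: a backslash and 's'
escaped = "\\s"    # the same two characters, written with a doubled backslash

print(raw == escaped)  # True
print(len(raw))        # 2

# Without the r prefix, "\n" is a single newline character; with it, two characters.
print(len("\n"))       # 1
print(len(r"\n"))      # 2

# Both spellings hand the re module the same pattern.
print(re.findall(raw, "a b") == re.findall(escaped, "a b"))  # True
```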