python, regex, python-re

Match a pattern with multiple entries in arbitrary order in Python with re


I am trying to capture values entered in a syntax like this one: name="Game Title" authors="John Doe" studios="Studio A,Studio B" licence=ABC123 url=https://example.com command="start game" type=action code=xyz789

However, the name, authors, studios, …, code statements may appear in any order, not necessarily the one shown above.

For the moment here is my code:

import re

input_string = 'name="Game Title" authors="John Doe" studios="Studio A,Studio B" licence=ABC123 url=https://example.com command="start game" type=action code=xyz789'

ADD_GAME_PATERN = r'(?P<name>(?:"[^"]*"|\'[^\']*\'|[^"\']*))\s+' \
    r'licence=(?P<licence>[a-z0-9]*)\s+' \
    r'type=(?P<typeCode>[a-z0-9]*)\s+' \
    r'command=(?P<command>(?:"[^"]*"|\'[^\']*\'|[^"\']*))\s+' \
    r'url=(?P<url>\S+)\s+' \
    r'code=(?P<code>[a-z0-9]*)\s+' \
    r'studios=(?P<studios>.*)\s+' \
    r'authors=(?P<authors>.*)\s+'

match = re.match(ADD_GAME_PATERN, input_string)

if match:
    name = match.group('name')
    code = match.group('code')
    licence = match.group('licence')
    type_code = match.group('typeCode')
    command = match.group('command')
    url = match.group('url')
    studios = match.group('studios')
    authors = match.group('authors')

    print(f"Name: {name}")
    print(f"Code: {code}")
    print(f"Licence: {licence}")
    print(f"Type: {type_code}")
    print(f"Command: {command}")
    print(f"URL: {url}")
    print(f"Studios: {studios}")
    print(f"Authors: {authors}")
else:
    print("No match found.")

But in its current state, the pattern expects the statements in this exact order.

So how can I allow the statements to appear in a different, arbitrary order?


Solution

  • I'd use a simpler pattern, and code the rest:

    ([^=]+)=([^\s"]+)|([^=]+)="([^"]+)"
    

    import re
    
    s = 'name="Game Title" authors="John Doe" studios="Studio A,Studio B" licence=ABC123 url=https://example.com command="start game" type=action code=xyz789'
    
    p = r'([^=]+)=([^\s"]+)|([^=]+)="([^"]+)"'
    
    print(re.findall(p, s))
    
    

    Prints

    [('', '', 'name', 'Game Title'), ('', '', ' authors', 'John Doe'), ('', '', ' studios', 'Studio A,Studio B'), (' licence', 'ABC123', '', ''), (' url', 'https://example.com', '', ''), ('', '', ' command', 'start game'), (' type', 'action', '', ''), (' code', 'xyz789', '', '')]
    
    

    Notes:

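    There are two types of values, unquoted and double-quoted, so the pattern defines four capture groups: one key/value pair for each type. Each tuple returned by re.findall therefore has one pair filled in and the other pair empty.

    • ([^=]+)=([^\s"]+): capture groups 1 and 2 hold the key and the value when the value is unquoted.
    • |: or (alternation between the two forms).
    • ([^=]+)="([^"]+)": capture groups 3 and 4 hold the key and the value when the value is double-quoted.

    Because [^=]+ also matches whitespace, every key after the first one is captured with a leading space (' authors', ' licence', …), so the keys should be stripped before use.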
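As a minimal sketch of the "code the rest" step, the tuples returned by re.findall can be folded into a dictionary, which makes the order of the statements irrelevant. The names FIELD_PATTERN and parse_fields are hypothetical helpers introduced for this illustration; they are not part of the original answer.

```python
import re

# Same pattern as in the answer: unquoted form | double-quoted form.
FIELD_PATTERN = r'([^=]+)=([^\s"]+)|([^=]+)="([^"]+)"'


def parse_fields(text):
    """Return a dict mapping each key to its value, whatever the order."""
    fields = {}
    for key1, val1, key2, val2 in re.findall(FIELD_PATTERN, text):
        # Exactly one (key, value) pair is non-empty in each tuple.
        key = (key1 or key2).strip()  # strip the leading space on later keys
        fields[key] = val1 or val2
    return fields


s = ('name="Game Title" authors="John Doe" studios="Studio A,Studio B" '
     'licence=ABC123 url=https://example.com command="start game" '
     'type=action code=xyz789')

fields = parse_fields(s)
print(fields["name"])                # Game Title
print(fields["studios"].split(","))  # ['Studio A', 'Studio B']
```

With this approach, a reordered input such as 'code=xyz789 name="Game Title" type=action' yields the same keys. Note that a repeated key overwrites the earlier value, and that an empty quoted value such as name="" is not matched by the quoted branch, since [^"]+ requires at least one character.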