Tags: python, python-3.x, string, python-re

Get data from a string


I have a string that contains player 1's nickname, player 1's profile link, player 2's nickname, player 2's profile link, and the weapon used. I need to extract this data from a line like the one below:

[ex.endermen](profile_url1) got killed by [og.[(Z)]arcus(0.43k/d)](profile_url2) (FN Evolys, 56.83m)

I need to get the data and output it in the format

    print(f"name1 = {name1}")
    print(f"name1_url = {name1_url}")
    print(f"name2 = {name2}")
    print(f"name2_url = {name2_url}")

Is there a way to get all the information correctly, given that the string can look like either of these?

[ex.endermen](profile_url1) got killed by [og.[(Z)]arcus(0.43k/d)](profile_url2) (FN Evolys, 56.83m)

[ex.endermen](profile_url1) got killed by [platina](profile_url2) (FN Evolys, 56.83m)

I tried several approaches with regular expressions, but they all gave the same unsatisfactory result:

 name_weapon_pattern = r"\[(.*?)\].*\((.*?)\).*\((.*),.*\)"
 name_weapon_match = re.search(name_weapon_pattern, string)

 url_pattern = r"\((.*?)\)"
 url_match = re.findall(url_pattern, string)

 name1 = name_weapon_match.group(1)
 name1_url = name_weapon_match.group(2)
 name2 = name_weapon_match.group(3)
 name2_url = url_match[1]

 print(f"name1 = {name1}")
 print(f"name1_url = {name1_url}")
 print(f"name2 = {name2}")
 print(f"name2_url = {name2_url}")

result:

name1 = og.[(Z)
name1_url = name2_url
name2 = FN Evolys
name2_url = 0.43k/d

Solution

  • I agree with @shadowranger that regex is probably not the optimal tool for this use case. That said, here is my solution using named groups:

    import re
    
    p = re.compile(r"\[(?P<name1>.*?)\]\((?P<name1_url>.*?)\).*?\[(?P<name2>.*?)\]\((?P<name2_url>.*?)\)\s\((?P<weapons>.*)\)$")
    
    text = '[ex.endermen](profile_url1) got killed by [og.[(Z)]arcus(0.43k/d)](profile_url2) (FN Evolys, 56.83m)'
    
    p.match(text).groupdict()
    
    # outputs
    # {'name1': 'ex.endermen',
    #  'name1_url': 'profile_url1',
    #  'name2': 'og.[(Z)]arcus(0.43k/d)',
    #  'name2_url': 'profile_url2',
    #  'weapons': 'FN Evolys, 56.83m'}
    
    text = '[ex.endermen](profile_url1) got killed by [platina](profile_url2) (FN Evolys, 56.83m)'
    
    p.match(text).groupdict()
    
    # outputs
    # {'name1': 'ex.endermen',
    #  'name1_url': 'profile_url1',
    #  'name2': 'platina',
    #  'name2_url': 'profile_url2',
    #  'weapons': 'FN Evolys, 56.83m'}
    
    # you can access individual groups with
    m = p.match(text)
    m["name1"]
    # > 'ex.endermen'
    

    Nested brackets could be handled properly with a recursive pattern, which the standard re module does not support; you would need the third-party regex module for that. In this case, though, we can find the end of name2 by matching the ]( character sequence instead. This obviously breaks when name2 itself contains that sequence.
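
Since regex may not be the best tool here, the same idea of anchoring on the ]( sequence can also be sketched with plain string methods. This is only a rough sketch under assumptions that are not in the question: the helper name parse_kill_line is made up, and the line is assumed to always have the layout "[name1](url1) got killed by [name2](url2) (weapon info)". It splits each link at the last ]( it finds, so brackets inside a nickname are fine, but it will misparse a line where a nickname contains the text " got killed by ", a URL contains ](, or the weapon info contains " (".

```python
def parse_kill_line(line):
    """Split a kill line into names, URLs and weapon info.

    Assumed (hypothetical) layout:
        [name1](url1) got killed by [name2](url2) (weapon info)
    Returns None if the line does not fit that layout.
    """
    victim, sep, rest = line.partition(" got killed by ")
    if not sep:
        return None

    # The weapon info is the last " (...)" group; the killer link precedes it.
    killer, sep, weapons = rest.rpartition(" (")
    if not sep or not weapons.endswith(")"):
        return None

    def split_link(link):
        # "[name](url)" -> ("name", "url"), splitting at the LAST "](",
        # so brackets and parentheses inside the name are kept intact.
        if not (link.startswith("[") and link.endswith(")")):
            return None
        name, sep, url = link[1:-1].rpartition("](")
        return (name, url) if sep else None

    first, second = split_link(victim), split_link(killer)
    if first is None or second is None:
        return None

    return {
        "name1": first[0],
        "name1_url": first[1],
        "name2": second[0],
        "name2_url": second[1],
        "weapons": weapons[:-1],  # drop the closing ")"
    }


text = '[ex.endermen](profile_url1) got killed by [og.[(Z)]arcus(0.43k/d)](profile_url2) (FN Evolys, 56.83m)'
info = parse_kill_line(text)

# Print in the format requested in the question
for key in ("name1", "name1_url", "name2", "name2_url"):
    print(f"{key} = {info[key]}")

# name1 = ex.endermen
# name1_url = profile_url1
# name2 = og.[(Z)]arcus(0.43k/d)
# name2_url = profile_url2
```

Returning None for lines that do not fit the layout makes it easy to skip unrelated lines, whereas calling p.match(text).groupdict() directly raises AttributeError when the pattern does not match.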