
Get individual strings between brackets


Let's say I have this string:

[LEVEL]
    [NAME]The Girder Guide! [/NAME]
    [AUTHOR]draworigami[/AUTHOR]
    [AUTHORLEVEL]11[/AUTHORLEVEL]
    [COUNTRY]CA[/COUNTRY]
    [ID]62784[/ID]
    [RATING]4[/RATING]
    [DATE]2021-05-11 23:08:35[/DATE]
    [PLAYCOUNT]33[/PLAYCOUNT]
    [WINCOUNT]28[/WINCOUNT]
    [STARS]0[/STARS]
    [COMMENTS]1[/COMMENTS]
[/LEVEL]

Is there a way to get the individual strings between each opening tag, like [NAME], and its closing tag, like [/NAME]? I've tried several snippets from the internet without success.


Solution

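  • Replace the square brackets with angle brackets so the string becomes HTML-like markup, then let BeautifulSoup extract the text. This returns all the text between the tags: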

    from bs4 import BeautifulSoup  # pip install beautifulsoup4
    
    rml = """
    [LEVEL]
        [NAME]The Girder Guide! [/NAME]
        [AUTHOR]draworigami[/AUTHOR]
        [AUTHORLEVEL]11[/AUTHORLEVEL]
        [COUNTRY]CA[/COUNTRY]
        [ID]62784[/ID]
        [RATING]4[/RATING]
        [DATE]2021-05-11 23:08:35[/DATE]
        [PLAYCOUNT]33[/PLAYCOUNT]
        [WINCOUNT]28[/WINCOUNT]
        [STARS]0[/STARS]
        [COMMENTS]1[/COMMENTS]
    [/LEVEL]
    """
    
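    # Turn [TAG]...[/TAG] into <TAG>...</TAG> so an HTML parser can read it
    html = rml.replace('[', '<').replace(']', '>')
    soup = BeautifulSoup(html, 'html.parser')
    # html.parser lower-cases tag names, so look up 'level', not 'LEVEL'
    print(soup.find('level').text)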
    

    Output:

    The Girder Guide! 
    draworigami
    11
    CA
    62784
    4
    2021-05-11 23:08:35
    33
    28
    0
    1
    
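The same bracket-to-tag idea also works with Python's built-in xml.etree.ElementTree, which avoids the BeautifulSoup dependency and keeps each tag name paired with its value. A minimal sketch, assuming the values never contain <, &, [ or ] (otherwise the converted string is not valid XML) and that every field sits directly under [LEVEL]; parse_level is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def parse_level(rml):
    """Return {TAG: value} for every direct child of the outer [LEVEL] tag."""
    # Same trick as above: [TAG]...[/TAG] becomes <TAG>...</TAG>.
    # Assumption: values never contain '<', '&', '[' or ']', so this is well-formed XML.
    xml = rml.replace('[', '<').replace(']', '>')
    root = ET.fromstring(xml.strip())
    # ElementTree keeps the original tag case, so keys stay 'NAME', 'AUTHOR', ...
    # strip() removes stray spaces such as the one after "The Girder Guide!".
    return {child.tag: (child.text or '').strip() for child in root}


rml = """
[LEVEL]
    [NAME]The Girder Guide! [/NAME]
    [AUTHOR]draworigami[/AUTHOR]
    [DATE]2021-05-11 23:08:35[/DATE]
    [COMMENTS]1[/COMMENTS]
[/LEVEL]
"""

fields = parse_level(rml)
print(fields['NAME'])  # The Girder Guide!
print(fields['DATE'])  # 2021-05-11 23:08:35
```

Unlike HTML parsers, XML parsing is case-sensitive and strict, so it fails loudly on a mismatched tag instead of guessing.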

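    Edit #1: The original string has no newlines, so .text would run all the values together. To print each value on its own line, loop over the children of the level tag instead: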

    rml = "[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR][AUTHORLEVEL]11[/AUTHORLEVEL][COUNTRY]CA[/COUNTRY][ID]62784[/ID][RATING]4[/RATING][DATE]2021-05-11 23:08:35[/DATE][PLAYCOUNT]33[/PLAYCOUNT][WINCOUNT]28[/WINCOUNT][STARS]0[/STARS][COMMENTS]1[/COMMENTS][/LEVEL]"
    
    html = rml.replace('[', '<').replace(']', '>')
    soup = BeautifulSoup(html, 'html.parser')
    # With no whitespace between tags, every child of <level> is a field tag
    elements = soup.find('level').contents
    for e in elements:
        print(e.text)  # one value per line
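If you'd rather skip the markup conversion entirely, a regular expression from the standard re module can pair each tag with its value. A minimal sketch, assuming tag names are word characters (letters, digits, underscore), values never contain [, and each tag name appears only once per level (a repeated name would overwrite the earlier value in the dict); leaf_values is a hypothetical helper name. It handles both the single-line string and the indented version, since [^\[]* also matches newlines:

```python
import re

# [TAG]value[/TAG]: \1 forces the closing tag to repeat the opening name, and
# [^\[]* stops the value at the next '[', so the outer [LEVEL] wrapper never
# matches as a whole; only the innermost tags are captured.
LEAF_TAG = re.compile(r'\[(\w+)\]([^\[]*)\[/\1\]')


def leaf_values(rml):
    """Return {TAG: value} for every innermost [TAG]value[/TAG] pair, in order."""
    return {name: value.strip() for name, value in LEAF_TAG.findall(rml)}


rml = ("[LEVEL][NAME]The Girder Guide![/NAME][AUTHOR]draworigami[/AUTHOR]"
       "[ID]62784[/ID][DATE]2021-05-11 23:08:35[/DATE][STARS]0[/STARS][/LEVEL]")

for name, value in leaf_values(rml).items():
    print(f"{name}: {value}")
```

A regex is adequate for this flat, predictable format; for deeper or irregular nesting, a real parser is the more robust choice.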