Tags: python, yaml, pyyaml

How do I print a specific part of a YAML string?


My YAML database:

left:
  - title: Active Indicative
    fill: "#cb202c"
    groups:
      - "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

My Python code:

import io
import yaml

with open("C:/Users/colin/Desktop/LBot/latin3_2.yaml", 'r', encoding="utf8") as f:
    doc = yaml.safe_load(f)
txt = doc["left"][1]["groups"][1]
print(txt)

Currently my output is Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt], but I would like the output to be just one of the endings: ō, is, it, or imus. Is this possible in PyYAML, and if so, how would I implement it? Thanks in advance.


Solution

  • I don't have a PyYAML solution, but once you have loaded the value from the YAML file it is an ordinary Python string, so you can use Python's re (regular expression) module to extract the text inside the [ ].

    import re
    
    txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
    
    parts = txt.split(" | ")
    print(parts)  
    # ['Present', 'dūc[ō]', 'dūc[is]', 'dūc[it]', 'dūc[imus]', 'dūc[itis]', 'dūc[unt]']
    
    pattern = re.compile(r"\[(.*?)\]")
    output = []
    for part in parts:
        match = pattern.search(part)
        if match:
            # group(0) is the whole match, e.g. [ō]
            # group(1) is the text captured by (.*?), e.g. ō
            output.append(match.group(1))
        else:
            output.append(part)
    
    print(" | ".join(output))
    # Present | ō | is | it | imus | itis | unt
    

    The code first splits the text into individual parts, then loops through each part, searching for the pattern [x]. If it finds a match, it extracts the text inside the brackets from the match object and stores it in a list. If the part does not match the pattern (e.g. 'Present'), it is added as is.

    At the end, all the extracted strings are joined together to rebuild the string without the brackets.
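
    As a possible shortcut, the split/search/join steps described above can be collapsed into a single re.sub call. This is only a sketch, and it assumes every bracketed ending is attached to a stem that contains no spaces (as in dūc[ō]); the variable name result is just for illustration.

```python
import re

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

# \S* consumes the stem in front of the bracket (assumed to contain no spaces),
# and \1 keeps only the text captured between [ and ].
result = re.sub(r"\S*\[(.*?)\]", r"\1", txt)

print(result)
# Present | ō | is | it | imus | itis | unt
```

    Parts without brackets, such as 'Present', are left untouched because the pattern never matches them.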


    EDIT based on comment:

    If you just need one of the strings inside the [ ], you can use the same regex pattern but call the findall method on the entire txt instead. Because the pattern has one capture group, findall returns a list of the captured strings in the order they were found.

    import re
    
    txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
    
    pattern = re.compile(r"\[(.*?)\]")
    matches = pattern.findall(txt)
    print(matches) 
    # ['ō', 'is', 'it', 'imus', 'itis', 'unt']
    

    Then it's just a matter of using some variable to select an item from the list:

    selected_idx = 1  # 0-based indexing, so this selects the 2nd item in the list
    print(matches[selected_idx])
    # is
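
    If it helps, the findall step and the index lookup can be wrapped in a small helper. This is a minimal sketch: the function name pick_ending and the choice to return None for an out-of-range index are my own assumptions, not anything PyYAML provides.

```python
import re

# Matches the text between [ and ], as in the examples above.
BRACKETED = re.compile(r"\[(.*?)\]")


def pick_ending(line, index):
    """Return the index-th bracketed ending in line, or None if out of range.

    pick_ending is a hypothetical helper name used for illustration only.
    """
    endings = BRACKETED.findall(line)
    if 0 <= index < len(endings):
        return endings[index]
    return None


txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

print(pick_ending(txt, 0))  # ō
print(pick_ending(txt, 3))  # imus
print(pick_ending(txt, 9))  # None
```

    In the original script, txt would be the string read from the loaded YAML document rather than a literal.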