My YAML database:
left:
  - title: Active Indicative
    fill: "#cb202c"
    groups:
      - "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
My Python code:
import io
import yaml
with open("C:/Users/colin/Desktop/LBot/latin3_2.yaml", 'r', encoding="utf8") as f:
    doc = yaml.safe_load(f)
txt = doc["left"][1]["groups"][1]
print(txt)
Currently my output is Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt], but I would like the output to be ō, is, it, or imus. Is this possible in PyYAML, and if so, how would I implement it? Thanks in advance.
I don't have a PyYAML solution, but if you already have the string from the YAML file, you can use Python's re (regex) module to extract the text inside the [ ].
import re
txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
parts = txt.split(" | ")
print(parts)
# ['Present', 'dūc[ō]', 'dūc[is]', 'dūc[it]', 'dūc[imus]', 'dūc[itis]', 'dūc[unt]']
pattern = re.compile("\\[(.*?)\\]")
output = []
for part in parts:
    match = pattern.search(part)
    if match:
        # group(0) is the whole matched part, e.g. [ō]
        # group(1) is the text captured by (.*?), e.g. ō
        output.append(match.group(1))
    else:
        output.append(part)
print(" | ".join(output))
# Present | ō | is | it | imus | itis | unt
The code first splits the text into individual parts, then loops through each part searching for the pattern [x]. If it finds it, it extracts the text inside the brackets from the match object and stores it in a list. If the part does not match the pattern (e.g. 'Present'), it just adds it as is.
At the end, all the extracted strings are joined together to rebuild the string without the brackets.
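As a possible shortcut, the same bracket-stripping could probably be done in one call with re.sub instead of the split/search/join loop. This is only a sketch: the pattern \S*\[(.*?)\] is my own assumption about the data (that the stem before each [ ] contains no whitespace), so treat it as a starting point rather than a drop-in replacement.

```python
import re

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"

# \S* consumes the stem in front of the bracket (assumed to contain no spaces),
# and \1 keeps only the text captured inside the [ ]
stripped = re.sub(r"\S*\[(.*?)\]", r"\1", txt)
print(stripped)
# Present | ō | is | it | imus | itis | unt
```

Parts without brackets, such as 'Present', are left untouched because the pattern never matches them.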
EDIT based on comment:
If you just need one of the strings inside the [ ], you can use the same regex pattern but call the findall method on the entire txt instead, which will return a list of matching strings in the same order that they were found.
import re
txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
pattern = re.compile("\\[(.*?)\\]")
matches = pattern.findall(txt)
print(matches)
# ['ō', 'is', 'it', 'imus', 'itis', 'unt']
Then it's just a matter of using some variable to select an item from the list:
selected_idx = 1  # 0-based indexing, so this means the 2nd item in the list
print(matches[selected_idx])
# is
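If the index comes from user input or somewhere else you don't control, it may be worth wrapping the lookup in a small helper that guards against out-of-range values. The function name pick_ending and the choice to return None on a bad index are just illustrative assumptions, not anything PyYAML or re provides.

```python
import re

BRACKETED = re.compile(r"\[(.*?)\]")

def pick_ending(txt, idx):
    """Return the idx-th bracketed string in txt (0-based), or None if out of range."""
    matches = BRACKETED.findall(txt)
    if 0 <= idx < len(matches):
        return matches[idx]
    return None

txt = "Present | dūc[ō] | dūc[is] | dūc[it] | dūc[imus] | dūc[itis] | dūc[unt]"
print(pick_ending(txt, 3))
# imus
print(pick_ending(txt, 10))
# None
```

You would pass in the string you already read from the YAML file in place of the hard-coded txt.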