I have the contents of a YAML file as a string. I want to search through it and fetch every occurrence of a list named users.
import base64
yaml_file = base64.b64decode(encoded_data).decode('utf-8')
users = []
# fetch users from yaml_file
The list can appear anywhere in the file, either as a top-level element or as a child at any depth, so I assumed that loading the file as YAML and walking it would not help, since there is no single fixed structure.
...
users:
- user1
- user2
- user3
...
Is there a regex that I can use to fetch only the lists named users from a YAML string?
Parsing structured data with a regex almost always leads to unreliable behavior.
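As a rough illustration of that fragility, here is a minimal sketch. The pattern and the helper name users_by_regex are hypothetical, made up for this example; the point is only that the same list written in YAML flow style slips past a pattern written for block style:

```python
import re

# Hypothetical pattern: a top-level "users:" line followed by "- item" lines.
PATTERN = re.compile(r"^users:\n((?:[ \t]*- .*\n)+)", re.MULTILINE)

def users_by_regex(text):
    """Return the items of the first block-style users list, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Drop the leading "- " from every matched line.
    return [line.strip()[2:] for line in match.group(1).splitlines()]

block_style = "users:\n- user1\n- user2\n"
flow_style = "users: [user1, user2]\n"  # same data, different YAML syntax

print(users_by_regex(block_style))  # ['user1', 'user2']
print(users_by_regex(flow_style))   # None
```

Both strings describe the same data, yet only one is matched. Quoting, comments, anchors and nested indentation add further cases that a pattern would have to cover.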
You can instead use a function that traverses sub-dicts and sub-lists recursively until a sub-list under a key with a specified name is found:
def find_named_list(data, name):
    if isinstance(data, dict):
        for key, value in data.items():
            if key == name and isinstance(value, list):
                return value
            if (lst := find_named_list(value, name)) is not None:
                return lst
    elif isinstance(data, list):
        for value in data:
            if (lst := find_named_list(value, name)) is not None:
                return lst
so that:
import yaml

data = '''---
foo:
  - hello: 1
    world:
      users:
        - user1
        - user2
        - user3
  - stack: overflow
bar: ''
'''
print(find_named_list(yaml.safe_load(data), 'users'))
outputs:
['user1', 'user2', 'user3']
Demo: https://ideone.com/Mdjixm
To find all sub-lists under a key with a specified name, turn the function into a generator by using yield and yield from instead of return:
def find_named_list(data, name):
    if isinstance(data, dict):
        for key, value in data.items():
            if key == name and isinstance(value, list):
                yield value
            yield from find_named_list(value, name)
    elif isinstance(data, list):
        for value in data:
            yield from find_named_list(value, name)
so that:
data = '''---
foo:
  - hello: 1
    world:
      users:
        - user1
        - user2
        - user3
  - stack: overflow
bar:
  users:
    - user_a
    - user_b
    - user_c
'''
print(list(find_named_list(yaml.safe_load(data), 'users')))
outputs:
[['user1', 'user2', 'user3'], ['user_a', 'user_b', 'user_c']]
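If you want a single flat users list, as in the question's snippet, rather than a list of lists, one possible follow-up is to chain the generator's results together. This is only a sketch: parsed is a hand-written stand-in for what yaml.safe_load would return for a variation of the data above (with one repeated name added), and the de-duplication step assumes the items are hashable, such as strings.

```python
from itertools import chain

def find_named_list(data, name):
    """Yield every list stored under the key `name`, at any depth."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == name and isinstance(value, list):
                yield value
            yield from find_named_list(value, name)
    elif isinstance(data, list):
        for value in data:
            yield from find_named_list(value, name)

# Hand-written stand-in for the output of yaml.safe_load() (an assumption).
parsed = {
    "foo": [
        {"hello": 1, "world": {"users": ["user1", "user2", "user3"]}},
        {"stack": "overflow"},
    ],
    "bar": {"users": ["user_a", "user_b", "user1"]},
}

# Flatten the per-key lists, then drop duplicates while keeping first-seen order.
flat = chain.from_iterable(find_named_list(parsed, "users"))
users = list(dict.fromkeys(flat))
print(users)  # ['user1', 'user2', 'user3', 'user_a', 'user_b']
```

dict.fromkeys keeps insertion order, so names appear in the order they are first encountered; skip that step if duplicates should be preserved.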