I have a YAML input with several levels of nested objects. I need a Python function that walks the whole structure and returns a list of strings in which each nested field is written as a dot-separated path, e.g. Object1.Object2.Object3.Object4. Examples are below.
I am trying to do this with a recursive function. My code snippet:
tests = []
test2 = {}

def test(config, parent=None):
    previous_parent = None
    names = []
    for column in config:
        if column.get("dtype") in ["array", "struct"]:
            parent = column["name"]
            print(f"parent: {parent}")
            test(column["columns"], parent)
        else:
            value = column["name"]
            print(f"value: {value}")
            # names.append(value)
And the output is:
value: PartitionDate
value: TransactionID
value: EventTimestamp
parent: ControlTransaction
value: StoreID
parent: RetailTransaction
value: StoreID
value: WorkstationID
...
Input:
columns:
  - name: PartitionDate
  - name: TransactionID
  - name: EventTimestamp
  - name: ControlTransaction
    dtype: struct
    columns:
      - name: StoreID
      - name: WorkstationID
      - name: Transaction
        dtype: struct
        columns:
          - name: TransactionID
      - name: TransactionNumber
  - name: ControlType
  - name: RetailTransaction
    dtype: struct
    columns:
      - name: StoreID
      - name: WorkstationID
Output:
[
PartitionDate,
TransactionID,
EventTimestamp,
ControlTransaction.StoreID,
ControlTransaction.WorkstationID,
ControlTransaction.Transaction.TransactionID,
ControlTransaction.TransactionNumber,
ControlType,
RetailTransaction.StoreID,
RetailTransaction.WorkstationID
]
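Just a few changes:
- Replace the parent=None parameter with parents=[] so that each call receives the complete list of parent names.
- If a column has "columns": add its name to a copy of the parent list, recurse into its children, and extend names with the result.
- If a column has no "columns": combine its name with parents and join this list with a . separator, then append the result to names.
- Return names.

import yaml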
def test(config, parents=[]):
    # parents is never mutated in place (it is copied below),
    # so the shared default list always stays empty
    names = []
    for column in config:
        if column.get("dtype") in ["array", "struct"] and "columns" in column:
            # nested column: extend a copy of the parent path and recurse
            cur_parents = parents.copy()
            cur_parents.append(column["name"])
            children = test(column["columns"], cur_parents)
            names.extend(children)
        else:
            # leaf column: join the parent names and its own name with dots
            value = column["name"]
            value_path = parents + [value]
            names.append(".".join(value_path))
    return names


with open("input.yaml", "r") as inp:
    yaml_conf = yaml.safe_load(inp)

values = test(yaml_conf.get("columns"))
print("[\n{}\n]".format(",\n".join(values)))
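To try the same idea without a YAML file, here is a minimal standard-library sketch. The name flatten_names and the trimmed-down sample list are made up for illustration; the sample is only an assumption about what yaml.safe_load would return for a small part of the input. It uses a tuple as the default for parents. That is one way to avoid a mutable default argument, although parents=[] is also safe in the code above because the list is never modified in place.

```python
def flatten_names(columns, parents=()):
    """Return dotted paths for every leaf column (hypothetical helper)."""
    names = []
    for column in columns:
        # Path of this column: all parent names plus its own name.
        path = parents + (column["name"],)
        if column.get("dtype") in ("array", "struct") and "columns" in column:
            # Nested column: recurse, passing the extended path down.
            names.extend(flatten_names(column["columns"], path))
        else:
            # Leaf column: emit the dot-separated path.
            names.append(".".join(path))
    return names


# Assumed shape of a small, already-parsed part of the input.
sample = [
    {"name": "PartitionDate"},
    {
        "name": "ControlTransaction",
        "dtype": "struct",
        "columns": [
            {"name": "StoreID"},
            {
                "name": "Transaction",
                "dtype": "struct",
                "columns": [{"name": "TransactionID"}],
            },
        ],
    },
    {"name": "ControlType"},
]

for name in flatten_names(sample):
    print(name)
```

Because each recursive call builds a new tuple rather than changing a shared one, sibling columns never see each other's parents, which is the problem in the original snippet from the question.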
Edit: make sure to check the important notes made by @Anthon in the comments and in another answer.