python xml pandas dataframe f-string

When converting XML to several DataFrames, how can I name the DataFrames dynamically?


My code is at the bottom.

The parse_XML function converts an XML file to a DataFrame. For example, df = parse_XML("example.xml", lst_level2_tags) works. Since I want to save several DataFrames, though, I want names like df_<first_level_tag>, one per first-level tag.

When I run the code at the bottom, I get this error:

    f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags)
    ^
    SyntaxError: can't assign to literal

I also tried the .format method instead of an f-string, but that didn't work either. There are at least 30 DataFrames to save, and I don't want to create them one by one. I have always succeeded with f-strings in Python outside pandas, though.

Is the problem with the f-string/format method, or does my code have some other logic problem?

If it helps, the parse_XML function is taken directly from this link: the function definition

for first_level_tag in first_level_tags:
    lst_level2_tags = []
    for subchild in root[0]:
        lst_level2_tags.append(subchild.tag)
    f'df_{first_level_tag}'=parse_XML("example.xml", lst_level2_tags) 

Solution

  • This seems like a situation where you'd be best served by putting them into a dictionary:

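    dfs = {}  # maps each first-level tag to its DataFrame
    for first_level_tag in first_level_tags:
        lst_level2_tags = []
        for subchild in root[0]:
            lst_level2_tags.append(subchild.tag)
        # Store the result under a key instead of in a dynamically named variable
        dfs[first_level_tag] = parse_XML("example.xml", lst_level2_tags)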
    

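    There's nothing wrong with the f-string itself. The problem is that an f-string is an expression that evaluates to a string, and a string value cannot be the target of an assignment, which is why Python raises the SyntaxError. You generally can't create dynamically named variables in Python without resorting to hacks such as writing into globals() or calling exec(). Storing the values in a dictionary ends up being a much cleaner solution when you want something like that.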

    One advantage of working with them this way is that you can then just iterate over the dictionary later on if you want to do something to each of them. For example, if you wanted to write each of them to disk as a CSV with a name matching the tag, you could do something like:

    for key, df in dfs.items():
        df.to_csv(f'{key}.csv')
    

    You can also refer to them individually: if there were a tag named "a", you could use dfs['a'] to access its DataFrame later in your code.
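
If you still want the df_ prefix in the names, the f-string can be used as the dictionary key. Below is a minimal, self-contained sketch of that idea. The sample XML and the section_to_df helper are hypothetical stand-ins (the real example.xml and parse_XML are not shown in the question), so treat this as an illustration of the pattern rather than a drop-in replacement.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample document; the real "example.xml" is not shown in the question.
XML_TEXT = """<root>
  <orders>
    <row><id>1</id><total>9.5</total></row>
    <row><id>2</id><total>3.0</total></row>
  </orders>
  <customers>
    <row><id>7</id><name>Ann</name></row>
  </customers>
</root>"""


def section_to_df(section):
    """Hypothetical stand-in for parse_XML: one DataFrame row per child element."""
    records = [{field.tag: field.text for field in row} for row in section]
    return pd.DataFrame(records)


root = ET.fromstring(XML_TEXT)

# The f-string produces a str, which is a valid dictionary key
# even though it cannot be an assignment target.
dfs = {f"df_{section.tag}": section_to_df(section) for section in root}

print(sorted(dfs))             # ['df_customers', 'df_orders']
print(dfs["df_orders"].shape)  # (2, 2)

# Apply the same operation to every DataFrame without naming them one by one.
# to_csv() without a path returns the CSV text instead of writing a file.
csv_texts = {name: df.to_csv(index=False) for name, df in dfs.items()}
```

The same loop scales to 30 or more DataFrames, because each name is just a key in dfs rather than a separate variable.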