I am creating an Azure Synapse pipeline that uses a data flow to transform a fairly complex XML file into Parquet files. I do this by flattening the XML at different points so that each array can be exported to its own Parquet file. This works fine when I point the source directly at a single XML file, but I need the data flow to be parameterized so it can accept different XML files (same structure, just different values in the fields). Once I parameterize the file, my flatten transformation no longer works, because the mapping I had when pointing at a specific file can no longer be found or used.
Here is an example of how a portion of my XML source projection looks when I point to one file:
Error when trying to unroll while not pointing to a specific file (parameterized):
I read that the way to fix this is to use dynamic content for the unroll-by setting and the expression builder for the columns, but I'm having a hard time figuring out how to do either. Every expression I try raises an error, either "The column could not be found" or "Dot operator should be used for the hierarchical type." Online there are quite a few examples of this being done for JSON, but I have yet to find one for XML. Does anyone have an example they could share, or can anyone point out something I am missing?
Thank you!
Currently, you cannot dynamically flatten files of different structures with the same data flow; if the files have different structures, you will get a schema-mismatch error. If the source files share the same structure, the flatten transformation works for all of the input XML data.
To achieve your requirement, you need to build your own logic in a programming language, for example in Synapse or Databricks notebooks, as mentioned in your comment.
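As a starting point for such notebook logic, here is a minimal sketch of flattening a repeated XML element into tabular rows. It uses only the Python standard library so it runs anywhere; the element and field names (`Orders`, `Order`, `Id`, `Amount`) are hypothetical placeholders, not from your file. In a Synapse or Databricks notebook you would more typically use PySpark with the spark-xml connector and write the resulting DataFrame out as Parquet, but the flattening idea is the same.

```python
# Sketch: flatten one repeated XML element ("array") into a list of row dicts.
# Assumes a structure like <Orders><Order>...</Order><Order>...</Order></Orders>;
# all names below are hypothetical examples, not taken from the question's file.
import xml.etree.ElementTree as ET


def flatten_array(xml_text: str, array_path: str) -> list[dict]:
    """Return one dict per element found at array_path (an ElementTree XPath)."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall(array_path):
        # Each child element becomes a column. Deeper nesting could be
        # handled by recursing and prefixing column names (e.g. "Item_Id").
        rows.append({child.tag: child.text for child in item})
    return rows


sample = """<Orders>
  <Order><Id>1</Id><Amount>10.5</Amount></Order>
  <Order><Id>2</Id><Amount>20.0</Amount></Order>
</Orders>"""

rows = flatten_array(sample, "Order")
print(rows)  # [{'Id': '1', 'Amount': '10.5'}, {'Id': '2', 'Amount': '20.0'}]
```

Because the structure is fixed across your files, the same `array_path` works for every parameterized input; you would loop over the file names passed in by the pipeline and write each flattened array to its own Parquet file (e.g. via pandas/pyarrow or a Spark DataFrame).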