I want to read multiple Parquet files (S3 source) with different schemas into a single Glue DynamicFrame, but I can't get the schemas to merge. The first file's schema is used as the schema of the whole DynamicFrame. From the other files, only the columns that match the first file's columns are read, and the rest are dropped. Is there a way to merge all the schemas into a single one?
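```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext
glueContext = GlueContext(SparkContext.getOrCreate())  # glueContext must exist before it is used below
inputGDF = glueContext.create_dynamic_frame_from_options(connection_type="s3", connection_options={"paths": ["s3://bucket_name/"], "useS3ListImplementation": True, "recurse": True}, format="parquet", transformation_ctx="inputGDF")
```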
Just convert the DynamicFrame to a Spark DataFrame with its toDF() method, e.g. `df = inputGDF.toDF()`.