I'm using the following JSON Schema in my cloudant database:
{...
departureWeather:{
temp:30,
otherfields:xyz
},
arrivalWeather:{
temp:45,
otherfields: abc
}
...
}
I'm then loading the data into a dataframe using the cloudant-spark connector. If I try to select fields like so:
df.select("departureWeather.temp", "arrivalWeather.temp")
I end up with a dataframe that has 2 columns with the same name e.g. temp. It looks like Spark datasource framework is flattening the name using only the last part.
Is there an easy to deduplicate the column names?
You can use aliases:
df.select(
col("departureWeather.temp").alias("departure_temp"),
col("arrivalWeather.temp").alias("arrival_temp")
)