cloudant-spark connector creates duplicate column name with nested JSON schema

I'm storing documents with the following JSON structure in my Cloudant database:

{...
 departureWeather:{
    temp:30,
    otherfields:xyz
 },
 arrivalWeather:{
    temp:45,
    otherfields: abc
 }
 ...
}

I'm then loading the data into a dataframe using the cloudant-spark connector. If I try to select fields like so:

df.select("departureWeather.temp", "arrivalWeather.temp")

I end up with a dataframe that has two columns with the same name, temp. It looks like the Spark data source framework flattens each name and keeps only the last part of the path.

Is there an easy way to deduplicate the column names?

Solution

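You can use aliases to give each selected column a distinct name (`col` is the column function from Spark SQL's `functions`):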

df.select(
    col("departureWeather.temp").alias("departure_temp"),
    col("arrivalWeather.temp").alias("arrival_temp")
)
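
If there are many nested fields, writing each alias by hand gets tedious. Below is a minimal sketch that derives a flat alias from each dotted path, assuming Python; the helper name `unique_aliases` and the underscore separator are hypothetical choices for illustration, not part of Spark or the cloudant-spark connector.

```python
def unique_aliases(paths, sep="_"):
    """Pair each dotted field path with a flat alias, e.g. 'a.b' -> 'a_b'.

    Hypothetical helper: raises ValueError if two paths would produce
    the same alias, so duplicates are caught before the select runs.
    """
    pairs = []
    seen = set()
    for path in paths:
        alias = path.replace(".", sep)
        if alias in seen:
            raise ValueError(f"alias collision: {alias!r}")
        seen.add(alias)
        pairs.append((path, alias))
    return pairs


# Field paths taken from the question's document structure.
pairs = unique_aliases(["departureWeather.temp", "arrivalWeather.temp"])
```

With PySpark, the pairs could then be passed to the same alias call shown above, for example `df.select(*[col(p).alias(a) for p, a in pairs])`.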
