I'm trying to insert a dataframe's values into a SQL table on Databricks.
The insert fails with a duplicate-column error, but there are no (apparent) duplicate columns in the dataframe. I checked. What could this be?
|-- nr_cpf_cnpj: string (nullable = true)
|-- tp_pess: string (nullable = true)
|-- am_bacen: long (nullable = true)
|-- cd_moda: long (nullable = true)
|-- cd_sub_moda: long (nullable = true)
|-- vl_bacen: decimal(29,2) (nullable = true)
|-- clivenc: string (nullable = true)
|-- vl_envio: decimal(28,2) (nullable = true)
|-- nm_pess_empr: string (nullable = true)
|-- nr_cnae_prin: long (nullable = true)
spark.sql("INSERT INTO TABLE db.tb_jul_bcn SELECT * FROM tmpBcnView")
The dataframe is registered as the temp view tmpBcnView.
Error:
AnalysisException: Found duplicate column(s) in the data to save: nr_cnae_prin
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<command-2987275027841731> in <cell line: 1>()
----> 1 spark.sql("INSERT INTO TABLE db.tb_jul_bcn SELECT * FROM tmpBcnView")
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
1117 sqlQuery = formatter.format(sqlQuery, **kwargs)
1118 try:
-> 1119 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
1120 finally:
1121 if len(kwargs) > 0:
I solved it!
It turns out that instead of duplicate columns, there was an extra column in the dataframe. The error message reported a duplicate, so I kept searching for a duplicate. That was all!
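For anyone hitting the same message, one way to spot this kind of mismatch is to compare the view's column list against the target table's before running the INSERT. The following is only a minimal sketch: `compare_columns` and `build_insert_sql` are hypothetical helper names, not part of PySpark, and the comparison lowercases names on the assumption that Spark is running with its default case-insensitive column resolution.

```python
from collections import Counter


def compare_columns(source_cols, target_cols):
    """Report how a source column list differs from the target table's columns.

    Names are lowercased before comparing (assumption: Spark's default
    case-insensitive resolution is in effect).
    """
    src = [c.lower() for c in source_cols]
    tgt = [c.lower() for c in target_cols]
    counts = Counter(src)
    return {
        # Columns that really do appear more than once in the source
        "duplicated_in_source": sorted(c for c, n in counts.items() if n > 1),
        # Columns the source has but the target table does not
        "extra_in_source": sorted(set(src) - set(tgt)),
        # Columns the target table expects but the source lacks
        "missing_from_source": sorted(set(tgt) - set(src)),
        # SELECT * inserts by position, so the counts have to match
        "count_mismatch": len(src) != len(tgt),
    }


def build_insert_sql(target_table, source_view, target_cols):
    """Build an INSERT that selects only the target's columns, in the target's order."""
    col_list = ", ".join(f"`{c}`" for c in target_cols)
    return f"INSERT INTO {target_table} SELECT {col_list} FROM {source_view}"


# In a Databricks notebook, using the names from the question
# (assumes the `spark` session that the notebook provides):
#
#   view_cols = spark.table("tmpBcnView").columns
#   table_cols = spark.table("db.tb_jul_bcn").columns
#   print(compare_columns(view_cols, table_cols))
#   spark.sql(build_insert_sql("db.tb_jul_bcn", "tmpBcnView", table_cols))
```

If `extra_in_source` is non-empty or `count_mismatch` is true, the view and the table do not line up, even when `duplicated_in_source` is empty. Selecting an explicit column list instead of `SELECT *` keeps the extra column out of the insert, provided every target column exists in the view.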