Tags: apache-spark, pyspark, apache-spark-sql, databricks

AnalysisException: Found duplicate column(s) in the data to save


I'm trying to insert a dataframe's values into an SQL table on Databricks.

The thing is, there are no (apparent) duplicate columns in the dataframe. I checked (see the check right after the schema). What could this be?

 |-- nr_cpf_cnpj: string (nullable = true)
 |-- tp_pess: string (nullable = true)
 |-- am_bacen: long (nullable = true)
 |-- cd_moda: long (nullable = true)
 |-- cd_sub_moda: long (nullable = true)
 |-- vl_bacen: decimal(29,2) (nullable = true)
 |-- clivenc: string (nullable = true)
 |-- vl_envio: decimal(28,2) (nullable = true)
 |-- nm_pess_empr: string (nullable = true)
 |-- nr_cnae_prin: long (nullable = true)
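
This is roughly how I checked for duplicates (a sketch; df stands for the dataframe behind the view):

from collections import Counter

# Count each column name; anything with a count above 1 is a duplicate
dupes = [c for c, n in Counter(df.columns).items() if n > 1]
print(dupes)  # prints [] here, so no duplicates in the dataframe itself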


spark.sql("INSERT INTO TABLE db.tb_jul_bcn  SELECT * FROM tmpBcnView")

The dataframe is registered as the temp view tmpBcnView.
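
It was registered along these lines (a sketch; df is the dataframe whose schema is shown above):

# Expose the dataframe to SQL under the name used in the INSERT
df.createOrReplaceTempView("tmpBcnView")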

Error:

AnalysisException: Found duplicate column(s) in the data to save: nr_cnae_prin
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-2987275027841731> in <cell line: 1>()
----> 1 spark.sql("INSERT INTO TABLE db.tb_jul_bcn  SELECT * FROM tmpBcnView")
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
     46             start = time.perf_counter()
     47             try:
---> 48                 res = func(*args, **kwargs)
     49                 logger.log_success(
     50                     module_name, class_name, function_name, time.perf_counter() - start, signature
/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
   1117             sqlQuery = formatter.format(sqlQuery, **kwargs)
   1118         try:
-> 1119             return DataFrame(self._jsparkSession.sql(sqlQuery), self)
   1120         finally:
   1121             if len(kwargs) > 0:

Solution

  • I solved it!

    It turns out that instead of duplicate columns there was a surplus column: the dataframe had one more column than the target table. The error claimed a duplicate, so I kept searching for a duplicate. That was all it was!
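
    A quick way to surface this kind of mismatch is to diff the column lists (a sketch; it assumes the table and view names from the question):

    # Columns of the target table vs. columns of the view being inserted
    table_cols = spark.table("db.tb_jul_bcn").columns
    view_cols = spark.table("tmpBcnView").columns

    # Any column in the view that the table does not define is surplus
    print(set(view_cols) - set(table_cols))

    # Insert only the columns the table actually defines
    spark.sql(
        "INSERT INTO TABLE db.tb_jul_bcn SELECT {} FROM tmpBcnView".format(
            ", ".join(table_cols)
        )
    )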