hive pyspark apache-spark-sql amazon-emr

Cannot have map type columns in DataFrame which calls set operations

: org.apache.spark.sql.AnalysisException: Cannot have map type columns in DataFrame which calls set operations(intersect, except, etc.), but the type of column map_col is map

I have a Hive table with a column of type MAP<Float, Float>. I get the above error when I try to insert into this table from a Spark context. The insert works fine without the 'distinct'.

create table test_insert2(`test_col` string, `map_col` MAP<INT,INT>) 
location 's3://mybucket/test_insert2';

insert into test_insert2 
select distinct 'a' as test_col, map(0,0) as map_col

Solution

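Spark SQL rejects set operations, including distinct, on DataFrames that contain map type columns. Try converting the DataFrame to an RDD with .rdd and then applying .distinct to the RDD instead.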

Example:

// Triple quotes are required for a multi-line string literal in Scala.
spark.sql("""select 'a' as test_col, map(0,0) as map_col
             union all
             select 'a' as test_col, map(0,0) as map_col""").rdd.distinct.collect

Result:

Array[org.apache.spark.sql.Row] = Array([a,Map(0 -> 0)])
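
The example above only collects the de-duplicated rows to the driver. To complete the insert from the question, the rows could be turned back into a DataFrame and appended to the table. The following is a minimal sketch, not a tested solution: it assumes a Hive-enabled SparkSession and that the table test_insert2 from the question already exists. The object name and app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; in spark-shell only the body of main is needed.
object DistinctMapInsert {
  def main(args: Array[String]): Unit = {
    // Assumption: Hive support is available, as on a typical EMR cluster.
    val spark = SparkSession.builder()
      .appName("distinct-map-insert")
      .enableHiveSupport()
      .getOrCreate()

    // Build the rows WITHOUT distinct so the analyzer accepts the map column.
    val df = spark.sql(
      """select 'a' as test_col, map(0, 0) as map_col
         union all
         select 'a' as test_col, map(0, 0) as map_col""")

    // De-duplicate on the RDD, which compares Row objects directly
    // instead of going through a SQL set operation.
    val distinctRows = df.rdd.distinct()

    // Rebuild a DataFrame with the original schema and append it to the table.
    val deduped = spark.createDataFrame(distinctRows, df.schema)
    deduped.write.mode("append").insertInto("test_insert2")

    spark.stop()
  }
}
```

Note that insertInto matches columns by position rather than by name, so the select list has to follow the column order of the table.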
