
Pyspark SQL dataframe map with multiple data types


I have PySpark code in AWS Glue where I want to create a dataframe with a map column that combines integer and string values.

sample data:

{ "Candidates": [
    {
      "jobLevel": 6,
      "name": "Steven",
    },    {
      "jobLevel": 5,
      "name": "Abby",
    } ] }

Hence, I tried the code below to create the map type, but the integer jobLevel always gets converted to a string. Any suggestions on how to do this while retaining the data type of the job level?

code used:

df = spark.sql("""
    select Supervisor_name,
           map('job_level', INT(job_level_name),
               'name', employeeLogin) as Candidates
    from dataset_1
""")
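
For reference, a minimal self-contained sketch that reproduces the coercion, using made-up sample rows and the column names assumed from the query above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for dataset_1, with the column names used in the query
data = [("Pat", 6, "Steven"), ("Pat", 5, "Abby")]
df = spark.createDataFrame(data, ["Supervisor_name", "job_level_name", "employeeLogin"])
df.createOrReplaceTempView("dataset_1")

mapped = spark.sql("""
    select Supervisor_name,
           map('job_level', INT(job_level_name),
               'name', employeeLogin) as Candidates
    from dataset_1
""")
# The map's value type is the common type of INT and STRING, so the
# schema comes out as Candidates: map<string,string>.
mapped.printSchema()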

Solution

  • Map values cannot have different types in Spark; mixed value types are coerced to a common type, which is why the integer ends up as a string. Use a struct instead, which keeps a separate type per field:

    df = spark.sql("""
        select Supervisor_name, 
               struct(INT(job_level_name) as job_level, 
                      employeeLogin as name
                     ) as Candidates 
        from dataset_1
    """)