Convert a list of fields to a structType object (a SparkR schema)

In SparkR, the schema of a DataFrame can be obtained as a structType, and its fields as a list of structField objects, e.g.:

schema <- schema(output_count)
# List of 2
#  $ jobj  :Class 'jobj' <environment: 0x563114ff5900>
#  $ fields:function ()
#  - attr(*, "class")= chr "structType"

fields <- schema$fields()
# StructField(name = "word", type = "StringType", nullable = TRUE)
# StructField(name = "count", type = "StringType", nullable = TRUE)

I found that the SparkR API exposes a method:

but, as a beginner in SparkR, I am not sure how to use it.

My attempt:

schema <- schema(output_count)
#List of 2
# $ jobj  :Class 'jobj' <environment: 0x563114ff5900> 
# $ fields:function ()  
# - attr(*, "class")= chr "structType"

I am trying to get the list of fields back as a structType.


  • If I understood correctly, the code below at least produces the type of output you described in the question.

    df <- SparkR::createDataFrame(iris)
    lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2]))

    The output is:

    StructField(name = "Sepal_Length", type = "DoubleType", nullable = TRUE)
    StructField(name = "Sepal_Width", type = "DoubleType", nullable = TRUE)
    StructField(name = "Petal_Length", type = "DoubleType", nullable = TRUE)
    StructField(name = "Petal_Width", type = "DoubleType", nullable = TRUE)
    StructField(name = "Species", type = "StringType", nullable = TRUE)

    If you further use do.call with SparkR::structType:

    do.call(SparkR::structType, lapply(SparkR::dtypes(df), function(x) SparkR::structField(x[1], x[2])))

    then the output looks like this:

    |-name = "Sepal_Length", type = "DoubleType", nullable = TRUE
    |-name = "Sepal_Width", type = "DoubleType", nullable = TRUE
    |-name = "Petal_Length", type = "DoubleType", nullable = TRUE
    |-name = "Petal_Width", type = "DoubleType", nullable = TRUE
    |-name = "Species", type = "StringType", nullable = TRUE
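
The same do.call idea should also work directly on the list of fields from the question, without going through dtypes. Below is a minimal sketch, assuming a local Spark installation that SparkR can start; the `iris` data and the variable names `fields` and `rebuilt` are only for illustration and stand in for `output_count` and its schema.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can launch it.
sparkR.session(master = "local[1]")

df <- createDataFrame(iris)

# List of structField objects taken from the existing schema.
fields <- schema(df)$fields()

# structType() takes structField objects as separate arguments,
# so do.call() spreads the list into a single structType() call.
rebuilt <- do.call(structType, fields)

print(rebuilt)
```

do.call is used here because structType expects the fields as individual arguments rather than as one list.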