scala, apache-spark, spark-streaming, spark-structured-streaming

How can I add several (not yet populated) columns to a DataFrame in Spark Structured Streaming?


I have a Kafka stream with the standard Kafka schema. I'd like to add a number of columns to this stream so that it can be unioned with another DataFrame. I'd like to reuse my schema variable:

import org.apache.spark.sql.types._

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
    StructField("Energy Data", StringType, nullable = false) ::
    StructField("Distance", StringType, nullable = false) ::
    StructField("Humidity", StringType, nullable = false) ::
    StructField("Ambient Temperature", StringType, nullable = false) ::
    StructField("Cold Water Temperature", StringType, nullable = false) ::
    StructField("Vibration Value 1", StringType, nullable = false) ::
    StructField("Vibration Value 2", StringType, nullable = false) ::
    StructField("Handle Movement", StringType, nullable = false) ::
    StructField("Make Coffee", StringType, nullable = false) ::
    Nil)

Is there something like

.withColumns(schema)

so that the structure is not duplicated, but the same schema variable serves as the source of the list of columns to be added?

UPD:

import org.apache.spark.sql.functions.lit

// Add each field of the schema as an empty string column
val iter = schema.iterator
while (iter.hasNext) {
  controlDataFrame = controlDataFrame.withColumn(iter.next.name, lit(""))
}

worked for me.
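
For reference, newer Spark releases ship a built-in equivalent: Dataset.withColumns(colsMap: Map[String, Column]) was added in Spark 3.3.0 and adds all the columns in one call. A minimal sketch, assuming Spark 3.3+ and that controlDataFrame is the streaming DataFrame above:

    import org.apache.spark.sql.functions.lit

    // Build a name -> column map from the schema fields and add them all at once;
    // Dataset.withColumns(Map[String, Column]) is available from Spark 3.3.0.
    val emptyCols = schema.fields.map(field => field.name -> lit("")).toMap
    controlDataFrame = controlDataFrame.withColumns(emptyCols)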


Solution

  • Maybe you could try something like:

    xs.withColumn("y", lit(null).cast(StringType))
    

    to add empty columns. You could then get the schema from xs.schema, but I am not sure this solves your problem if you want to reuse the original variable.
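
    Combining this suggestion with the schema variable from the question, a fold over the schema's fields can add typed null columns instead of empty strings, so each added column already carries the type declared in the schema. A minimal sketch, where controlDataFrame stands in for the question's streaming DataFrame:

    import org.apache.spark.sql.functions.lit

    // Add each schema field as a null column cast to the field's declared type,
    // so the resulting DataFrame lines up with the target schema for a later union.
    val aligned = schema.fields.foldLeft(controlDataFrame) { (df, field) =>
      df.withColumn(field.name, lit(null).cast(field.dataType))
    }

    Casting the null to field.dataType keeps a later union type-safe even when some fields are not StringType.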