
Spark Scala Dynamic column selection from DataFrame


I have a DataFrame with different types of columns, and I need to retrieve specific columns from it. A hard-coded select statement would look like this:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"),
col("FEATURE_COL1"), col("FEATURE_COL2"), col("FEATURE_COL3"), col("FEATURE_COL4"))

Here LEBEL_COLUMN and the FEATURE_COL columns are dynamic. I have an Array (or Seq) of those FEATURE columns, like this:

val FEATURE_COL_ARR = Array("FEATURE_COL1","FEATURE_COL2","FEATURE_COL3","FEATURE_COL4")

I need to use this Array of columns as the 2nd part of that select statement: the 1st column should be the single label column (LEBEL_COLUMN, aliased to "label"), and the rest should come from the dynamic list.

Can you please help me make this select statement work in Scala?

Note: The sample code given below works, but I still need to add the column array as the 2nd part of the SELECT:

import org.apache.spark.sql.functions.col

val colNames = FEATURE_COL_ARR.map(name => col(name))
val logRegrDF = myDF.select(colNames:_*)  // not the requirement: the label column is missing

I was thinking the 2nd part of the code would look like this, but it does not work:

val logRegrDF = myDF.select(myDF("LEBEL_COLUMN").as("label"), colNames:_*)  // does not compile: a single Column cannot be mixed with a varargs expansion

Solution

  • If I understand your question correctly, this should be what you are looking for:

    val allColumnsArr = "LEBEL_COLUMN" +: FEATURE_COL_ARR
    val logRegrDF = myDF.select(allColumnsArr.head, allColumnsArr.tail: _*)
      .withColumnRenamed("LEBEL_COLUMN", "label")
    

    Hope this helps!
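
    Alternatively, if you prefer to keep the .as("label") alias inside the select itself (closer to the shape of your original statement), you can prepend the label Column to the mapped feature Columns and expand the whole sequence as varargs. A minimal sketch, assuming myDF, LEBEL_COLUMN and FEATURE_COL_ARR as defined in the question:

    import org.apache.spark.sql.functions.col

    // Prepend the aliased label column to the dynamic feature columns,
    // then expand the whole Array[Column] as varargs in a single select.
    val selectCols = myDF("LEBEL_COLUMN").as("label") +: FEATURE_COL_ARR.map(col)
    val logRegrDF = myDF.select(selectCols: _*)

    Both versions produce the same result; the first works purely with column names (Strings), the second with Column objects, which is handy if you need expressions or aliases in the dynamic part.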