How to add columns successively to a Dataset in Spark within a for loop (where the loop iterates over the column names)


I am trying to add columns to a Dataset<Row> one after another inside a loop. The problem is that only the last column is visible; the columns added in earlier iterations do not persist.

private static void populate(Dataset<Row> res, String[] args)
    {
        String[] propArr = args[0].split(",");   // Eg: [abc, def, ghi]       
            
        // Dataset<Row> addColToMergedData = null;
        
        /** Here each element is the name of the column to be inserted */
        for(int i = 0; i < propArr.length; i++){

            // addColToMergedData = res.withColumn(propArr[i], lit(null));
        }
    }

Solution

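  • The logic in the for loop is flawed. A Dataset is immutable, so withColumn returns a new Dataset instead of changing res. Because every iteration calls withColumn on the original res, each result contains only one of the new columns, and the last assignment wins. Assign the result back to the variable that the next iteration uses. You can modify the program as follows: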

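    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.lit;

    private static Dataset<Row> populate(Dataset<Row> res, String[] args)
    {
        String[] propArr = args[0].split(",");   // Eg: [abc, def, ghi]

        /** Here each element is the name of the column to be inserted */
        for (int i = 0; i < propArr.length; i++)
        {
            // withColumn returns a new Dataset; reassign so the next iteration builds on it
            res = res.withColumn(propArr[i], lit(null));
        }
        Dataset<Row> addColToMergedData = res;

        // Returned because reassigning the parameter is not visible to the caller
        return addColToMergedData;
    }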
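
Why the original loop kept only the last column can be checked without a Spark installation. The sketch below is a plain-Java model, not Spark code. MiniDataset is a hypothetical stand-in that tracks column names only and copies one property of a Spark Dataset: withColumn leaves the receiver untouched and returns a new object. The method names addFromOriginal and addCumulatively are also made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WithColumnLoopDemo {

    /** Hypothetical stand-in for an immutable Dataset: it only tracks column names. */
    static final class MiniDataset {
        private final List<String> columns;

        MiniDataset(List<String> columns) {
            this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        }

        /** Returns a NEW MiniDataset with one more column; 'this' is not modified. */
        MiniDataset withColumn(String name) {
            List<String> next = new ArrayList<>(columns);
            next.add(name);
            return new MiniDataset(next);
        }

        List<String> columns() {
            return columns;
        }
    }

    /** Pattern from the question: every iteration starts again from the original input. */
    static MiniDataset addFromOriginal(MiniDataset res, String[] names) {
        MiniDataset out = res;
        for (String name : names) {
            out = res.withColumn(name);   // earlier additions are discarded
        }
        return out;
    }

    /** Pattern from the answer: every iteration builds on the previous result. */
    static MiniDataset addCumulatively(MiniDataset res, String[] names) {
        for (String name : names) {
            res = res.withColumn(name);   // carry the new object forward
        }
        return res;
    }

    public static void main(String[] args) {
        MiniDataset base = new MiniDataset(Arrays.asList("id"));
        String[] names = "abc,def,ghi".split(",");

        System.out.println(addFromOriginal(base, names).columns());  // [id, ghi]
        System.out.println(addCumulatively(base, names).columns());  // [id, abc, def, ghi]
    }
}
```

The same rule applies in Spark: keep the Dataset returned by withColumn and pass it to the next iteration, then hand the final Dataset back to the caller.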