Adding multiple columns using a for loop in PySpark


I need to add two columns to a DataFrame in PySpark, one for each column name in f_pseudo_cols. I am using this select statement:

df.select("*",[sha2(c,256).alias("hashed_"+c) for c in f_pseudo_cols])

Here I am selecting all existing columns and adding two more that hold the sha2() hash of each column listed in f_pseudo_cols.

f_pseudo_cols holds two column names, "Swis Code" and "Roll year", both of which exist in df.

I am getting the following error:

'Invalid argument, not a string or column: Row(field_name='Swis Code') of type . For column literals, use 'lit', 'array', 'struct' or 'create_map' function.'

I tried converting each element to a string with str(), but that did not work either.


Solution

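  • This runs without the error because the column names in df1Columns are plain strings. Be aware, though, that F.lit(c) turns each column name into a constant, so sha2() hashes the strings "ID", "CODE", "TYP" and "KIND" rather than the values stored in those columns; that is why all four rows of the output are identical. To hash the actual values, use F.col(c) instead, casting non-string columns such as ID to string first.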

    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession

    # Local SparkSession (replaces the older SparkContext + SQLContext setup)
    spark = SparkSession.builder.master("local").getOrCreate()

    data1 = [
        (10087, "BH", "L", "D"),
        (10066, "BS", "B", "null"),
        (10094, "BL", "L", "E"),
        (10080, "BF", "B", "null"),
    ]

    df1Columns = ["ID", "CODE", "TYP", "KIND"]
    df1 = spark.createDataFrame(data=data1, schema=df1Columns)

    # F.lit(c) hashes the column NAME as a constant string,
    # so every row receives the same hash value.
    hexed_df1 = df1.select([F.sha2(F.lit(c), 256).alias("hashed_" + c) for c in df1Columns])

    print("hexed_df1 dataframe")
    hexed_df1.show(truncate=False)
    

    Output:

    hexed_df1 dataframe
    +----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+
    |hashed_ID                                                       |hashed_CODE                                                     |hashed_TYP                                                      |hashed_KIND                                                     |
    +----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+
    |3843971dcfdee5083e6289e1bbdbb003e538b5a8a668fc43ae4f19d415ac18a2|07a9d7b4a9a23915a61bc89bb0357bf47b348cf4174eb965bb1df8fbfa18b0b5|3909713b1e306608da35d4e3b7d0a72cbe7bee7f99c041f134a233740a4e8ccd|ae97bccd529278e7c12624025e56b3034e5afca568f579f6ef5e04f900fef2bb|
    |3843971dcfdee5083e6289e1bbdbb003e538b5a8a668fc43ae4f19d415ac18a2|07a9d7b4a9a23915a61bc89bb0357bf47b348cf4174eb965bb1df8fbfa18b0b5|3909713b1e306608da35d4e3b7d0a72cbe7bee7f99c041f134a233740a4e8ccd|ae97bccd529278e7c12624025e56b3034e5afca568f579f6ef5e04f900fef2bb|
    |3843971dcfdee5083e6289e1bbdbb003e538b5a8a668fc43ae4f19d415ac18a2|07a9d7b4a9a23915a61bc89bb0357bf47b348cf4174eb965bb1df8fbfa18b0b5|3909713b1e306608da35d4e3b7d0a72cbe7bee7f99c041f134a233740a4e8ccd|ae97bccd529278e7c12624025e56b3034e5afca568f579f6ef5e04f900fef2bb|
    |3843971dcfdee5083e6289e1bbdbb003e538b5a8a668fc43ae4f19d415ac18a2|07a9d7b4a9a23915a61bc89bb0357bf47b348cf4174eb965bb1df8fbfa18b0b5|3909713b1e306608da35d4e3b7d0a72cbe7bee7f99c041f134a233740a4e8ccd|ae97bccd529278e7c12624025e56b3034e5afca568f579f6ef5e04f900fef2bb|
    +----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+----------------------------------------------------------------+