python, amazon-web-services, pyspark, apache-spark-sql, aws-glue-spark

How to combine the values of two columns into another column using PySpark?


This is the code I'm using to map values from a CSV file to a SQL table in AWS Glue.

mappings=[
        ("houseA", "string", "villa", "string"),
        ("houseB", "string", "small_house", "string"),
        ("houseA"+"houseB", "string", "combined_key", "string"),
    ],

Mapping houseA and houseB to the "villa" and "small_house" columns respectively works fine. But when I try to put the concatenated houseA and houseB values into the "combined_key" column, I get this error:

An error occurred while calling o128.pyWriteDynamicFrame. Cannot insert the value NULL into column 'combined_key', table 'dbo.Buildings'; column does not allow nulls. INSERT fails.

I can't figure out why it reports a NULL value.

Any ideas on how the code can be modified?

Thanks in advance.


Solution

  • I found that Glue Studio has a custom transformation node where this can be done with PySpark code.
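
For anyone landing here later, a rough sketch of what such a custom transform could look like. The likely cause of the error is that Python evaluates "houseA"+"houseB" to the single string "houseAhouseB" before Glue ever sees the mapping, so the mapping asks for a source column that does not exist in the CSV and every row ends up NULL. The function name combine_house_columns and the output key "combined" are made up for illustration, and the sketch assumes the node receives exactly one input frame containing houseA and houseB.

```python
# Python evaluates the "+" first, so the mapping's source field is one
# string that names a column missing from the CSV.
source_field = "houseA" + "houseB"
print(source_field)  # houseAhouseB


def combine_house_columns(glueContext, dfc):
    """Hypothetical Glue Studio custom transform body (names are illustrative)."""
    # Imports sit inside the function so the snippet can be pasted into the
    # custom transform editor, and so it loads outside a Glue environment.
    from awsglue.dynamicframe import DynamicFrame, DynamicFrameCollection
    from pyspark.sql.functions import col, concat

    # Assumption: the node has a single input; take it as a Spark DataFrame.
    df = dfc.select(list(dfc.keys())[0]).toDF()

    # Build the new column from the row values, not from the column names.
    # Note: concat() returns NULL if either input is NULL.
    df = df.withColumn("combined_key", concat(col("houseA"), col("houseB")))

    # Wrap the result back into the collection type Glue Studio expects.
    combined = DynamicFrame.fromDF(df, glueContext, "combined")
    return DynamicFrameCollection({"combined": combined}, glueContext)
```

After this node, the mapping can refer to combined_key as an ordinary source column. Since the target column does not allow NULLs, rows where houseA or houseB is missing would still fail the insert; concat_ws("", ...) skips NULL inputs and may be a better fit in that case.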