python-3.x pyspark aws-glue

Python function fails with a NoneType error in AWS Glue even though the same function works on my local machine


I am new to AWS Glue. I have created a job that modifies the phone numbers in a column and updates the data frame. The script below works fine on my local machine, where I run it with PySpark. It prepends '+00' to the phone numbers that do not start with '0'.

## phoneNumber column
6-451-512-3627
0-512-582-3548
1-043-733-0050

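from pyspark.sql import functions as sf
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def addCountry_code(phoneNo):
    # Prepend '+00' unless the number already starts with '0'.
    countryCode = '+00' + phoneNo
    if phoneNo[:1] != '0':
        return str(countryCode)
    else:
        return str(phoneNo)

phone_replace_udf = udf(lambda x: addCountry_code(x), StringType())

# concatDF is the data frame that holds the phoneNumber column.
phoneNo_rep_DF = concatDF.withColumn("phoneNumber", phone_replace_udf(sf.col('phoneNumber')))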
## Output
+006-451-512-3627
0-512-582-3548
+001-043-733-0050

But when I run the same code in the Glue context, it throws the following error:

addCountry_code countryCode= '+00'+phoneNo **TypeError: must be str, not NoneType**

Why does this function fail in Glue when it works locally?

I would appreciate any help with this.


Solution

  • This should give the desired result. Use spark.udf.register to register the function so that it can be called from Spark SQL:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    # In a Glue job, glueContext.spark_session can be used instead.
    spark = SparkSession.builder.getOrCreate()

    # Sample rows that mirror the phone numbers in the question.
    ds = [{'phoneNumber': '6-451-512-3627'},
          {'phoneNumber': '0-512-582-3548'},
          {'phoneNumber': '1-043-733-0050'}]

    sf = spark.createDataFrame(ds)

    def addCountry_code(phoneNo):
        # Prepend '+00' unless the number already starts with '0'.
        countryCode = '+00' + phoneNo
        if phoneNo[:1] != '0':
            return str(countryCode)
        else:
            return str(phoneNo)

    # Register the function as a SQL UDF that returns a string.
    spark.udf.register('phone_replace_udf', addCountry_code, StringType())
    sf.createOrReplaceTempView('sf')
    spark.sql('select phone_replace_udf(phoneNumber) from sf').collect()
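
The error message in the question suggests that at least one phoneNumber value reaches the function as None, which is what happens when a row holds a null. That is an assumption about the Glue data, since the question does not show it. If it holds, a guard inside the function may be enough. Below is a minimal sketch in plain Python; the name add_country_code_safe is hypothetical and is not part of the original job.

```python
def add_country_code_safe(phone_no):
    """Prepend '+00' unless the number starts with '0'; pass None through."""
    # Assumption: a null cell arrives as None and should stay null.
    if phone_no is None:
        return None
    phone_no = str(phone_no)
    if phone_no.startswith('0'):
        return phone_no
    return '+00' + phone_no


# The unguarded expression from the question fails on None with a TypeError.
try:
    '+00' + None
except TypeError as exc:
    print('TypeError:', exc)

# The guarded version handles both regular values and None.
for value in ['6-451-512-3627', '0-512-582-3548', None]:
    print(value, '->', add_country_code_safe(value))
```

In the job, this function could be wrapped the same way as before, for example with udf(add_country_code_safe, StringType()), so that null phone numbers stay null instead of raising an error.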