I am new to AWS Glue. I have created a job that modifies the phone numbers in a column and updates the DataFrame. The script below works fine on my local machine, where I run it with PySpark. It adds '+00' in front of every phone number that does not start with '0'.

    ## phoneNumber column
    6-451-512-3627
    0-512-582-3548
    1-043-733-0050
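
    import pyspark.sql.functions as sf
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def addCountry_code(phoneNo):
        countryCode = '+00' + phoneNo
        if phoneNo[:1] != '0':
            return str(countryCode)
        else:
            return str(phoneNo)

    # concatDF is the DataFrame built earlier in the job
    phone_replace_udf = udf(lambda x: addCountry_code(x), StringType())
    phoneNo_rep_DF = concatDF.withColumn("phoneNumber", phone_replace_udf(sf.col('phoneNumber')))  #.drop('phoneNumber')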

    ## output
    +006-451-512-3627
    0-512-582-3548
    +001-043-733-0050

But when I run the same code in the Glue context, it throws the following error in `addCountry_code`, at the line `countryCode= '+00'+phoneNo`:

**TypeError: must be str, not NoneType**

Why does this function fail in Glue when it works locally? I'd appreciate it if anyone could help.
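
The error can be reproduced without Spark by calling the function directly. Spark hands a SQL NULL to a Python UDF as `None`, so a plausible cause (an assumption, since the Glue source data is not shown) is that some rows in the Glue table have a null `phoneNumber`, while the local sample has none. This sketch only diagnoses; it does not change the function:

```python
def addCountry_code(phoneNo):
    # Same logic as the UDF in the question
    countryCode = '+00' + phoneNo  # fails here when phoneNo is None
    if phoneNo[:1] != '0':
        return str(countryCode)
    else:
        return str(phoneNo)


# A normal string value works as expected
print(addCountry_code('6-451-512-3627'))  # prints +006-451-512-3627

# None is what the UDF receives for a null cell; '+00' + None raises TypeError
try:
    addCountry_code(None)
except TypeError as exc:
    print('TypeError:', exc)
```

The exact wording of the `TypeError` message depends on the Python version, but it is the same str + None concatenation failure.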

This should give the desired result. Use `spark.udf.register` to register the function:

    from pyspark.sql.types import StringType

    # Assumes `spark` is an active SparkSession; in a Glue job it is
    # typically obtained with spark = glueContext.spark_session
    ds = [{'phoneNumber': '6-451-512-3627'},
          {'phoneNumber': '0-512-582-3548'},
          {'phoneNumber': '1-043-733-0050'}]
    sf = spark.createDataFrame(ds)

    def addCountry_code(phoneNo):
        countryCode = '+00' + phoneNo
        if phoneNo[:1] != '0':
            return str(countryCode)
        else:
            return str(phoneNo)

    # Register the Python function so it can be called from Spark SQL
    spark.udf.register('phone_replace_udf', lambda x: addCountry_code(x), StringType())
    sf.createOrReplaceTempView('sf')
    spark.sql('select phone_replace_udf(phoneNumber) from sf').collect()
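
Registering the function this way still raises the same `TypeError` if the column contains nulls, because the function body is unchanged. A minimal null-safe sketch, assuming a null `phoneNumber` should simply stay null (the name `add_country_code_safe` is hypothetical):

```python
def add_country_code_safe(phone_no):
    """Prefix '+00' unless the number starts with '0'; pass nulls through."""
    if phone_no is None:
        return None  # Spark delivers SQL NULL to a Python UDF as None
    if phone_no.startswith('0'):
        return phone_no
    return '+00' + phone_no


for value in ['6-451-512-3627', '0-512-582-3548', '1-043-733-0050', None]:
    print(value, '->', add_country_code_safe(value))
```

This function can be passed in place of the lambda, e.g. `spark.udf.register('phone_replace_udf', add_country_code_safe, StringType())` or `udf(add_country_code_safe, StringType())` for use with `withColumn`. Spark's built-in `when`, `startswith` and `concat` column functions could also express the same rule without a Python UDF, and they propagate nulls on their own.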