
TypeError when trying to set the tzinfo argument for a datetime


I'm trying to use the Faker package to generate fake dates of birth in PySpark.

My code is below:

from faker import *
from pyspark.sql.types import *
from pyspark.sql import Row
from pyspark.sql.functions import udf
from datetime import *

fake = Faker("en_GB")
fake.seed_locale("en_GB", 0)

df = spark.createDataFrame([
    Row(BIRTH_DT = datetime(2000, 1, 1, 12, 0)),
    Row(BIRTH_DT = datetime(2000, 2, 1, 12, 0)),
    Row(BIRTH_DT = datetime(2000, 3, 1, 12, 0))
])

class anonymise:
    def BIRTH_DT():
        def BirthDt_values():
            return fake.date_of_birth(datetime.tzinfo == None)
        BirthDt_udf = udf(BirthDt_values, TimestampType())
        return BirthDt_udf()

df = df \
.withColumn("BIRTH_DT", anonymise.BIRTH_DT())

df.display()

However, I'm getting this error:

PythonException: 'TypeError: tzinfo argument must be None or of a tzinfo subclass, not type 'bool''

I don't understand why it thinks my parameter value is a boolean. I must be passing it incorrectly, but I can't figure out what I should do instead. Any help would be appreciated!
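The boolean comes from the expression itself, not from Spark. With from datetime import *, the name datetime is the class, so datetime.tzinfo is a class attribute rather than None, and comparing it with None gives False. The error text in the question matches the check the standard library's datetime applies to tzinfo arguments, so it is reasonable to assume that False ends up being passed through to it. A minimal stdlib-only reproduction (no Faker or Spark needed):

```python
from datetime import datetime, timezone

# On the class, datetime.tzinfo is an attribute descriptor, not None,
# so comparing it with None yields the plain boolean False.
arg = datetime.tzinfo == None  # deliberately reproduces the expression from the question
print(type(arg).__name__, arg)  # bool False

# Passing that boolean where a tzinfo is expected reproduces the error.
message = ""
try:
    datetime.now(arg)
except TypeError as exc:
    message = str(exc)
print(message)

# Omitting the argument, or passing a real tzinfo instance, works.
print(datetime.now().tzinfo)              # None
print(datetime.now(timezone.utc).tzinfo)  # UTC
```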

Thanks,

Carolina


Solution

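  • Solved! The UDF return type should be DateType(), not TimestampType(), because date_of_birth returns a date of birth as a date, not a timestamp. The TypeError itself comes from the argument: datetime.tzinfo == None evaluates to the boolean False, which date_of_birth receives as its first positional parameter, tzinfo. Calling fake.date_of_birth() with no arguments (or with tzinfo=None) avoids it.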
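To check the corrected call outside Spark, here is a stdlib-only sketch. The names sketch_date_of_birth and birth_dt_value are hypothetical: the stand-in only mirrors the keyword arguments of Faker's date_of_birth (tzinfo, minimum_age, maximum_age) and the fact that it returns a datetime.date. It is not Faker's implementation. The age bounds 18 and 90 are illustrative assumptions. In the real UDF the body would simply be return fake.date_of_birth(), registered with udf(BirthDt_values, DateType()).

```python
import datetime as dt
import random
from typing import Optional


def sketch_date_of_birth(
    rng: random.Random,
    tzinfo: Optional[dt.tzinfo] = None,
    minimum_age: int = 0,
    maximum_age: int = 115,
) -> dt.date:
    """Hypothetical stand-in for a date-of-birth generator (not Faker's code).

    tzinfo must be None or a tzinfo instance; passing a bool raises TypeError,
    just as in the question.
    """
    # datetime.now() validates tzinfo; .date() drops the time part.
    today = dt.datetime.now(tzinfo).date()
    # Rough age window using 365-day years (a simplification for the sketch).
    latest = today - dt.timedelta(days=365 * minimum_age)
    earliest = today - dt.timedelta(days=365 * (maximum_age + 1) - 1)
    offset = rng.randint(0, (latest - earliest).days)
    return earliest + dt.timedelta(days=offset)


def birth_dt_value(rng: random.Random) -> dt.date:
    # Corrected call: no positional boolean, so tzinfo keeps its default None.
    return sketch_date_of_birth(rng, minimum_age=18, maximum_age=90)


rng = random.Random(0)  # seeded so repeated runs give the same value
dob = birth_dt_value(rng)
# A plain date (not a datetime) is the kind of value DateType holds.
print(type(dob).__name__, dob)
```

Keeping the value generator as a plain function makes it easy to test before wrapping it in a UDF; the 365-day year arithmetic ignores leap days, which is acceptable for fake data.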