I have the dictionary below for keeping feature definitions as strings:
from pyspark.sql import functions as F  # assumed import: F is the usual alias for pyspark.sql.functions
features = {
"journey_email_been_sent_flag": "F.when(F.col('email_14days') > 0,F.lit(1)).otherwise(F.lit(0))",
"journey_opened_flag": "F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))"
}
retrieved_features = {}
non_retrieved_features = {}
Alternatively, I keep the definitions themselves (as Column expressions) in the dictionary:
features = {
"journey_email_been_sent_flag": F.when(F.col('email_14days') > 0,F.lit(1)).otherwise(F.lit(0)),
"journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))
}
Then I use the code below to retrieve the feature definitions:
def feature_extract(*featurenames):
    for featurename in featurenames:
        if featurename in features:
            print(f"{featurename} : {features[featurename]}")
            retrieved_features[featurename] = features[featurename]
        else:
            print('failure')
            non_retrieved_features[featurename] = "Not found in the feature definitions"
    return retrieved_features
This is how I call the function to retrieve the features:
feature_extract('journey_email_been_sent_flag','journey_opened_flag')
However, it does not work when I try to use a retrieved feature. I get the result below when I keep the definitions themselves in the dictionary:
Out[19]: {'journey_email_been_sent_flag': Column<b'CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END'>}
When I use the retrieved feature in the DataFrame as below:
.withColumn('journey_email_been_sent_flag', feature_extract('journey_email_been_sent_flag'))
I get the error below:
AssertionError: col should be Column
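The assertion is most likely raised because feature_extract returns the whole dictionary, while withColumn expects a single Column as its second argument. A minimal sketch of the mismatch follows; it uses a plain string as a stand-in for the Column so that it runs without Spark (the stand-in value and the simplified function body are assumptions for illustration only):

```python
# Stand-in registry: the string value plays the role of the PySpark Column.
features = {
    "journey_email_been_sent_flag": "CASE WHEN (email_14days > 0) THEN 1 ELSE 0 END",
}

def feature_extract(*featurenames):
    # Same return shape as the function in the question: a dict of matches.
    return {name: features[name] for name in featurenames if name in features}

result = feature_extract("journey_email_been_sent_flag")
print(type(result).__name__)  # dict, not a single definition

# Indexing the dict by name yields the single definition that
# withColumn would need (a Column in the real code, a str here).
single = result["journey_email_been_sent_flag"]
print(type(single).__name__)  # str
```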
I could fix it in the following way.
I keep the feature definitions as Column expressions:
features = {
"journey_email_been_sent_flag": F.when(F.col('email_14days') > 0,F.lit(1)).otherwise(F.lit(0)),
"journey_opened_flag": F.when(F.col('opened_14days') > 0, F.lit(1)).otherwise(F.lit(0))
}
And I call the feature_extract function, take the single feature out of the returned dictionary with .get(), and wrap it in F.lit:
F.lit(feature_extract('journey_email_been_sent_flag').get('journey_email_been_sent_flag'))
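As far as I can tell, F.lit passes an existing Column through unchanged, so the .get(...) lookup is probably what removes the error rather than F.lit itself. A possible variant is to look up one definition at a time and avoid the module-level result dictionaries. The sketch below is only an illustration under assumptions: the names FEATURES, get_feature and split_features are hypothetical, and the definitions are written as Spark SQL strings so the example runs without a Spark session.

```python
from typing import Dict, List, Tuple

# Hypothetical registry: Spark SQL strings instead of Column objects.
FEATURES: Dict[str, str] = {
    "journey_email_been_sent_flag": "CASE WHEN email_14days > 0 THEN 1 ELSE 0 END",
    "journey_opened_flag": "CASE WHEN opened_14days > 0 THEN 1 ELSE 0 END",
}

def get_feature(name: str) -> str:
    """Return the definition for one feature, or raise KeyError if unknown."""
    try:
        return FEATURES[name]
    except KeyError:
        raise KeyError(f"{name!r} not found in the feature definitions") from None

def split_features(*names: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (found definitions, missing names) without mutating globals."""
    found = {n: FEATURES[n] for n in names if n in FEATURES}
    missing = [n for n in names if n not in FEATURES]
    return found, missing

found, missing = split_features("journey_email_been_sent_flag", "unknown_flag")
print(sorted(found))  # ['journey_email_been_sent_flag']
print(missing)        # ['unknown_flag']

# With Spark available, one definition could be applied roughly like this
# (F.expr turns a SQL string into a Column):
#   df.withColumn(name, F.expr(get_feature(name)))
```

Because each call builds fresh results, repeated calls do not accumulate entries from earlier lookups, which the shared retrieved_features dictionary in the question would do.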