Add a Boolean Column in Target table using AWS Glue


I am new to AWS Glue and would like help with a very simple transformation while I learn it.

Below is my data. I want to add a new column to the target dataset that shows 'Yes' if the movie rating is above 5 and 'No' otherwise. The combination of movie_id and user_id is unique in the dataset.

My data

id   movie_id   user_id   rating
1    abc        xyx       10
2    csd        xyx       8
3    abc        sss       3
4    csd        sss       5

Result

id   movie_id   user_id   rating   Yes/No
1    abc        xyx       10       Yes
2    csd        xyx       8        Yes
3    abc        sss       3        No
4    csd        sss       5        No

Solution

  • This can be done with a user-defined function (UDF) applied to every record through Glue's Map transform, similar to the one shown below. You can read more about it here.

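    from awsglue.transforms import Map

    # Called once per record; adds a "Yes/No" column based on the rating.
    def deriveBool(rec):
      if rec["rating"] > 5:
        rec["Yes/No"] = 'Yes'
      else:
        rec["Yes/No"] = 'No'
      return rec

    # datasource0 is the DynamicFrame read earlier in the job.
    datasource_mapped = Map.apply(frame = datasource0, f = deriveBool, transformation_ctx = "deriveboolvalues")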
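
The per-record logic can be checked locally before running a Glue job. The sketch below is plain Python and does not use awsglue at all. It assumes each record behaves like a dictionary, as in the function above. The names derive_bool and sample_rows are made up for illustration; sample_rows simply copies the four rows from the question.

```python
# Local check of the per-record rule; no Glue job or awsglue import is needed.
# sample_rows is a hypothetical stand-in for the records that the Map
# transform would pass to the function one at a time.

def derive_bool(rec):
    """Add a "Yes/No" field: 'Yes' when rating is above 5, otherwise 'No'."""
    rec["Yes/No"] = "Yes" if rec["rating"] > 5 else "No"
    return rec

sample_rows = [
    {"id": 1, "movie_id": "abc", "user_id": "xyx", "rating": 10},
    {"id": 2, "movie_id": "csd", "user_id": "xyx", "rating": 8},
    {"id": 3, "movie_id": "abc", "user_id": "sss", "rating": 3},
    {"id": 4, "movie_id": "csd", "user_id": "sss", "rating": 5},
]

# Copy each row first so the original sample data is left unchanged.
results = [derive_bool(dict(row)) for row in sample_rows]
flags = [r["Yes/No"] for r in results]

for r in results:
    print(r["id"], r["movie_id"], r["user_id"], r["rating"], r["Yes/No"])
```

Note that a rating of exactly 5 gives 'No', which matches the expected result in the question. If the rating column arrives as a string rather than a number, the comparison would need a conversion such as int(rec["rating"]) first.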