
How to remove error records from a Dynamic dataframe in AWS glue?


I have a dynamic dataframe (DynamicFrame) that contains error records. The code is below.

val rawDataFrame = glueContext.getCatalogSource(database = rawDBName, tableName = rawTBLName).getDynamicFrame()
println(s"RAW_DF-----count: ${rawDataFrame.count} errors: ${rawDataFrame.errorsCount}")

The print statement above outputs the following:

RAW_DF-----count: 168456 errors: 4

I need to create a dynamic dataframe that contains only the 168456 records, with the 4 error records eliminated. How can I do this?


Solution

  • Error records are not carried over when a DynamicFrame is converted to a Spark DataFrame, so try converting your DynamicFrame to a DataFrame and back:

    val noErrorsDyf = DynamicFrame(rawDataFrame.toDF(), glueContext)
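
For context, the round trip above might sit in a full Glue Scala job roughly as follows. This is only a sketch under assumptions: the catalog names `my_raw_db` and `my_raw_table` are hypothetical placeholders, the object name is made up, and the `errorsAsDynamicFrame` call is an optional step, not part of the original answer, for looking at the failing records before they are dropped.

```scala
import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
import org.apache.spark.SparkContext

object RemoveErrorRecords {
  def main(args: Array[String]): Unit = {
    val glueContext = new GlueContext(new SparkContext())

    // Hypothetical catalog names (assumptions); replace with your own.
    val rawDBName = "my_raw_db"
    val rawTBLName = "my_raw_table"

    val rawDataFrame = glueContext
      .getCatalogSource(database = rawDBName, tableName = rawTBLName)
      .getDynamicFrame()
    println(s"RAW_DF-----count: ${rawDataFrame.count} errors: ${rawDataFrame.errorsCount}")

    // Optional: inspect a few of the error records before discarding them.
    rawDataFrame.errorsAsDynamicFrame.show(10)

    // Round trip through a Spark DataFrame, as suggested in the answer.
    val noErrorsDyf = DynamicFrame(rawDataFrame.toDF(), glueContext)

    // Print the counts again to check whether the error records are gone.
    println(s"CLEAN_DF-----count: ${noErrorsDyf.count} errors: ${noErrorsDyf.errorsCount}")
  }
}
```

Printing the counts before and after the conversion is a cheap way to confirm on your own data that the error records were actually dropped.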