In an AWS Glue script (running in a Zeppelin notebook that forwards to a Glue dev endpoint), I've created a DynamicFrame from a Glue table. I would like to filter it on the field "name" not being in a static list of values, e.g. ("a", "b", "c").
Filtering on non-equality works just fine like this:
def unknownNameFilter(rec: DynamicRecord): Boolean = {
  rec.getField("name").exists(_ != "a")
}
I have tried several things like
!rec.getField("name").exists(_ isin ("a","b","c"))
but this fails with the error "value isin is not a member of Any". On the web I can only find PySpark examples, or examples that first convert the DynamicFrame to a DataFrame, which I want to avoid if possible.
Help much appreciated, thanks.
Okay, found my answer. I'll post it for anyone else looking for this. The membership check is done with
!(knownevents.contains(eventname))
Like this in a filter function:
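def unknownEventFilter(rec: DynamicRecord): Boolean = {
  // Static list of values that should be filtered out
  val knownevents = List("evt_a", "evt_b")
  rec.getField("name") match {
    // Keep the record only when its name is not in the known list
    case Some(eventname: String) => !(knownevents.contains(eventname))
    // Field missing or not a String
    case _ => throw new IllegalArgumentException("Unable to extract field name")
  }
}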
val dfUnknownEvents = df.filter(unknownEventFilter)
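For anyone who wants to stay closer to the `exists` one-liner from the question, here is a minimal sketch in plain Scala with no Glue dependency. It assumes, as the compiler error suggests, that `getField` hands back an `Option[Any]`; the `field` parameter stands in for that result. The names `UnknownNameSketch`, `knownNames` and `isUnknown` are made up for illustration. It works because `List.contains` accepts any supertype of the element type, so no cast to `String` is needed.

```scala
object UnknownNameSketch {
  // Static list of values to exclude (hypothetical name).
  val knownNames: List[String] = List("a", "b", "c")

  // `field` stands in for rec.getField("name"), assumed to be Option[Any].
  // Returns true only when a value is present and is not in knownNames.
  // A missing field yields false, the same as the original exists-based filter.
  def isUnknown(field: Option[Any]): Boolean =
    field.exists(value => !knownNames.contains(value))

  def main(args: Array[String]): Unit = {
    println(isUnknown(Some("a"))) // value is in the list
    println(isUnknown(Some("z"))) // value is not in the list
    println(isUnknown(None))      // field is missing
  }
}
```

Inside a Glue filter function the same expression would presumably read `rec.getField("name").exists(v => !knownNames.contains(v))`. Note the difference from the `match` version above: a record without a "name" field is silently dropped here instead of raising an exception.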