Search code examples
pysparketl

How to replace null values with previous non null values


I have a column that is something like this

A B
Value1 Some Value
null Some Value
null Some Value
Value2 Some Value
Value3 Some Value
null Some Value
null Some Value

and i want

A B
Value1 Some Value
Value1 Some Value
Value1 Some Value
Value2 Some Value
Value3 Some Value
Value3 Some Value
Value3 Some Value

i have used tjava_row but have encountered certain error. i have used the code given below, or should i go for python(ffill) or pyspark for designing the elt job:

if (row2.A==null) {
 row2.A = row1.A
} else {
 row2.A=row2.A; 

Solution

  • With Talend, you can use a tMap with intern variables to use previous values. You'll need three variables (from the central tab in your tMap) :

    • current : row1.yourValue // cache your input data for current line

    • previous : Relational.ISNULL(Var.current)?Var.previous:Var.current //if current line is null, keep the previous one. Otherwise replace 'previous' with current line.

    • getPreviousIfNull : Relational.ISNULL(row1.yourValue)?Var.previous:Var.current

      //if your input data is null : get the previous one. Otherwise get the current one.

    row1.yourValue is your input data.

    Then you extract Var.getPreviousIfNull to your output.

    enter image description here