Tags: javascript, compare, etl, pentaho, pentaho-spoon

How to compare every element in a row in Pentaho


I have an Excel file. Here is an example of how it looks: [image]

I am using Pentaho to create a new column (related to) that shows whether a person is related to another one. I consider two people related if they have the same Dirección (address). For instance, María Isabel Hevilla Castro and Miguel Manceras Fernández live in the same place, so the related to value for María Isabel Hevilla Castro will be Miguel Manceras Fernández and, conversely, the value for Miguel Manceras Fernández will be María Isabel Hevilla Castro.

I have tried to solve this with the Modified Java Script Value step, but I'm just beginning to learn JavaScript and I don't know how to solve this problem. Could somebody help me or give me a clue?


Solution

  • If your addresses are clean you can do this with a self-join on Dirección.

    The idea is to sort by Dirección, duplicate the stream, rename the name field in the copy (for example to Nombre2 or Related_to), and inner join the two streams on Dirección. This produces a record for every combination of people that share the same Dirección, including each person paired with themselves. To fix that, filter the rows and keep only the ones where Nombre is not equal to Nombre2.

    [image: transformation example]

    The basic flow can be extended with cleanup of the address fields beforehand (the Calculator step can compute similarity scores between strings) or with extra processing of the related_to field afterwards.
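
    To make the join logic concrete, here is a minimal sketch in plain JavaScript of what the self-join and filter do to the rows. It is not a script for a Pentaho step, only an illustration on an in-memory array. The addresses, the third person, the unaccented field name Direccion, and the function name relatedPairs are assumptions made up for the example.

```javascript
// Hypothetical sample rows: the addresses and the third person are invented for illustration.
const rows = [
  { Nombre: "María Isabel Hevilla Castro", Direccion: "Calle A 1" },
  { Nombre: "Miguel Manceras Fernández", Direccion: "Calle A 1" },
  { Nombre: "Persona De Ejemplo", Direccion: "Calle B 2" },
];

// relatedPairs is a hypothetical helper that mimics the flow described above.
function relatedPairs(input) {
  // Duplicate the stream and rename the name field in the copy.
  const copy = input.map((r) => ({ Nombre2: r.Nombre, Direccion: r.Direccion }));

  const pairs = [];
  for (const left of input) {
    for (const right of copy) {
      // Inner join on Direccion, then drop rows where a person matches themselves.
      if (left.Direccion === right.Direccion && left.Nombre !== right.Nombre2) {
        pairs.push({
          Nombre: left.Nombre,
          Direccion: left.Direccion,
          Related_to: right.Nombre2,
        });
      }
    }
  }
  return pairs;
}

const result = relatedPairs(rows);
console.log(result);
```

    With this sample data the result contains one row per direction of the relation: María Isabel Hevilla Castro related to Miguel Manceras Fernández, and the reverse. Note that a person whose address matches nobody else drops out entirely, because the inner join only pairs them with themselves and the filter then removes that row. If those people must stay in the output, the join type would need to change.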