I'm new to Kettle and have been experimenting with the tools it offers. I read a CSV file with the "CSV file input" step, wrote a CSV file with the "Text file output" step, and dropped some fields from the input that I didn't need. Now I want to go a step further. Here is an example of my CSV:
Id|Col1 |Col2
1 | test1 | 1
2 | test1 | 1
3 | test2 | 1
3 | test2 | 2
I want to filter the CSV so that the output contains only the Col1 values that have more than one distinct value in Col2. In my example that would be only "test2". I can't work out how to do this, probably because I'm not familiar with the tool yet. Can you give me a hint and point me toward a solution? What approaches can I take?
I think I found a solution that fits my problem. I added a "Group by" step that groups on Col1 and applies a "count distinct" aggregate to Col2. Then, with a "Filter rows" step, I keep only the rows where that distinct count is > 1 :) !
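
For anyone who wants to check the logic outside Kettle, here is a minimal Python sketch of the same idea: group on Col1, count the distinct Col2 values, and keep the groups with a count above 1. It is not Kettle code. The function name `col1_with_multiple_col2` and the inline `SAMPLE` string are my own names for illustration, and I'm assuming a "|" delimiter with stray spaces around the values, as in the example above.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from the question (pipe-delimited, padded with spaces).
SAMPLE = """Id|Col1 |Col2
1 | test1 | 1
2 | test1 | 1
3 | test2 | 1
3 | test2 | 2
"""


def col1_with_multiple_col2(text, delimiter="|"):
    """Return the Col1 values that have more than one distinct Col2 value."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # Strip padding so "Col1 " and " test1 " compare cleanly.
    header = [name.strip() for name in next(reader)]
    col1_idx = header.index("Col1")
    col2_idx = header.index("Col2")

    # "Group by" Col1, collecting the distinct Col2 values per group.
    distinct = defaultdict(set)
    for row in reader:
        if not row:  # skip blank lines
            continue
        distinct[row[col1_idx].strip()].add(row[col2_idx].strip())

    # "Filter rows": keep groups whose distinct count is greater than 1.
    return [key for key, values in distinct.items() if len(values) > 1]


print(col1_with_multiple_col2(SAMPLE))  # prints ['test2']
```

The set per group plays the role of the "count distinct" aggregate, and the final list comprehension is the equivalent of the filter on count > 1.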