I'm working with the pentaho data integration, Spoon.
Short description: I want to get the number of times a value has appeared in the flow for each line that reads from the flow.
Long description: I am doing the transformation of the fact table, and when I read the data of a csv file, I have a client has traveled in a certain airplane at a specific time. I want to add a column, available seats, that whenever it appears read a data that a customer travels in a concrete airplane, look in the previous flow what is the number of seats available of that airplane and subtract 1.
Example.
Initially Flight 1 has 160 seats available and Flight 2 has 320 seats available.
CSV
Flight | Client
1 | 1
2 | 2
1 | 3
2 | 4
I can add a column that the value is the total of available seats.
Flight | Customer | Available seats
1 | 1 | 160
2 | 2 | 320
1 | 3 | 160
2 | 4 | 320
but afterwards i do not know how to obtain the minimum value of the seats available given a certain flight in each reading of the flow.
Final output I want in my flow..
Flight | Customer | Available seats
1 | 1 | 159
2 | 2 | 319
1 | 3 | 158
2 | 4 | 318
Many thanks in advance for the time in read my question.
You can use Add value fields changing sequence
(available under the "Transform" steps group) step to generate a counter for each flight. The step will require the input to be sorted by Flight
field. And you will need to specify Flight
field in this step, so that the counter will be reset, once the new group of flights will start.
Then you will be able to subtract the counter from Available seats
field to get the current value using Calculator/JavaScript/Java formula or any other step.
Here is an example, which you can copy and paste onto Spoon canvas:
<?xml version="1.0" encoding="UTF-8"?>
<transformation-steps>
<steps>
<step>
<name>Add value fields changing sequence</name>
<type>FieldsChangeSequence</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<start>1</start>
<increment>1</increment>
<resultfieldName>counter</resultfieldName>
<fields>
<field>
<name>Flight</name>
</field>
</fields>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>352</xloc>
<yloc>96</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Data Grid</name>
<type>DataGrid</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<fields>
<field>
<name>Flight</name>
<type>Integer</type>
<format/>
<currency/>
<decimal/>
<group/>
<length>-1</length>
<precision>-1</precision>
<set_empty_string>N</set_empty_string>
</field>
<field>
<name>Customer</name>
<type>Integer</type>
<format/>
<currency/>
<decimal/>
<group/>
<length>-1</length>
<precision>-1</precision>
<set_empty_string>N</set_empty_string>
</field>
<field>
<name>Total available seats</name>
<type>Integer</type>
<format/>
<currency/>
<decimal/>
<group/>
<length>-1</length>
<precision>-1</precision>
<set_empty_string>N</set_empty_string>
</field>
</fields>
<data>
<line> <item>1</item><item>1</item><item>160</item> </line>
<line> <item>2</item><item>2</item><item>320</item> </line>
<line> <item>1</item><item>3</item><item>160</item> </line>
<line> <item>2</item><item>4</item><item>320</item> </line>
</data>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>80</xloc>
<yloc>96</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Sort rows (by flight)</name>
<type>SortRows</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<directory>%%java.io.tmpdir%%</directory>
<prefix>out</prefix>
<sort_size>1000000</sort_size>
<free_memory/>
<compress>N</compress>
<compress_variable/>
<unique_rows>N</unique_rows>
<fields>
<field>
<name>Flight</name>
<ascending>Y</ascending>
<case_sensitive>N</case_sensitive>
<collator_enabled>N</collator_enabled>
<collator_strength>0</collator_strength>
<presorted>N</presorted>
</field>
</fields>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>192</xloc>
<yloc>96</yloc>
<draw>Y</draw>
</GUI>
</step>
<step>
<name>Calculator</name>
<type>Calculator</type>
<description/>
<distribute>Y</distribute>
<custom_distribution/>
<copies>1</copies>
<partitioning>
<method>none</method>
<schema_name/>
</partitioning>
<calculation>
<field_name>Available seats</field_name>
<calc_type>SUBTRACT</calc_type>
<field_a>Total available seats</field_a>
<field_b>counter</field_b>
<field_c/>
<value_type>Integer</value_type>
<value_length>-1</value_length>
<value_precision>-1</value_precision>
<remove>N</remove>
<conversion_mask/>
<decimal_symbol/>
<grouping_symbol/>
<currency_symbol/>
</calculation>
<cluster_schema/>
<remotesteps>
<input>
</input>
<output>
</output>
</remotesteps>
<GUI>
<xloc>496</xloc>
<yloc>96</yloc>
<draw>Y</draw>
</GUI>
</step>
</steps>
<order>
<hop>
<from>Add value fields changing sequence</from>
<to>Calculator</to>
<enabled>Y</enabled>
</hop>
<hop>
<from>Data Grid</from>
<to>Sort rows (by flight)</to>
<enabled>Y</enabled>
</hop>
<hop>
<from>Sort rows (by flight)</from>
<to>Add value fields changing sequence</to>
<enabled>Y</enabled>
</hop>
</order>
<notepads>
</notepads>
<step_error_handling>
</step_error_handling>
</transformation-steps>