
RapidMiner: Issue with the data set meta data information list in the Read CSV operator


I am using RapidMiner version 6 for data analysis. I am trying to read a CSV file with 6000 rows. When I configure the meta data information in the Read CSV operator, only the last entry (column) in the meta data information list is read. The process XML is below:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.1.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\jeganathan.velu\Desktop\Book1.csv"/>
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="interest_rate_bps.true.integer.regular"/>
          <parameter key="1" value="Deposit.true.integer.regular"/>
          <parameter key="2" value="Location.true.nominal.regular"/>
        </list>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

However, the operator outputs only the last column, Location, instead of all three columns configured in the meta data information list.

If I configure meta data for 10 columns, only the tenth column is read from the CSV.

Am I doing something wrong, or is this a bug?

Thanks in advance, Jeganathan Velu.


Solution

  • I see the problem in your process.
    If you change the role at the end of each meta data entry from 'regular' to 'attribute' (for example, interest_rate_bps.true.integer.attribute), you'll find it works. I believe 'regular' was the way normal attributes used to be referred to, but this has since changed (at least in the Read CSV operator) to 'attribute'.
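Applied to the process from the question, the meta data list might look roughly like the fragment below. This is a sketch, assuming each value follows the pattern name.selected.value_type.role and that the keys are zero-based column indices (0, 1, 2), as in process XML saved by RapidMiner 6. The change the answer describes is confined to the last (role) field of each value.

```xml
<!-- Sketch of the corrected list (assumption: value = name.selected.value_type.role,
     keys = zero-based column indices). The role 'attribute' marks an ordinary,
     non-special attribute in place of 'regular'. -->
<list key="data_set_meta_data_information">
  <parameter key="0" value="interest_rate_bps.true.integer.attribute"/>
  <parameter key="1" value="Deposit.true.integer.attribute"/>
  <parameter key="2" value="Location.true.nominal.attribute"/>
</list>
```

The fragment takes the place of the existing data_set_meta_data_information list inside the Read CSV operator; the rest of the process stays the same.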