
How to read a CSV file with BeanIO, filtering by line?


I want to read a CSV file with BeanIO, keeping only the lines that start with "CA" and skipping the rest. From the "CA" lines I need the values "0", "1", "2" and "3", "4", "5":

AA123
BA456
CA789
CA012
CA345
DA678
EA901

This is my BeanIO XML mapping:

<stream name="InfoCSV" format="csv">
  <record name="info" class="com.example.Info" minOccurs="0" maxOccurs="unbounded">
    <field name="digit1" />
    <field name="digit2" />
    <field name="digit3" />
  </record>
</stream>

How do I filter the lines? I don't know how to do it in the XML mapping.


Solution

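  • First, from the data you have shown, you must use a fixedlength format parser and not a csv one: the values are not separated by commas, so a csv stream would read each line (for example CA789) as a single field instead of three separate digits.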

    <stream name="InfoCSV" format="fixedlength" />
    

    Streams have a configuration setting called ignoreUnidentifiedRecords (Appendix A, par. 7), which you need in order to skip the records/lines that don't start with "CA".

    You also need to tell the parser how to identify the records/lines you are interested in. Section 4.2.1 explains how record identification works with rid="true" and the literal attribute. If we assume that the first two characters identify the records/lines you are interested in, we have:

    <field name="id" position="0" length="2" rid="true" literal="CA" />
    

    Putting it all together:

    <stream name="InfoCSV" format="fixedlength" ignoreUnidentifiedRecords="true">
      <record name="info" class="com.example.Info" minOccurs="0" maxOccurs="unbounded">
        <field name="id" position="0" length="2" rid="true" literal="CA"/>
        <field name="digit1" position="2" length="1" />
        <field name="digit2" position="3" length="1" />
        <field name="digit3" position="4" length="1" />
      </record>
    </stream>
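To make the effect of this mapping concrete, here is a plain-Java sketch (standard library only, no BeanIO) that applies the same rules by hand: a line counts as an `info` record only if its first two characters equal the literal `CA`, every other line is skipped the way `ignoreUnidentifiedRecords="true"` skips unidentified records, and the three digits are cut out at positions 2, 3 and 4. The `Info` class and `CaRecordSketch.readCaRecords` are hypothetical stand-ins for `com.example.Info` and BeanIO's reader; this only illustrates the expected behavior and is not how BeanIO is implemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for com.example.Info: one property per digit <field> in the mapping.
final class Info {
    final String digit1;
    final String digit2;
    final String digit3;

    Info(String digit1, String digit2, String digit3) {
        this.digit1 = digit1;
        this.digit2 = digit2;
        this.digit3 = digit3;
    }

    @Override
    public String toString() {
        return digit1 + digit2 + digit3;
    }
}

// Hand-written illustration of the fixedlength mapping; not BeanIO's implementation.
final class CaRecordSketch {
    // Mirrors <field name="id" position="0" length="2" rid="true" literal="CA"/>.
    private static final String RECORD_ID = "CA";
    // id (2 chars) + digit1 + digit2 + digit3 (1 char each).
    private static final int RECORD_LENGTH = 5;

    static List<Info> readCaRecords(Reader source) {
        List<Info> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Record identification: skip lines whose first two characters are not "CA",
                // like ignoreUnidentifiedRecords="true" skips records matching no definition.
                if (!line.startsWith(RECORD_ID)) {
                    continue;
                }
                // This sketch simply rejects "CA" lines too short to hold all three digits.
                if (line.length() < RECORD_LENGTH) {
                    throw new IllegalArgumentException("CA line too short: " + line);
                }
                // Fixed-length fields: digit1 at position 2, digit2 at 3, digit3 at 4 (length 1 each).
                records.add(new Info(line.substring(2, 3), line.substring(3, 4), line.substring(4, 5)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        String data = String.join("\n", "AA123", "BA456", "CA789", "CA012", "CA345", "DA678", "EA901");
        for (Info info : readCaRecords(new StringReader(data))) {
            System.out.println(info); // prints 789, then 012, then 345
        }
    }
}
```

Note that CA789 also matches the literal, so it is returned along with CA012 and CA345. In a real program you would typically load the mapping file through BeanIO's `StreamFactory` and read `com.example.Info` objects from the `BeanReader` it creates for the `InfoCSV` stream, instead of parsing the lines yourself.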