
Univocity: how to parse 3 (or n) lines as one row (bean)


I am evaluating the Univocity parser for one of my projects. In our fixed-width flat file format, one record (bean) is built from three detail lines, starting with AA, BB and CC respectively. Would this file be parseable using Univocity?
I could use recordEndsOnNewline to keep reading across lines and add some custom conversions, but is there an out-of-the-box ParserSettings option for this?

AA1234 data
BBmore data
CCsome more data row 1 ended
AA5678 data
BBmore data
CCsome more data row 2 ended

Update:

Maybe use setLineSeparator("\nAA")?


Solution

  • Author of the library here. First you need to define the field positions. As you want to parse values that span multiple lines, you must set recordEndsOnNewline to false, so you are on the right track.

    It's easier to "see" where each record starts and ends if you join the lines that form a single record:

    String input = "" +
        "AA1234 data\nBBmore data\nCCsome more data row 1 ended\n" +
        "AA5678 data\nBBmore data\nCCsome more data row 2 ended";
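    A quick way to confirm where each line lands is to compute the offsets yourself. The sketch below is plain Java with no Univocity involved (RecordOffsets and lineStarts are illustrative names, not library API). It assumes lines are separated by a single \n, as in the input string above, and prints the position at which each line of the first record begins. With \r\n line endings every offset after the first line would shift, so the field positions would have to change too.

```java
import java.util.ArrayList;
import java.util.List;

class RecordOffsets {

    // Returns the zero-based offset at which each line starts once the
    // lines are joined with "\n" (each separator occupies one position).
    static List<Integer> lineStarts(List<String> lines) {
        List<Integer> starts = new ArrayList<>();
        int offset = 0;
        for (String line : lines) {
            starts.add(offset);
            offset += line.length() + 1; // +1 for the '\n' that follows the line
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> record = List.of(
                "AA1234 data",
                "BBmore data",
                "CCsome more data row 1 ended");
        List<Integer> starts = lineStarts(record);
        for (int i = 0; i < record.size(); i++) {
            // prints the offset followed by the line it belongs to
            System.out.println(starts.get(i) + ": " + record.get(i));
        }
    }
}
```

    The lines start at offsets 0, 12 and 24; skipping the two-letter prefix gives 2, 14 and 26, which match the start positions of a1, b1 and c1.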
    

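    Given the example you provided, the following field configuration can be created (I assumed you don't want the "AA", "BB" and "CC" strings). Positions are zero-based, each start is inclusive and each end is exclusive, and every newline counts as one character: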

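    import com.univocity.parsers.fixed.FixedWidthFields;

    FixedWidthFields fields = new FixedWidthFields();
    fields
            .addField("a1", 2, 6)    // "1234" (positions 0-1 hold "AA")
            .addField("a2", 7, 11)   // "data" (position 11 is the first '\n')
            .addField("b1", 14, 23)  // "more data" (positions 12-13 hold "BB")
            .addField("c1", 26, 40)  // "some more data" (positions 24-25 hold "CC")
            .addField("c2", 41, 52); // "row 1 ended"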
    
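    These ranges behave like Java's String.substring(begin, end). As a plain-Java illustration only (FieldSlicer and slice are hypothetical names, and this is not how Univocity is implemented), applying the same five ranges to the first joined record, with each \n occupying one position, yields the values of the first row:

```java
import java.util.Arrays;

class FieldSlicer {

    // The five ranges from the FixedWidthFields configuration:
    // {start (inclusive), end (exclusive)}, zero-based, counting each '\n'.
    static final int[][] RANGES = {
            {2, 6}, {7, 11}, {14, 23}, {26, 40}, {41, 52}
    };

    // Cuts one joined record into its field values using substring.
    static String[] slice(String record) {
        String[] values = new String[RANGES.length];
        for (int i = 0; i < RANGES.length; i++) {
            values[i] = record.substring(RANGES[i][0], RANGES[i][1]);
        }
        return values;
    }

    public static void main(String[] args) {
        String record = "AA1234 data\nBBmore data\nCCsome more data row 1 ended";
        // prints [1234, data, more data, some more data, row 1 ended]
        System.out.println(Arrays.toString(slice(record)));
    }
}
```

    In this example every value fills its range exactly, so no padding has to be removed.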

    And you can parse your input with this:

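    import com.univocity.parsers.fixed.FixedWidthParser;
    import com.univocity.parsers.fixed.FixedWidthParserSettings;
    import java.io.StringReader;
    import java.util.Arrays;
    import java.util.List;

    FixedWidthParserSettings settings = new FixedWidthParserSettings(fields);
    settings.getFormat().setLineSeparator("\n");
    settings.setRecordEndsOnNewline(false); // keep reading fields across line breaks

    FixedWidthParser parser = new FixedWidthParser(settings);

    List<String[]> rows = parser.parseAll(new StringReader(input));
    for (String[] row : rows) {
        System.out.println(Arrays.toString(row));
    }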
    

    This will give you the correct output:

    [1234, data, more data, some more data, row 1 ended]
    [5678, data, more data, some more data, row 2 ended]
    
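    The last end position (52) also equals the length of one joined record: 11 + 1 + 11 + 1 + 28 characters. The sketch below is a simplified mental model in plain Java, not Univocity's implementation (RecordSplitter and split are hypothetical names). It assumes a single \n separates consecutive records, cuts the input into 52-character chunks and skips that separator, which yields the two records behind the rows above.

```java
import java.util.ArrayList;
import java.util.List;

class RecordSplitter {

    // Splits the input into fixed-length records. After each record, one
    // '\n' separator is skipped if present. A trailing partial record is
    // ignored. This is a simplified model for illustration only.
    static List<String> split(String input, int recordLength) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (pos + recordLength <= input.length()) {
            records.add(input.substring(pos, pos + recordLength));
            pos += recordLength;
            if (pos < input.length() && input.charAt(pos) == '\n') {
                pos++; // skip the line separator that ends the record
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "AA1234 data\nBBmore data\nCCsome more data row 1 ended\n"
                + "AA5678 data\nBBmore data\nCCsome more data row 2 ended";
        for (String record : split(input, 52)) {
            // show embedded newlines as the two characters \n
            System.out.println(record.replace("\n", "\\n"));
        }
    }
}
```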

    Now that we know where each field starts and ends, we can define your Java bean:

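    import com.univocity.parsers.annotations.FixedWidth;
    import com.univocity.parsers.annotations.Parsed;

    public static class Bean {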
        @FixedWidth(from = 2, to = 6)
        @Parsed
        int a1;
    
        @FixedWidth(from = 7, to = 11)
        @Parsed
        String a2;
    
        @FixedWidth(from = 14, to = 23)
        @Parsed
        String b1;
    
        @FixedWidth(from = 26, to = 40)
        @Parsed
        String c1;
    
        @FixedWidth(from = 41, to = 52)
        @Parsed
        String c2;
    
        @Override
        public String toString() {
            return "Bean{" +
                    "a1=" + a1 +
                    ", a2='" + a2 + '\'' +
                    ", b1='" + b1 + '\'' +
                    ", c1='" + c1 + '\'' +
                    ", c2='" + c2 + '\'' +
                    '}';
        }
    }
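    The annotations replace mapping code you would otherwise write by hand. For comparison only, here is a Univocity-free sketch (BeanRow and fromRow are hypothetical names) that copies one String[] row, in field order, into a plain class and converts a1 with Integer.parseInt. Univocity performs its own conversion based on the field types; this merely mirrors the end result.

```java
class BeanRow {
    final int a1;
    final String a2;
    final String b1;
    final String c1;
    final String c2;

    private BeanRow(int a1, String a2, String b1, String c1, String c2) {
        this.a1 = a1;
        this.a2 = a2;
        this.b1 = b1;
        this.c1 = c1;
        this.c2 = c2;
    }

    // Maps a row in field order (a1, a2, b1, c1, c2) onto typed fields.
    static BeanRow fromRow(String[] row) {
        if (row.length != 5) {
            throw new IllegalArgumentException("expected 5 values, got " + row.length);
        }
        return new BeanRow(Integer.parseInt(row[0]), row[1], row[2], row[3], row[4]);
    }

    @Override
    public String toString() {
        return "BeanRow{a1=" + a1 + ", a2='" + a2 + "', b1='" + b1
                + "', c1='" + c1 + "', c2='" + c2 + "'}";
    }

    public static void main(String[] args) {
        String[] row = {"1234", "data", "more data", "some more data", "row 1 ended"};
        System.out.println(fromRow(row));
    }
}
```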
    

    With that in place, parsing the input is as simple as this:

    import com.univocity.parsers.fixed.FixedWidthParserSettings;
    import com.univocity.parsers.fixed.FixedWidthRoutines;
    import java.io.StringReader;

    FixedWidthParserSettings settings = new FixedWidthParserSettings(); // field positions come from the Bean annotations
    settings.getFormat().setLineSeparator("\n");
    settings.setRecordEndsOnNewline(false);
    settings.setHeaderExtractionEnabled(false); // This one is important as your input has no headers.

    FixedWidthRoutines routines = new FixedWidthRoutines(settings);
    for (Bean bean : routines.parseAll(Bean.class, new StringReader(input))) {
        System.out.println(bean);
    }
    

    This prints the following beans:

    Bean{a1=1234, a2='data', b1='more data', c1='some more data', c2='row 1 ended'}
    Bean{a1=5678, a2='data', b1='more data', c1='some more data', c2='row 2 ended'}
    

    Hope this helps