
Check schema of a record in Java


I have a text file. Each line in the file represents a record with n columns delimited by a | (pipe) character. The column values are of type int, string, date, timestamp, etc. Empty strings and spaces are also possible as column values.

I only need to validate the number of column values; validating the data types is not required.

Sample valid records of 5 columns each:

1234|xyz|abc|2016-04-08 11:12:40|234
1235|efgh|abc|2016-04-09 11:25:40|
1236|efghij| ||

Validation code:

boolean valid = true;
String line = buffReader.readLine();
String[] tokens = null;
while (line != null){
    tokens = line.split("\\|");
    if ((tokens.length==4 || tokens.length==5) && countPipes(line)==4){

    } else {
        valid = false;
        break;
    }
    line = buffReader.readLine();
}

private int countPipes(String line){
    int count = 0;
    count = line.length() - line.replace("|", "").length();
    return count;
}

I feel that the code can be better. Can someone let me know how I can improve it?


Solution

  • Well, you can simply check that there are exactly four pipes in the line. Four pipes always delimit five columns, any of which may be empty (which you allow).

    while (line != null) {
        if ( countPipes(line) != 4 ) {
            valid = false;
            break;
        }
        line = buffReader.readLine();
    }
    

    Now you don't need to split the line at all.

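    A note about splitting, though. The one-argument split(regex) discards trailing empty strings, which is why your token count varied between 4 and 5. If you use split with two parameters and pass a negative number as the limit, the result keeps entries for the trailing empty elements as well. Here is a demonstration: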

    public class Test {
    
        public static void main(String[] args) {
            String line = "A|B|||";
    
            String[] zeroSplit = line.split("\\|");
            String[] negativeSplit = line.split("\\|",-1);
    
            System.out.println( "When split without parameter: " + zeroSplit.length );
            System.out.println( "When split with negative parameter: " + negativeSplit.length );
        }
    }
    

    The output here is:

    When split without parameter: 2
    When split with negative parameter: 5

    So in this case, you can check that your split is exactly of length 5, and get the same result.

    while (line != null) {
        if ( line.split("\\|",-1).length != 5 ) {
            valid = false;
            break;
        }
        line = buffReader.readLine();
    }
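
For reference, here is a minimal self-contained sketch that puts both checks side by side. The class name RecordValidator, the constant EXPECTED_COLUMNS, and the method names are made up for illustration and are not part of the original code; the sample records from the question are fed through a StringReader in place of the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper class, named here only for illustration.
class RecordValidator {

    // Assumption: every record must have 5 columns, as in the sample data.
    static final int EXPECTED_COLUMNS = 5;

    // Approach 1: count the delimiters; n columns means n - 1 pipes.
    static boolean hasExpectedPipes(String line) {
        long pipes = line.chars().filter(c -> c == '|').count();
        return pipes == EXPECTED_COLUMNS - 1;
    }

    // Approach 2: split with a negative limit so trailing empty columns are kept.
    static boolean hasExpectedColumns(String line) {
        return line.split("\\|", -1).length == EXPECTED_COLUMNS;
    }

    // Returns true only if every line from the reader has the expected column count.
    static boolean allValid(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (!hasExpectedColumns(line)) {
                return false; // stop at the first malformed record
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // The three sample records from the question stand in for the file.
        String sample = "1234|xyz|abc|2016-04-08 11:12:40|234\n"
                + "1235|efgh|abc|2016-04-09 11:25:40|\n"
                + "1236|efghij| ||\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(allValid(reader));
        }
    }
}
```

Both checks agree on every input, so which one to use is mostly a matter of taste: counting pipes avoids allocating an array, while the split version is convenient if the column values are needed afterwards anyway.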