Search code examples
javamysqlcsvload-data-infile

MySQL, Most efficient Way to Load Data from a parsed file


My File has the following format:

Table1; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...
\n
Table2; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...
\n
Table3; Info
rec_x11;rec_x21;rec_x31;rec_x41
rec_x12;rec_x22;rec_x32;rec_x42
...

Each batch of records starting from the next line after TableX header and ending by an empty line delimiter is about 700-800 lines size.

Each such batch of lines (rec_xyz...) need to be imported into the relevant MyISAM table name indicated in the header of the batch (TableX)

I am familiar with the option to pipeline the stream using shell comands into LOAD DATA command.

I am interested in simple java snipet code which will parse this file and execute LOAD DATA for a single batch of records each time (in a for loop and maybe using seek command).

for now i am trying to use IGNORE LINES to jump over processed records, but i am not familiar if there is an option to ignore lines from BELOW?

is there a more efficient way to parse and load this type of file into DB?

EDIT

I have read that JDBC supports input stream to LOAD DATA starting from 5.1.3, can i use it to iterate over the file with an input stream and change the LOAD DATA statement each time?


Solution

  • I am attaching my code as a solution,

    This solution is based on the additional functionality (setLocalInfileInputStream) added by MySQL Connector/J 5.1.3 and later.

    I am pipe-lining input-stream into LOAD DATA INTO statement, instead of using direct file URL.

    Additional info: I am using BoneCP as a connection pool

    public final void readFile(final String path)
            throws IOException, SQLException, InterruptedException {
        File file = new File(path);
    
        final Connection connection = getSqlDataSource().getConnection();
        Statement statement = SqlDataSource.getInternalStatement(connection.createStatement());
    
        try{
            Scanner fileScanner = new Scanner(file);
            fileScanner.useDelimiter(Pattern.compile("^$", Pattern.MULTILINE));
    
            while(fileScanner.hasNext()){
                String line;
                while ((line = fileScanner.nextLine()).isEmpty());
    
                InputStream is = new ByteArrayInputStream(fileScanner.next().getBytes("UTF-8"));
                String [] tableName = line.split(getSeparator());
                setTable((tableName[0]+"_"+tableName[1]).replace('-', '_'));
    
                String sql = "LOAD DATA LOCAL INFILE '" + SingleCsvImportBean.getOsDependantFileName(file) + "' " 
                        + "INTO TABLE " + SqlUtils.escape(getTable()) 
                        + "FIELDS TERMINATED BY '" + getSeparator() 
                        + "' ESCAPED BY '' LINES TERMINATED BY '" + getLinefeed() + "' ";
                sql += "(" + implodeStringArray(getFields(), ", ") + ")";       
                sql += getSetClause();
    
                ((com.mysql.jdbc.Statement) statement).setLocalInfileInputStream(is);
                statement.execute(sql);         
            }
        }finally{
            statement.close();
            connection.close();
        }   
    }