Search code examples
javamysqlcsvarraylist

Java - Having a huge ArrayList (1 million +), how to create a String of it in a acceptable amount of time?


as I mentioned, I have huge ArrayLists in this format:

List<List<String>> alist;

I get the Lists out of some .CSV which I get over a extern databasesystem. (I wrote a Visual Objects script to export the data we need for some calculations)

After I have the .CSV, I load the content into my List like this:

String line;
       alist = new ArrayList<List<String>>();
       int i=0;
    // Datei laden, und anschließend die Zeilen der CSV in eine ArrayList speichern 
       
       try {             
            
            br = new BufferedReader(new FileReader(path));
            while((line = br.readLine()) != null)
            {
                line = line.replace(",", ".");
                if(line.endsWith(delimitter))
                    line = line + " ";
                alist.add(Arrays.asList(line.split(delimitter)));
                i++;
                if(i==10000000)
                    break;
            }       
            
        }

It takes me like ~15900ms to store the Data in my ArrayList (~1,1 million rows and 11 columns). Pretty okay I guess. Now, that I have the List in the format I need, I would like to create a Insert out of it, so we can import the data in our database. I'm creating the insert like this:

public String getInsertString()
    {
        // Tabelle ergibt sich aus dem Dateinamen, ohne das .csv
        String insert="REPLACE INTO "+tablename + " (";
        
        // Spaltennamen aus Array auslesen, immer die erste Zeile des CSV
        for(int i=0; i< alist.get(0).size();i++)
        {
            if(i==0)
                insert = insert + alist.get(0).get(i).trim();
            else
                insert = insert + " ,"+ alist.get(0).get(i).trim();
        }
        insert= insert + ") \rVALUES";
        
        // Values der SPalten in den Insert schreiben + korreckte Syntax des Bfehels sicherstellen usw.
        for(int i=1;i < alist.size();i++) // Size nach "unten" in der 2D Liste
        {
        
            insert= insert +"(";
            for(int j=0; j < alist.get(0).size();j++) // Size nach "rechts" in der 2D Liste
            {
                // bei dem ersten ohne "," starten, damit die Syntax stimmt
                // Sollte der aktuelle Wert eine Zahl oder "null" sein, keine "'" setzen. Ansonsten "'" setzen fuer den Insert in die DB
                if(j==0)
                {           
                    if((StringUtils.isStrictlyNumeric(alist.get(i).get(j).trim())) || alist.get(i).get(j).trim().contains("null"))
                        insert = insert + alist.get(i).get(j).trim();
                    else 
                        insert = insert + "'" + alist.get(i).get(j).trim() + "'";
                }                   
                else
                {
                    if(((StringUtils.isStrictlyNumeric(alist.get(i).get(j).trim()))) || (alist.get(i).get(j).trim().contains("null")))
                        insert = insert +","+ alist.get(i).get(j).trim();
                    else 
                        insert = insert + ",'" + alist.get(i).get(j).trim() + "'";      
                }
                
            }
            if(i < alist.size()-1)
                insert= insert +"),";
            else 
                insert= insert +")";
            insert = insert +"\r";
            
        }
        
        //System.out.println(insert);
        return insert;
        
    }

Here I go over the whole List and add the values to a string so that I can use the string for a insert. I use the filenames as table names and the first row of a file for the columns in the insert. All other rows are the values.

After this step is done, I get a string like this "REPLACE INTO tablename (column1,column2 ... columnx) VALUES(value1, value2 ... valuex), (value1, value2 ... valuex), ...."

Now I would execute my InsertInDb class with that string and yeah, that's it.

But the second step takes way too much time. (I wait for like a hour now) Is there a smarter way to do what I want to do? (Insert all CSV automatically in our Database)

Would a BigList maybe improve the speed? https://dzone.com/articles/biglist-scalable-high (can't test it right now)

Edit: What I did to solve the problem:

I created a class for my SQL statement -->

public class BuildInsert {

    
    private String insertString;

    public String getINSERTSTRING()
    {
        return insertString;
    }
    
    BuildInsert(String tablename, List<String> alist )
    {   
        int size = alist.size();
        
        this.insertString = "REPLACE INTO " + tablename + "(";
        
        // Insert "Header"
        for(int j=1; j <= size ;j++) 
        {
            if(j < size)
                this.insertString = this.insertString + alist.get(j-1)+",";
            else
                this.insertString = this.insertString + alist.get(j-1)+")\n";
            
        }
        
        this.insertString = this.insertString +"VALUES(";
        
        // Insert values
        for(int j=1; j <= size ;j++) 
        {
            if(j < size)
                this.insertString = this.insertString + "?,";
            else
                this.insertString = this.insertString + "?)";
            
        }
    }
    
}

and tock this String for a batch insert / prepared statement as Mike mentioned:

Connection con;
        Statement stmt;

        final int batchSize = 1000;
        int count = 0;
        
        int sizeH = alist.size();
        int sizeL = alist.get(0).size();

        try {
            
             // Connection oeffnen und prepared statment vorbereiten
            System.out.println("Connecting to database...");
            con = DriverManager.getConnection(DB_URL,USER,PASS);
            con.setAutoCommit(false);
            
            ps = con.prepareStatement(insertString);
            
            stmt = con.createStatement();
      
          //< alist.size()
            for(int i=1;i < sizeH ;i++) // Size nach "unten" 
            {
                for(int j=0; j < sizeL;j++) // Size nach "rechts" 
                {
                    ps.setString(j+1, alist.get(i).get(j));
                }
                
                ps.addBatch();
    
                if(++count % batchSize == 0){
                    ps.executeBatch();
                    con.commit();
                }
            }    
            
            ps.executeBatch();
            con.commit();
            
        }

The insert is much faster now. (~230 seconds for 1,1 mil rows)

Thank you guys


Solution

  • Now, that I have the List in the format I need, I would like to create a Insert out of it, so we can import the data in our database.
    

    Don't do that! Do a JDBC batch insert instead. (See this on how to do a PreparedStatement which will save you a lot of trouble on formatting the insert statements)