Tags: java, mysql, hashmap, ip, cidr

HashMap MySQL - Best Practice


I faced a scenario where the task was to read a file containing 3 million IP addresses.
There is a MySQL table containing Id and PrimaryIP; PrimaryIP can hold multiple IPs separated by #, and it can also contain IPs in CIDR notation.

In total there are 8000 records, each with multiple IPs and CIDR blocks.

My task was to read that file, check it against the database, and write the matching IP,ID pairs to a file.
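For illustration only (these values are made up), a single record and the expected output might look like this:

    id = 42, PrimaryIP = "10.1.1.1#192.168.0.0/30"
    input line "192.168.0.2" falls inside the CIDR block, so the output file gets the line "42,192.168.0.2"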

Initially my program failed with java.lang.OutOfMemoryError: Java heap space, so I increased the heap to 3 GB. It still failed, so I later split the file into 6 subfiles of 0.5 million lines each.

To expand the CIDR IPs into a list of addresses, I used Apache Commons Net's SubnetUtils.
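As a quick, self-contained illustration of what SubnetUtils does here (the CIDR value below is just an example), getAllAddresses() expands a CIDR block into its individual host addresses:

import org.apache.commons.net.util.SubnetUtils;

public class SubnetUtilsDemo {
    public static void main(String[] args) {
        // Expand a CIDR block into its usable host addresses
        SubnetUtils subnet = new SubnetUtils("192.168.0.0/30");
        for (String address : subnet.getInfo().getAllAddresses()) {
            System.out.println(address); // prints 192.168.0.1 and 192.168.0.2
        }
    }
}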

Below is my code:

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.io.FileUtils;
import org.apache.commons.net.util.SubnetUtils;

public class IPMatcher {

    public static void main(String[] args) {

        String sqlQuery = "SELECT id,PrimaryIP from IPTable where PrimaryIP != '' limit 100000;";
        Connection connection = null;
        Statement statement = null;
        File oFile = new File("output.txt");
        System.out.println(new Date());
        try {
            // Reads the entire input file into memory at once
            List<String> fileData = FileUtils.readLines(new File("input.txt"));
            System.out.println("File Data Size : " + fileData.size());

            Class.forName("com.mysql.jdbc.Driver");
            connection = DriverManager.getConnection("jdbc:mysql://localhost/db?user=root&password=pwd");

            statement = connection.createStatement();
            ResultSet resultSet = statement.executeQuery(sqlQuery);

            System.out.println("Started with MySQL Querying");

            Map<String, Integer> primaryIPIDMap = new HashMap<String, Integer>();

            while (resultSet.next()) {
                primaryIPIDMap.clear();
                int recordID = resultSet.getInt(1);

                if (resultSet.getString(2).contains("#")) {
                    // PrimaryIP holds several IPs separated by #
                    String primaryIP[] = resultSet.getString(2).split("#");

                    for (int i = 0; i < primaryIP.length; i++) {
                        if (primaryIP[i].contains("/")) {
                            // CIDR entry: expand it into all of its addresses
                            String allIP[] = getAllIP(primaryIP[i]);
                            for (int allIPi = 0; allIPi < allIP.length; allIPi++) {
                                primaryIPIDMap.put(allIP[allIPi].intern(), recordID);
                            }
                        } else {
                            primaryIPIDMap.put(primaryIP[i].intern(), recordID);
                        }
                    }
                } else {
                    primaryIPIDMap.put(resultSet.getString(2).intern(), recordID);
                }

                // Scan every line of the input file against this record's IPs
                for (String t : fileData) {
                    if (primaryIPIDMap.containsKey(t)) {
                        // Note: this overload overwrites output.txt on each call
                        FileUtils.writeStringToFile(oFile, recordID + "," + t);
                    }
                }
                primaryIPIDMap.clear();
            }

            resultSet.close();
            statement.close();
            connection.close();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (statement != null)
                    statement.close();
            } catch (Exception se2) {
            }
            try {
                if (connection != null)
                    connection.close();
            } catch (Exception se) {
                se.printStackTrace();
            }
        }

        System.out.println("Finished");
        System.out.println("End Time : " + new Date());
    }

    private static String[] getAllIP(String ip) {
        return new SubnetUtils(ip).getInfo().getAllAddresses();
    }
}

Can someone tell me the best practice for solving this?
Today it is only 3 million addresses; tomorrow it may be 5 million. I can't keep creating subfiles.


Solution

  • I fixed the problem by:

    • Reading the input file line by line instead of loading the whole file into memory (a sketch of this approach follows the list)
    • Keeping the MySQL table structure unchanged, because it has dependencies in many places and the table was not designed by me
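A minimal sketch of that line-by-line approach (the class name, file names, and query here are illustrative, and it assumes the IP-to-ID map built from the 8000 records fits in memory once the CIDR blocks are expanded):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.net.util.SubnetUtils;

public class LineByLineMatcher {

    public static void main(String[] args) throws Exception {
        // Build the IP -> id lookup once from the database
        Map<String, Integer> ipToId = new HashMap<>();
        try (Connection connection = DriverManager.getConnection(
                 "jdbc:mysql://localhost/db?user=root&password=pwd");
             Statement statement = connection.createStatement();
             ResultSet rs = statement.executeQuery(
                 "SELECT id, PrimaryIP FROM IPTable WHERE PrimaryIP != ''")) {
            while (rs.next()) {
                int id = rs.getInt(1);
                for (String ip : rs.getString(2).split("#")) {
                    if (ip.contains("/")) {
                        // Expand a CIDR block into individual addresses
                        for (String addr : new SubnetUtils(ip).getInfo().getAllAddresses()) {
                            ipToId.put(addr, id);
                        }
                    } else {
                        ipToId.put(ip, id);
                    }
                }
            }
        }

        // Stream the input file line by line so the whole file is never held in memory
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
             BufferedWriter writer = new BufferedWriter(new FileWriter("output.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Integer id = ipToId.get(line.trim());
                if (id != null) {
                    writer.write(id + "," + line.trim());
                    writer.newLine();
                }
            }
        }
    }
}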