Search code examples
javadatabasecsvjdbcclickhouse

How to speed up inserting a CSV file into ClickHouseDB with JavaClient or JDBC


I want to insert a file with 864k rows ~ 100MB into a clickhouse Database. I know about ways to insert a file with ClickHouse Client or Importing staright from ClickHouse cloud - both this methods taking only about 1.5s to insert this file. My whole point is to insert a lot of csv files into database fast as possible.

I've tried to insert the same file with Java through different ways:

  1. ClickHouse JDBC Driver:
 public void insertData(List<String> data) {
    try (PreparedStatement statement =
        connection.prepareStatement(
            "INSERT INTO btc_data SELECT * FROM input('col1 DateTime, col2 Float32, col3 Float32, "
                + "col4 Float32, col5 Float32, col6 Decimal(38,2), "
                + "col7 DateTime, col8 Float32, col9 Int32, col10 Decimal(38,2), "
                + "col11 Float32, col12 Int32')")) {

      batchInsertData(data, statement);

      statement.executeBatch();

    } catch (SQLException e) {
      throw new RuntimeException(e);
    }
  }

  private void batchInsertData(List<String> data, PreparedStatement statement) {
    try {
      for (String str : data) {
        String[] values = str.split(",");

        LocalDateTime open_time =
            LocalDateTime.ofInstant(
                Instant.ofEpochMilli(Long.parseLong(values[0])), ZoneOffset.UTC);
        LocalDateTime close_time =
            LocalDateTime.ofInstant(
                Instant.ofEpochMilli(Long.parseLong(values[6])), ZoneOffset.UTC);

        statement.setObject(1, open_time);
        statement.setFloat(2, Float.parseFloat(values[1]));
        statement.setFloat(3, Float.parseFloat(values[2]));
        statement.setFloat(4, Float.parseFloat(values[3]));
        statement.setFloat(5, Float.parseFloat(values[4]));
        statement.setBigDecimal(6, new BigDecimal(values[5]));
        statement.setObject(7, close_time);
        statement.setFloat(8, Float.parseFloat(values[7]));
        statement.setInt(9, Integer.parseInt(values[8]));
        statement.setBigDecimal(10, new BigDecimal(values[9]));
        statement.setFloat(11, Float.parseFloat(values[10]));
        statement.setInt(12, Integer.parseInt(values[11]));

        statement.addBatch();
      }

    } catch (SQLException e) {
      throw new RuntimeException(e);
    }
  }

This one takes about 15 seconds - 10x times slower to fully insert 100MB file into a database.

  1. JavaClient (I thought would be the fastest)
public void insertFromFile() {
    try (ClickHouseClient client = ClickHouseClient.newInstance(server.getProtocol())) {
      ClickHouseFile file =
          ClickHouseFile.of(
              "src/main/resources/864400.csv", ClickHouseCompression.NONE, ClickHouseFormat.CSV);
      ClickHouseResponse response =
          client
              .write(server)
              .set("format_csv_delimiter", ",")
              .table("btc_data")
              .data(file)
              .executeAndWait();
      ClickHouseResponseSummary summary = response.getSummary();
      System.out.println(summary.getWrittenRows());

      response.close();

    } catch (ClickHouseException e) {
      throw new RuntimeException(e);
    }
  }

This method takes about 28 seconds!!! - 20x times slower than native methods.

My overall question is how to speed up inserting with Java so I could insert as fast as native methods with Clickhouse Client or Cloud, maybe just twice slower, but not 10x times slower.

I've tried RDBC Driver - too slow.


Solution

  • full-disclosure: I work for ClickHouse and am not an expert with the Java clients. However, I discussed this with the Java dev team at ClickHouse. They made a couple of suggestions:

    1. You may want to instrument some logging with timestamps to determine the amount of time being spent by Java bootstrapping vs. actual insertion time. This would provide a better comparison vs clickhouse-client

    2. Adjusting the socket buffers for larger size can improve performance:

      prop.setProperty("buffer_size", "8192");

      prop.setProperty("socket_rcvbuf", "804800");

      prop.setProperty("socket_sndbuf", "804800");

    Hope this helps - please let me know