Search code examples
javapythonpostgresqljdbcclojure

Insert file records into postgres db using clojure jdbc is taking long time compared to python psycopg2


I am trying to insert records into postgres DB, and its taking about 3 hours while it takes 40seconds using python psycopg2 and cursor.copy_from method

What is wrong with my code, using clojure.java.jdbc/db-do-prepared also takes about 3 hours too. Please help!

File size is 175M and it has 409,854 records

(defn-
  str<->int [str]
  (let [n (read-string str)]
    (if (integer? n) n)))

(with-open [file (reader "/path/to/foo.txt")]
    (try
      (doseq [v (clojure-csv.core/parse-csv file)]

        (clojure.java.jdbc/insert! db  :records 
                      nil
                      [(v 0) (v 1) (v 2) (str<->int (v 3))]))
      (println "Records inserted successfully")
      (Exception e
        (println (.getNextException e) e))))

Solution

  • After 4 years, decided to come back to this problem and share a guide to the solution, I am sure this will help someone get started.

    You can take a look at clojure.java.jdbc/insert-multi! and edit appropriately to suite the column types in your database

    (let [from "/path/to/foo.txt"
          to "/path/to/temp/foo.txt"]
      (with-open [reader (io/reader from)
                  writer (io/writer to)]
        (doall
          (->> (csv/read-csv reader)
               ;(drop 1)   ;if theres header
               (map #(list (nth % 0 nil) (nth % 2 nil)  (nth % 3 nil)))
               (csv/write-csv writer))))
      (let [fstream (slurp to)
            streamarray (map #(str/split % #",")
                             (str/split-lines fstream))]
        (clojure.java.jdbc/insert-multi! pg-db              ;connection or {:datasource hk-cp}
                                         :tbl_cdrs_da                 ;table name
                                         [:origin_node_type :origin_transaction_id :da_ua_id] ;colums
                                         streamarray)))               ;array