
Read line-by-line for big files


I'm trying to write a reader for big files in Clojure, based on iteration. How can I return the file's lines one at a time, as strings? I want to be able to do something like this:

(println (do_something (readFile (:file opts)))) ; process and print first line
(println (do_something (readFile (:file opts)))) ; process and print second line

Code:

(ns testapp.core
  (:gen-class)
  (:require [clojure.tools.cli :refer [cli]]
            [clojure.java.io]))


(defn readFile [file]
  ; Iterate over opened file (read line by line)
  (with-open [rdr (clojure.java.io/reader file)]
    (let [seq (line-seq rdr)]
      ; how return only one line there? and after, when needed, take next line?
    )))

(defn -main [& args]
  ; Main function for project 
  (let [[opts args banner] 
        (cli args
          ["-h" "--help" "Print this help" :default false :flag true]
          ["-f" "--file" "REQUIRED: File with data"]
          ["-c" "--clusters" "Count of clusters" :default 3]
          ["-g" "--hamming" "Use Hamming algorithm"]
          ["-e" "--evklid" "Use Evklid algorithm"]
          )]
    ; Print help, when no typed args
    (when (:help opts)
      (println banner)
      (System/exit 0))
    ; Or process args and start work
    (if (and (:file opts) (or (:hamming opts) (:evklid opts)))
      (do
        ; Use Hamming algorithm
        (if (:hamming opts)
          (do
            (println (readFile (:file opts)))
            (println (readFile (:file opts)))
          )
          ;(count (readFile (:file opts)))
        ; Use Evklid algorithm
        (println "Evklid")))
      (println "Please, type path for file and algorithm!")))) 

Solution

  • Maybe I don't fully understand what you mean by "return line by line", but I'd suggest writing a function that accepts a file and a processing function, then prints the result of the processing function for every line of your big file. Or, more generally, accept both a processing function and an output function (println by default), so that instead of just printing you can send each result over the network, save it somewhere, pass it to another thread, etc.:

    (defn process-file-by-lines
      "Process file reading it line-by-line"
      ([file]
       (process-file-by-lines file identity))
      ([file process-fn]
       (process-file-by-lines file process-fn println))
      ([file process-fn output-fn]
       (with-open [rdr (clojure.java.io/reader file)]
         (doseq [line (line-seq rdr)]
           (output-fn
             (process-fn line))))))
    

    So:

    (process-file-by-lines "/tmp/tmp.txt") ;; Will just print the file line by line
    (process-file-by-lines "/tmp/tmp.txt"
                           #(apply str (reverse %))) ;; Will print each line reversed
                                                     ;; (plain `reverse` would yield a seq of characters)
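
    If you really need only the first few lines rather than a callback for every line, one possible variation is to hand the lazy `line-seq` to a function while the reader is still open. This is only a sketch: `with-lines` and `first-lines` are hypothetical names that are not part of the answer above, and the one rule they rely on is that the passed function must finish reading before it returns, since `with-open` closes the reader on exit.

```clojure
(require '[clojure.java.io :as io])

(defn with-lines
  "Hypothetical helper: opens `file`, passes the lazy sequence of its lines
  to `f`, and returns whatever `f` returns. `f` must realize everything it
  needs before returning, because the reader is closed when `with-open` exits."
  [file f]
  (with-open [rdr (io/reader file)]
    (f (line-seq rdr))))

(defn first-lines
  "Hypothetical helper: returns the first `n` lines of `file`.
  `doall` forces those lines while the reader is still open."
  [file n]
  (with-lines file #(doall (take n %))))
```

    With these definitions, `(first-lines "/tmp/tmp.txt" 2)` returns the first two lines without realizing the remaining lines of the sequence. Returning the unrealized `line-seq` itself out of `with-open` would not work, because reading it later would hit a closed stream.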