
Clojure: open a large text file, edit the data, and write it to a new file


I'm trying to open a file that is too large to slurp. I then want to edit the contents to remove all characters except digits, and write the result to a new file.

So far I have
;; inside the ns form
(:require [clojure.java.io :as io]
          [clojure.string :as str])

;; in the build configuration
:jvm-opts ["-Xmx2G"]

(with-open [rdr (io/reader "/Myfile.txt")
            wrt (io/writer "/Myfile2.txt")]
  (doseq [line (line-seq rdr)]
    (.write wrt (str line "\n"))))    

This reads and writes the file, but I'm unsure of the best way to go about editing. Any help is much appreciated. I'm very new to the language.


Solution

  • Looks like you just need to modify the line value before writing it. If you want to modify a string to remove all non-numeric characters, a regular expression is a pretty easy route. You could make a function to do this:

    (defn numbers-only [s]
      (clojure.string/replace s #"[^\d]" ""))
    (numbers-only "this is 4 words")
    => "4"
    

    Then use that function in your example:

    (str (numbers-only line) "\n")
    

    Alternatively, you could map numbers-only over the output of line-seq, and because both map and line-seq are lazy you'll get the same lazy/on-demand behavior:

    (map numbers-only (line-seq rdr))
    

    And then your doseq would stay the same. I would probably opt for this approach as it keeps your "stream" processing together, and your imperative/side-effect loop is only concerned with writing its inputs.
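
Putting the pieces above together, a minimal sketch of the whole pipeline might look like the following. The namespace `example.digits`, the helper name `copy-digits`, and its two path arguments are made up for illustration and are not part of the original question; the rest reuses `numbers-only` and the `with-open`/`doseq` loop as they appear above.

```clojure
(ns example.digits
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn numbers-only
  "Removes every character from s that is not a digit."
  [s]
  (str/replace s #"[^\d]" ""))

(defn copy-digits
  "Hypothetical helper: streams in-path line by line, keeps only the
  digits of each line, and writes each result as a line of out-path."
  [in-path out-path]
  (with-open [rdr (io/reader in-path)
              wrt (io/writer out-path)]
    ;; map and line-seq are both lazy, so lines are read on demand
    ;; and the whole file is never held in memory at once.
    (doseq [line (map numbers-only (line-seq rdr))]
      (.write wrt (str line "\n")))))
```

Calling `(copy-digits "/Myfile.txt" "/Myfile2.txt")` would then process the file from the question. Note that with this sketch a line containing no digits still produces an empty line in the output; filtering those out with `(remove empty? ...)` is one option if that is not wanted.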