Tags: file, text, clojure

Read a very large text file into a list in Clojure


What is the best way to read a very large file (for example, a text file with 100 000 names, one per line) into a list lazily, loading it only as needed, in Clojure?

Basically, I need to run all sorts of string searches on these items (I currently do this with grep and regexes in shell scripts).

I tried adding '( at the beginning and ) at the end, but apparently this method (loading a static, or constant?, list) has a size limitation for some reason.


Solution

  • Use line-seq. An example from ClojureDocs:

    ;; Count lines of a file (loses head):
    user=> (with-open [rdr (clojure.java.io/reader "/etc/passwd")]
             (count (line-seq rdr)))
    

    With a lazy sequence of strings, however, you cannot efficiently perform operations that need the whole sequence at once, such as sorting. If you can express your operations with filter or map, you can consume the sequence lazily. Otherwise, an embedded database is a better fit.

    Also note that you should not hold on to the head of the sequence; otherwise every line read so far stays reachable, and the whole file ends up in memory.

    Furthermore, if you need to run more than one operation, each one has to read the file again unless you combine them into a single pass. Be warned: laziness can make things difficult sometimes.
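To make the points above concrete, here is a minimal sketch of a grep-style search over line-seq. The function names matching-lines and line-stats are hypothetical, invented for this illustration. The sketch assumes Clojure 1.7 or later (for the transducer arity of into) and that the set of matching lines fits in memory even if the whole file does not. Note that the lazy sequence has to be fully consumed inside with-open, because the reader is closed as soon as that form exits.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: grep-like search that streams the file.
;; Only the matching lines are retained; `into` forces the work to
;; happen before with-open closes the reader.
(defn matching-lines
  [path re]
  (with-open [rdr (io/reader path)]
    (into [] (filter #(re-find re %)) (line-seq rdr))))

;; Hypothetical helper: several results from one pass over the file,
;; so the file is not re-read for each operation. The sequence is
;; handed straight to reduce, so nothing holds on to its head.
(defn line-stats
  [path re]
  (with-open [rdr (io/reader path)]
    (reduce (fn [acc line]
              (cond-> (update acc :total inc)
                (re-find re line) (update :matches inc)))
            {:total 0 :matches 0}
            (line-seq rdr))))
```

For example, (matching-lines "names.txt" #"(?i)^sm") would return the names starting with "sm" in any letter case, where names.txt is an assumed file name. Returning the bare (filter ...) result without into or doall would fail with a "Stream closed" error once the caller tried to read it.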