clojure lazy-sequences

What will the behaviour of line-seq be?


I'd like to understand the behaviour of a lazy sequence if I iterate over it with doseq but hold onto part of the first element.

 (with-open [log-file-reader (clojure.java.io/reader (clojure.java.io/file input-file-path))]

    ; parse-line returns some kind of representation of the line.
    (let [parsed-lines (map parse-line (line-seq log-file-reader))
          first-item (first parsed-lines)]

          ; Iterate over the parsed lines
          (doseq [line parsed-lines]
            ; Do something with a side-effect  
          )))

I don't want to retain any of the list, I just want to perform a side-effect with each element. I believe that without the first-item there would be no problem.

I'm having memory issues in my program, and I suspect that retaining a reference to something at the start of the parsed-lines sequence means that the whole sequence is kept in memory.

What's the defined behaviour here? If the sequence is being stored, is there a generic way to take a copy of an object and enable the realised portion of the sequence to be garbage collected?


Solution

  • The sequence-holding occurs here

    ...
    (let [parsed-lines (map parse-line (line-seq log-file-reader))
    ...
    

    The sequence of lines in the file is being lazily produced and parsed, but the entire sequence is held onto within the scope of the let, because parsed-lines refers to it. The sequence is realized by the doseq, but doseq is not the problem: it does not hold onto the sequence.

    ...
    (doseq [line parsed-lines]
     ; Do something
    ...
    

    You wouldn't necessarily care about sequence-holding in a let because the scope of let is limited, but here presumably your file is large and/or you stay within the dynamic scope of let for a while, or perhaps return a closure containing it in the "do something" section.

    Note that holding onto any given element of the sequence, including the first, does not hold the sequence. The term head-holding is a bit of a misnomer if you consider head to be the first element as in "head of the list" in Prolog. The problem is holding onto a reference to the sequence.
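
As a rough illustration of the distinction above (keeping one element versus keeping a reference to the sequence), here is a minimal sketch that never binds the lazy sequence to a local. The names process-log! and handle! are hypothetical, the parse-line shown is only a stand-in for the one in the question, and the ::none sentinel is just one way to detect the first step. The sequence is passed straight to reduce, which is eager, so everything is consumed while the reader is still open; under these assumptions, already-processed elements should be eligible for garbage collection.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical stand-in for the question's parse-line.
(defn parse-line [line]
  {:raw line :fields (str/split line #"\s+")})

(defn process-log!
  "Calls (handle! first-item item) for every parsed line of the file.
  Only the first parsed element is kept; the lazy sequence itself is
  never bound to a local. Returns the first parsed element, or ::none
  if the file has no lines."
  [input-file-path handle!]
  (with-open [rdr (io/reader (io/file input-file-path))]
    (reduce (fn [first-item item]
              ;; On the first step the accumulator is still the ::none
              ;; sentinel, so the current item becomes first-item.
              (let [first-item (if (= ::none first-item) item first-item)]
                (handle! first-item item) ; side effect only
                first-item))
            ::none
            (map parse-line (line-seq rdr)))))
```

A call might look like (process-log! input-file-path (fn [first-item item] (println (:raw item)))), where the second argument is whatever side-effecting function you need.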