How Do I Ignore Blank Lines Processing a CSV File In Clojure?


How do I ignore blank lines when mapping zipmap over a file?

(defn csv-data->maps [csv-data]
  (map zipmap
       (->> (first csv-data) ;; First row is the header
            repeat)
       (rest csv-data)))

Solution

  • The easiest way is to reuse an existing library:

    (ns tst.demo.core
      (:use tupelo.core tupelo.test)
      (:require
        [clojure.string :as str]
        [schema.core :as s]
        [tupelo.csv :as csv]))
    
    (s/defn remove-blank-lines :- s/Str
      "Accepts a multi-line text string, and returns one with any blank lines removed."
      [text-str :- s/Str]
      (let [text-lines      (str/split-lines text-str)
            lines-no-blanks (remove str/blank? text-lines)
            text-no-blanks  (str/join \newline lines-no-blanks)]
        text-no-blanks))
    
    (dotest
      (let [csv-text           "zip-postal-code,store-num,chain-rank
                                01002,00006,4
                                01002,00277,5
    
                                01003,00277,5
                                01008,01217,5
                                01009,00439,5
                                01020,01193,5"
            csv-text-no-blanks (remove-blank-lines csv-text)
            csv-entities       (csv/parse->entities csv-text-no-blanks)
            csv-attrs          (csv/entities->attrs csv-entities)]
        (is= csv-entities
          [{:zip-postal-code "01002", :store-num "00006", :chain-rank "4"}
           {:zip-postal-code "01002", :store-num "00277", :chain-rank "5"}
           {:zip-postal-code "01003", :store-num "00277", :chain-rank "5"}
           {:zip-postal-code "01008", :store-num "01217", :chain-rank "5"}
           {:zip-postal-code "01009", :store-num "00439", :chain-rank "5"}
           {:zip-postal-code "01020", :store-num "01193", :chain-rank "5"}])
    

    As the example shows, you can get the CSV data either row-oriented (entity maps) or column-oriented (attribute vectors).

        (is= csv-attrs
          {:store-num       ["00006" "00277" "00277" "01217" "00439" "01193"],
           :zip-postal-code ["01002" "01002" "01003" "01008" "01009" "01020"],
           :chain-rank      ["4" "5" "5" "5" "5" "5"]})
        ))
    

    See the documentation for the tupelo.csv library for details.
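
    If adding a dependency is not an option, the blank rows can also be dropped in plain Clojure before the zipmap step from the question. The following is only a minimal sketch, assuming csv-data is a sequence of vectors of strings (the shape returned by clojure.data.csv/read-csv) and assuming a blank line shows up as an empty row or a row whose fields are all blank. The helper name blank-row? is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn blank-row?
  "True when the row is empty or every field in it is blank.
  Assumption: a blank input line is parsed into such a row."
  [row]
  (every? str/blank? row))

(defn csv-data->maps
  "Same idea as the question's function, but blank rows are removed
  before the header is paired with each remaining data row."
  [csv-data]
  (let [[header & rows] (remove blank-row? csv-data)]
    (map zipmap (repeat header) rows)))

;; Usage: the blank rows are skipped, the header still supplies the keys.
(csv-data->maps [["zip" "store"] [""] ["01002" "00006"] [""] ["01003" "00277"]])
```

    Filtering before destructuring also covers the case where blank lines precede the header row.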


    Another way (perhaps even easier) is to pre-process the file with a simple Unix tool such as sed. Consider a sample file:

    ~/expr/demo > cat csv.txt                
    zip-postal-code,store-num,chain-rank
    
    01002,00006,4
    01002,00277,5
    
    01003,00277,5
    01008,01217,5
    01009,00439,5
    01020,01193,5
    

    and process it with sed (Stream EDitor), deleting every line that is empty or contains only spaces:

    ~/expr/demo > sed  '/^ *$/d'  csv.txt   
    zip-postal-code,store-num,chain-rank
    01002,00006,4
    01002,00277,5
    01003,00277,5
    01008,01217,5
    01009,00439,5
    01020,01193,5
    

    Voilà!
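
    The same line-level filtering can be done from inside Clojure when shelling out to sed is inconvenient. This is a rough sketch rather than a definitive implementation; the function name non-blank-lines is hypothetical. Note that str/blank? also treats tab-only lines as blank, which the sed pattern above (spaces only) does not.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn non-blank-lines
  "Returns a vector of the lines of the file at `path`, with blank lines removed.
  The result is realized eagerly so the reader can be closed safely."
  [path]
  (with-open [rdr (io/reader path)]
    (into [] (remove str/blank?) (line-seq rdr))))
```

    The resulting lines can be joined with \newline and handed to a CSV parser, much like remove-blank-lines does for an in-memory string.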