How to read data from a file into a hash-map (or other data structure) in Clojure?


Not quite sure where to start with this. I have a big file of data in which each column holds a particular kind of value (for example, column 1 is the hour). The file is 15 columns wide and has no column headings; it is all numeric data.

I need to read this data into a data structure such as a hash map, which would allow me to sort and query it using things such as contains?, as well as perform calculations on it.

I am new to Clojure and unsure how to do this; any help would be appreciated.

My file is a text file (saved as mydata.txt) structured like so:

  1 23 25 -9  -0 1 1
  2 23 25 10 1 2 3

My code so far is:

(def filetoanalyse (slurp "mydata.txt"))
(zipmap [:num1 :num2 :num3 :num4 :num5 :num6 :num7] filetoanalyse)

At the moment it seems to associate the whole of the file with :num1.


Solution

  • Here's a function you can use to do what you're looking for:

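    (defn map-from-file
      "Returns a sequence of maps, one per line of filename, pairing
      column-names with the fields matched by field-re."
      [field-re column-names filename]
      (let [lines (re-seq #"[^\r\n]+" (slurp filename))] ; non-empty lines
        (map #(zipmap column-names (re-seq field-re %)) lines)))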
    

    You have to supply three arguments:

    1. A regular expression that matches a single field in each row (it describes the fields themselves, not the separators between them). For the data you've shown this can be #"[^ ]+", meaning any run of non-space characters is a field. If you've got simple comma-separated values with no complications such as embedded commas or quoted fields, something like #"[^,]+" will work. Or, if you only want to extract numeric characters, something a bit more specific such as #"[-0-9]+" will work.

    2. A collection of column names to assign.

    3. The name of the file.
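
    To see what the field regular expression does on its own, here is a small sketch that applies re-seq to single lines. The sample strings are illustrative: the first is a line from the question, the second is a made-up comma-separated line.

    ```clojure
    ;; Runs of non-space characters; leading and repeated spaces are skipped.
    (re-seq #"[^ ]+" "  1 23 25 -9  -0 1 1")
    ;; => ("1" "23" "25" "-9" "-0" "1" "1")

    ;; Runs of non-comma characters, for simple comma-separated data.
    (re-seq #"[^,]+" "1,23,25,-9")
    ;; => ("1" "23" "25" "-9")

    ;; Runs of digits and minus signs only.
    (re-seq #"[-0-9]+" "  1 23 25 -9  -0 1 1")
    ;; => ("1" "23" "25" "-9" "-0" "1" "1")
    ```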

    So if the data you show in your question is stored as test3.dat somewhere, you could invoke the above function as

    (map-from-file #"[^ ]+" [:c1 :c2 :c3 :c4 :c5 :c6 :c7] "/some-path/test3.dat")
    

    and it would return

    ({:c1 "1", :c2 "23", :c3 "25", :c4 "-9", :c5 "-0", :c6 "1", :c7 "1"} {:c1 "2", :c2 "23", :c3 "25", :c4 "10", :c5 "1", :c6 "2", :c7 "3"})
    

    In other words, you get back a sequence of maps, one per row, keyed by the column names you've supplied. Note that the values are strings, not numbers. If you prefer to have the data in a vector you can use

    (into [] (map-from-file #"[^ ]+" [:c1 :c2 :c3 :c4 :c5 :c6 :c7] "/some-path/test3.dat"))
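
    Since the question also asks about querying and calculations, here is a rough sketch of one way to turn the string values into numbers and then work with the rows. It assumes every field is an integer, as in the sample data (for decimal data Double/parseDouble could be swapped in). The names rows, parse-row and parsed are made up for this illustration, and rows is a literal copy of the result shown above so the sketch runs without a file.

    ```clojure
    ;; Literal copy of the result shown above.
    (def rows
      [{:c1 "1", :c2 "23", :c3 "25", :c4 "-9", :c5 "-0", :c6 "1", :c7 "1"}
       {:c1 "2", :c2 "23", :c3 "25", :c4 "10", :c5 "1", :c6 "2", :c7 "3"}])

    ;; Hypothetical helper: parse every value in one row map as a long.
    ;; Assumes all values are integer strings; throws NumberFormatException otherwise.
    (defn parse-row [row]
      (into {} (map (fn [[k v]] [k (Long/parseLong v)])) row))

    (def parsed (mapv parse-row rows))

    ;; contains? tests for a key in a map, not for a value.
    (contains? (first parsed) :c4)        ; => true

    ;; Rows whose :c4 value is negative.
    (filter #(neg? (:c4 %)) parsed)

    ;; Sum of column :c2 across all rows.
    (reduce + (map :c2 parsed))           ; => 46

    ;; Rows sorted by :c4, largest first.
    (sort-by :c4 > parsed)
    ```

    Keeping each row as a map means the usual sequence functions (filter, map, sort-by, reduce) can be used directly, with the column keywords acting as accessor functions.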