Tags: java, clojure, diff, clojure-contrib

Check physical existence of files referenced in DB table


We have one rather large table containing document info together with file paths pointing to files on the file system. After a couple of years we noticed that there are files on disk which are not referenced in the DB table, and vice versa.

Since I'm currently learning Clojure, I thought it would be nice to write a small utility that finds the diff between the DB and the file system. Naturally, since I'm a beginner, I got stuck: there are more than 600,000 documents, and obviously I need a more performant and less memory-hungry solution :)

My first idea was to generate a flattened list of all files in the file system tree and compare it with the list from the DB: if a file referenced in the DB doesn't exist on disk, put it in a separate "non-existing" list, and if a file exists on the HDD but not in the DB, move it to some dump directory.
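A minimal sketch of that comparison, assuming both path lists fit in memory as sets of strings. The function name `diff-paths` and the sample paths are hypothetical, and moving orphaned files to a dump directory is left out:

```clojure
(require '[clojure.set :as set])

(defn diff-paths
  "Compares two collections of path strings.
  Returns the paths found only in the DB and the paths found only on disk."
  [db-paths fs-paths]
  (let [db (set db-paths)
        fs (set fs-paths)]
    {:missing-on-disk (set/difference db fs)
     :not-in-db       (set/difference fs db)}))

;; Hypothetical sample data standing in for the real table and directory tree.
(diff-paths ["/docs/a.pdf" "/docs/b.pdf"]
            ["/docs/b.pdf" "/docs/c.pdf"])
;; => {:missing-on-disk #{"/docs/a.pdf"}, :not-in-db #{"/docs/c.pdf"}}
```

Both directions of the diff come from the same two sets, so the tree and the table each need to be read only once.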

Any ideas?


Solution

  • As a sketch, here's how you could check the filesystem against the database, in chunks of whatever size you're happy with:

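    ;; Walk the tree lazily, skip directories, and process files in chunks of 20.
    ;; partition-all keeps the final, smaller chunk; plain partition would drop it.
    (->> (file-seq (java.io.File. "/"))
         (remove (memfn isDirectory))
         (partition-all 20)
         (map (fn [files] (printf "Checking %d files against db...\n" (count files))))
         (take 2))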
    
    (Checking 20 files against db...
    Checking 20 files against db...
    nil nil)
    

    Replace the printf call with the real check: query the database for each chunk of files and collect the ones it does not know about.
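One way the chunked check might look, as a sketch only. `known-in-db` is a hypothetical function you would supply: it takes a collection of path strings and returns those that exist in the table (for example via one `SELECT ... WHERE filepath IN (...)` query per chunk). The name `files-missing-from-db` is likewise an assumption.

```clojure
(require '[clojure.java.io :as io])

(defn files-missing-from-db
  "Lazily walks root and returns the file paths that known-in-db does not report.
  known-in-db is a caller-supplied (hypothetical) function from a collection
  of path strings to the subset present in the database."
  [root chunk-size known-in-db]
  (->> (file-seq (io/file root))
       (remove #(.isDirectory ^java.io.File %))
       (map #(.getPath ^java.io.File %))
       (partition-all chunk-size)
       (mapcat (fn [paths]
                 ;; One database round trip per chunk, not one per file.
                 (let [known (set (known-in-db paths))]
                   (remove known paths))))))
```

Because every step is lazy, only one chunk of paths needs to be held in memory at a time, as long as the result is consumed incrementally (for example with `doseq`) and not realized into a single collection.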