Tags: haskell, recursion, name-matching

How to properly match file names in subdirectories?


I am currently working through the book Real World Haskell, and one exercise asks the reader to extend file name matching with **, which behaves like * but also descends into subdirectories, all the way down the file system. Below is a fragment of my code with comments (it contains a lot of duplication at the moment); further down you can find additional information about it. I think the posted code is sufficient to show the problem, so there is no need to list the whole program here.

case splitFileName pat of
        ("", baseName) -> do -- just the file name passed
            curDir <- getCurrentDirectory
            if searchSubDirs baseName -- check if file name has `**` in it
              then do 
                  contents <- getDirectoryContents curDir
                  subDirs <- filterM doesDirectoryExist contents
                  let properSubDirs = filter (`notElem` [".", ".."]) subDirs
                  subDirsNames <- forM properSubDirs $ \dir -> do
                                      namesMatching (curDir </> dir </> baseName) -- call the function recursively on subdirectories
                  curDirNames <- listMatches curDir baseName -- list matches in the current directory
                  return (curDirNames ++ (concat subDirsNames)) -- concatenate results into a single list
              else listMatches curDir baseName
        (dirName, baseName) -> do -- full path passed
            if searchSubDirs baseName
              then do
                  contents <- getDirectoryContents dirName
                  subDirs <- filterM doesDirectoryExist contents
                  let properSubDirs = filter (`notElem` [".", ".."]) subDirs
                  subDirsNames <- forM properSubDirs $ \dir -> do
                                      namesMatching (dirName </> dir </> baseName) -- call the function recursively on subdirectories
                  curDirNames <- listMatches dirName baseName -- list matches in the passed directory
                  return (curDirNames ++ (concat subDirsNames)) -- concatenate results into a single list
              else listMatches dirName baseName

Additional information:

pat is the pattern I'm looking for (e.g. *.txt or C:\\A\[a-z].*).

splitFileName is a function which splits a file path into the directory path and the file name. The first element of the tuple will be empty if we specify just a file name in pat.

searchSubDirs returns True if the file name has ** in it.

listMatches returns a list of the file names in the directory that match the pattern, treating ** as *.

namesMatching is the name of the function whose excerpt I posted.
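The helpers above are not shown in the question. What follows is a minimal sketch of what searchSubDirs and the ** to * substitution might look like, assuming patterns are plain Strings; searchSubDirs here is a stand-in for the asker's function and collapseStars is a hypothetical name. The sketch also prints what System.FilePath's splitFileName returns for a bare file name: in current versions of the filepath package this is "./" rather than "", so if that is the splitFileName in use, the ("", baseName) branch would never be taken.

```haskell
import Data.List (isInfixOf)
import System.FilePath (splitFileName)

-- | Hypothetical stand-in for the question's searchSubDirs:
-- True when the pattern contains the recursive wildcard "**".
searchSubDirs :: String -> Bool
searchSubDirs pat = "**" `isInfixOf` pat

-- | Hypothetical helper that collapses every run of '*' characters into a
-- single '*', so an ordinary one-directory matcher can be reused in each
-- directory that a "**" search visits.
collapseStars :: String -> String
collapseStars ('*' : '*' : rest) = collapseStars ('*' : rest)
collapseStars (c : rest) = c : collapseStars rest
collapseStars [] = []

main :: IO ()
main = do
  print (searchSubDirs "**.txt", searchSubDirs "*.txt")
  putStrLn (collapseStars "src/**.hs")
  -- For a bare file name, System.FilePath.splitFileName puts "./"
  -- (not an empty string) in the directory part of the tuple.
  print (splitFileName "*.txt")
```

Run as a program, this should print (True,False), then src/*.hs, then ("./","*.txt").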

Why doesn't it work?

When I pass just a file name, the program searches only the current directory and the first level of subdirectories. When I pass a full path, it searches only the specified directory. It looks like the (dirName, baseName) case doesn't recurse properly, but I've been looking at the code for some time now and can't figure out where the problem is.

Note

If any more information is needed, please let me know in the comments and I'll add whatever is necessary to the question.


Solution

  • Here's an issue:

                  contents <- getDirectoryContents dirName
                  subDirs <- filterM doesDirectoryExist contents
    

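    getDirectoryContents returns only the bare names of a directory's entries, not their full paths, so doesDirectoryExist resolves each name relative to the current working directory instead of dirName. You have to prepend dirName (along with a path separator, e.g. by using </>) to the elements of contents before calling doesDirectoryExist. This also explains the symptom: in the ("", baseName) branch the listed directory is the current working directory, so the relative names happen to resolve correctly and the first level of subdirectories is found. Every recursive call, however, passes a full path and therefore lands in the (dirName, baseName) branch, where the check looks in the wrong directory, generally finds no subdirectories, and the recursion stops.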
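A minimal sketch of the fix, assuming the standard directory and filepath packages: subDirectories joins each entry onto its parent before testing it, and allDirectories (a hypothetical helper, not part of the question) applies that recursively to collect a directory and everything below it. The main function only builds a throwaway tree under the temporary directory to demonstrate the result.

```haskell
import Control.Monad (filterM, forM, when)
import Data.List (sort)
import System.Directory
  ( createDirectoryIfMissing
  , doesDirectoryExist
  , getDirectoryContents
  , getTemporaryDirectory
  , removeDirectoryRecursive
  )
import System.FilePath (makeRelative, (</>))

-- | Full paths of the immediate subdirectories of @dir@.
subDirectories :: FilePath -> IO [FilePath]
subDirectories dir = do
  names <- getDirectoryContents dir
  -- getDirectoryContents yields bare entry names, so each one is joined
  -- onto @dir@ first; otherwise doesDirectoryExist would resolve it
  -- against the current working directory.
  let paths = [dir </> name | name <- names, name `notElem` [".", ".."]]
  filterM doesDirectoryExist paths

-- | @dir@ itself followed by every directory below it, all the way down.
-- doesDirectoryExist is also True for symbolic links to directories, so a
-- symlink cycle would make this recurse forever; no guard is attempted here.
allDirectories :: FilePath -> IO [FilePath]
allDirectories dir = do
  subs <- subDirectories dir
  rest <- forM subs allDirectories
  return (dir : concat rest)

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let root = tmp </> "names-matching-demo"
  exists <- doesDirectoryExist root
  when exists (removeDirectoryRecursive root)
  -- Build root/a/b and root/c, plus a plain file that must not be reported.
  createDirectoryIfMissing True (root </> "a" </> "b")
  createDirectoryIfMissing True (root </> "c")
  writeFile (root </> "a" </> "notes.txt") ""
  dirs <- allDirectories root
  print (sort (map (makeRelative root) dirs))
  removeDirectoryRecursive root
```

On a POSIX system this should print [".","a","a/b","c"]. With such a list in hand, namesMatching could run listMatches once per directory instead of keeping two nearly identical branches. Newer versions of the directory package also provide listDirectory, which already omits "." and "..".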