
How can I find just the PDF files under folders "Pricelist" and "Price List"?


I need to find all the PDF files under several price list folders using JRuby on Windows 7. The folder structure is as follows:

WorkSpace/Data/2015/city1/A/...
WorkSpace/Data/2015/city1/B/...
WorkSpace/Data/2015/city1/Pricelist/...
WorkSpace/Data/2015/city1/...
WorkSpace/Data/2015/city1/Price List/.....
WorkSpace/Data/2015/city2/A/...
WorkSpace/Data/2015/city2/C/...
WorkSpace/Data/2015/city2/Pricelist/...
WorkSpace/Data/2015/city2/D/...
WorkSpace/Data/2015/city2/Price List/.....

WorkSpace/Data/2016/city1/folder1/...
WorkSpace/Data/2016/city1/folder2/...
WorkSpace/Data/2016/city1/Pricelist/...
WorkSpace/Data/2016/city1/folder3/...
WorkSpace/Data/2016/city1/folder4/Price List/...
WorkSpace/Data/2016/city2/folder1/...
WorkSpace/Data/2016/city2/folder2/...
WorkSpace/Data/2016/city2/Pricelist/...
WorkSpace/Data/2016/city2/folder3/...
WorkSpace/Data/2016/city2/folder4/Price List/...

... represents all kinds of files under their corresponding folder.

I only want to find the PDF files under the folders named Pricelist and Price List. How can I do this?

I read "Searching a folder and all of its subfolders for files of a certain type". That answer looks helpful, but how can I modify the expression /.*\.pdf$/ to achieve my goal?


Solution

  • Use a Recursive Glob

    All you need to find your files is Dir.glob and Enumerable#grep. For example:

    Dir.glob('WorkSpace/Data/**/*.pdf').grep(/Price List|Pricelist/)
    

    This collects every PDF file with a recursive glob pattern that descends into all subdirectories below WorkSpace/Data (adjust the starting path as needed), then keeps only the paths that match the regular expression. The pattern uses alternation to match either of the two folder names, regardless of how deeply those folders are nested.

    There may be more efficient ways to do this, and the regex is permissive: it matches "Pricelist" or "Price List" anywhere in the path, including inside a file name, so tighten it if that is a problem for you. Still, this solves the problem without needing to know much more than the root of the directory tree you want to search.
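
If the permissive regex turns out to be a problem, two tighter variants are sketched below. The helper names (price_list_pdfs_grep, price_list_pdfs_glob), the PRICE_DIR constant, and the root argument are made up for illustration and are not part of the original answer. The sketch assumes paths come back from Dir.glob with forward slashes, which is how Ruby normally reports them, including on Windows.

```ruby
# Hypothetical constant: the folder name must be a complete path segment,
# so a file such as "A/Pricelist.pdf" no longer matches.
PRICE_DIR = %r{/(?:Pricelist|Price List)/}

# Variant 1: same glob-then-grep idea as above, with the stricter regex.
# `root` is the starting directory, e.g. 'WorkSpace/Data'.
def price_list_pdfs_grep(root)
  Dir.glob(File.join(root, '**', '*.pdf')).grep(PRICE_DIR)
end

# Variant 2: move the alternation into the glob itself using braces, so
# PDFs outside the two folders are never returned in the first place.
# The second '**' also picks up PDFs in subfolders of a price list folder.
def price_list_pdfs_glob(root)
  Dir.glob(File.join(root, '**', '{Pricelist,Price List}', '**', '*.pdf'))
end
```

Either helper would be called as price_list_pdfs_glob('WorkSpace/Data'). Both still walk the whole tree under the root, so treat them as a correctness tweak rather than a guaranteed speedup.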