Tags: regex, unix, find, bsd

A regex that works in `find`


I have a directory with ~8000 files of the form:

output/Manuscript_00750_AnimalGiants-compact.json
output/Manuscript_00750_AnimalGiants-expanded.json
output/Manuscript_00750_AnimalGiants.json
output/Manuscript_00752_AnimalGiants-compact.json
output/Manuscript_00752_AnimalGiants-expanded.json
output/Manuscript_00752_AnimalGiants.json
output/Unit_TZH_12345_Foo-compact.json
output/Unit_TZH_12345_Foo-expanded.json
output/Unit_TZH_12345_Foo.json

I need a regex that works with `find` to select just the Manuscript `-compact` files:

output/Manuscript_00750_AnimalGiants-compact.json
output/Manuscript_00752_AnimalGiants-compact.json

Coming up with the regex is the easy part, but getting find to cooperate is the hard part.

Here's my regex:

/Manuscript[0-9_a-zA-Z]+-compact\.json/

Here are some of the commands I've tried; all produce zero results. The cwd is the directory above output/:

find output -regex "Manuscript[0-9_a-zA-Z]+-compact\.json"
find output -regex "\./output/Manuscript[0-9_a-zA-Z]+-compact\.json/"
find output -regex ".*\Manuscript[0-9_a-zA-Z]+-compact.*\json"

But this command does produce results; it selects all the files that start with "Manuscript", which is obviously too broad:

find output -regex ".*\Manuscript.*\json"

What's the correct regex format for find here?


Solution

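  • `find -regex` has to match the entire path that `find` reports (for example `output/Manuscript_00750_AnimalGiants-compact.json`), not just the file name, so the pattern needs a leading `.*/` to consume the directory part. On macOS (BSD `find`), add `-E` so the pattern is read as an extended regex, where `+` means "one or more":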

    find -E output -regex '.*/Manuscript[0-9_a-zA-Z]+-compact\.json$'
    

    On GNU `find`, select the extended syntax with `-regextype` (it must come before the `-regex` test it applies to):

    find output -regextype posix-extended -regex '.*/Manuscript[0-9_a-zA-Z]+-compact\.json$'
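To check both commands against the layout from the question, here is a minimal sketch that recreates a few of the sample file names in a scratch directory and runs whichever variant fits the installed `find`. It assumes that only GNU `find` accepts `--version` (BSD/macOS `find` rejects it), and the variable names `re` and `matches` are just for illustration.

```shell
#!/bin/sh
# Sketch: rebuild some of the question's file names in a temporary directory
# and run the answer's regex with the matching find flavour.
# Assumption: GNU find accepts --version, BSD/macOS find does not.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir output
for f in \
    Manuscript_00750_AnimalGiants-compact.json \
    Manuscript_00750_AnimalGiants-expanded.json \
    Manuscript_00750_AnimalGiants.json \
    Manuscript_00752_AnimalGiants-compact.json \
    Unit_TZH_12345_Foo-compact.json
do
    touch "output/$f"
done

# -regex is matched against the whole path find prints (output/...),
# so the pattern starts with .*/ to consume the directory part.
re='.*/Manuscript[0-9_a-zA-Z]+-compact\.json'

if find --version >/dev/null 2>&1; then
    # GNU find: choose the ERE syntax before the -regex test.
    matches=$(find output -regextype posix-extended -regex "$re" | sort)
else
    # BSD/macOS find: -E switches -regex to extended regular expressions.
    matches=$(find -E output -regex "$re" | sort)
fi

printf '%s\n' "$matches"
```

Run from any directory, it should print only `output/Manuscript_00750_AnimalGiants-compact.json` and `output/Manuscript_00752_AnimalGiants-compact.json`; the `-expanded`, plain `.json`, and `Unit_...` files are rejected because the regex has to cover the full path from start to end.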