regex, perl, find, pcre

Alternative to 'find' which supports PCRE


Linux's find command does not support Perl-compatible regular expressions (PCRE). Is there an alternative that does and that is concise to use (a one-liner on the command line)?

I found some one-liners, but they were long and complicated, which makes them hard to understand and a pain to type every time.

Examples:

https://unix.stackexchange.com/questions/726878/is-it-possible-to-use-perl-like-regular-expressions-with-the-linux-find-command

uses pipelining, -options, and multiple functions.

Unix/Linux/FreeBSD Find Command with Perl Regex

uses a lot of options and also Perl.

I tried using Perl directly but didn't find a pure Perl one-liner for it.

Example:

perl regex command line - how to get matches instead replacing

Gives a one-liner for finding matches within a single file, but it does not find filename matches within a directory.


Solution

  • Using Perl's File::Find

    perl -MFile::Find -wE'find( sub { say $File::Find::name if /\.pl$/ }, q(.) )'
    

    This finds all entries whose names end with .pl, recursively, anywhere under the current directory.

    Or take the directory as input, with the current dir as default

    perl -MFile::Find -wE' $d = shift//q(.); 
        find( sub { say $File::Find::name if /\.pl$/ }, $d )' directory-name
    

    Or assemble all files found for some possible post-processing, writing to file etc

    perl -MFile::Find -wE' $d = shift//q(.); 
        find( sub { push @f, $File::Find::name if /\.pl$/ }, $d ); 
        say for @f'  directory-name
    

    (If run without a directory-name then the current directory is used)
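
    Since the callback runs ordinary Perl, the match can use constructs that find's own -regex lacks. The sketch below is only an illustration: the temporary directory and the file names are made up, and the pattern uses a negative lookahead to skip names starting with test_.

    ```shell
    # Throwaway tree with made-up file names, so the example is self-contained.
    demo=$(mktemp -d)
    mkdir -p "$demo/lib"
    touch "$demo/run.pl" "$demo/test_run.pl" "$demo/lib/util.pl" "$demo/notes.txt"

    # Inside the callback $_ is the bare file name and $File::Find::name
    # is the path including the starting directory.
    # (?!test_) is a negative lookahead: keep *.pl unless the name starts with test_.
    found=$(perl -MFile::Find -wE '
        my $d = shift // q(.);
        find( sub { say $File::Find::name if /^(?!test_).+\.pl$/ }, $d )' "$demo" | sort)

    printf '%s\n' "$found"
    rm -rf "$demo"
    ```

    This should list run.pl and lib/util.pl under the temporary directory, and leave out test_run.pl and notes.txt.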

    However, I don't see why a simple pipeline of find + grep isn't suitable. grep supports basic regex by default, extended regex with -E, and PCRE with -P (Perl), an option of GNU grep. So

    find ... | grep -P regex 
    

    does exactly what is asked, in one command line. The filename criteria can then be split: some handled by find's own tests and globbing, and some by grep's regex.
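
    As a concrete sketch of that pipeline, assuming GNU grep built with PCRE support: the helper name pfind, the temporary directory, and the file names are all made up for illustration. Wrapping the pipeline in a small shell function avoids retyping it each time.

    ```shell
    # Throwaway directory with made-up file names, so the example is self-contained.
    demo=$(mktemp -d)
    touch "$demo/report_2023.csv" "$demo/report_2023.csv.bak" \
          "$demo/summary.csv" "$demo/readme.md"

    # Hypothetical helper (not a standard tool): pfind DIR PCRE
    # find lists regular files; grep -P filters the full paths with a PCRE.
    pfind() { find "$1" -type f | grep -P -- "$2"; }

    # Last path component ends in .csv and does not start with "summary".
    # The negative lookahead (?!...) is PCRE-only syntax.
    matches=$(pfind "$demo" '/(?!summary)[^/]+\.csv$')
    printf '%s\n' "$matches"

    rm -rf "$demo"
    ```

    Only the path to report_2023.csv is printed. Note that grep sees one path per line, so this simple form assumes file names without embedded newlines.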


    Finally, the question asks for PCRE, and find indeed doesn't have PCRE regex, as stated. However, find does support other flavors of regex. The man page has only a basic statement, while a detailed description of the differences between the flavors can be found by running info find on Linux (I couldn't find the same description on the internet).

    In short, the main differences from PCRE as used by grep and other tools are: 1) the regex pattern has to match the whole path, not just a substring of it, and 2) the supported syntax is much more basic, without PCRE features such as lookaround assertions or non-greedy quantifiers.

    So to find a file which has letters and then numbers before a .txt extension in the filename, with anything else for the path, anywhere in or under the current directory

    find . -type f -regex '.*\/[a-zA-Z]+[0-9]+\.txt'
    

    Note that the leading .* is necessary, otherwise the path leading to the filename itself can't be matched (there's at least ./ in it).
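
    The whole-path requirement is easy to check with a quick experiment. This is a sketch assuming GNU find with its default regex flavor; the directory and file names are made up.

    ```shell
    # Throwaway directory with made-up file names.
    demo=$(mktemp -d)
    touch "$demo/abc123.txt" "$demo/123.txt"

    # No leading .* : the pattern cannot cover "./abc123.txt", so nothing matches.
    bare=$(cd "$demo" && find . -type f -regex '[a-zA-Z]+[0-9]+\.txt')

    # With .*/ in front, the pattern spans the whole path.
    full=$(cd "$demo" && find . -type f -regex '.*/[a-zA-Z]+[0-9]+\.txt')

    printf 'bare: [%s]\nfull: [%s]\n' "$bare" "$full"
    rm -rf "$demo"
    ```

    The first result is empty, while the second contains ./abc123.txt (123.txt has no letters before the digits, so it is skipped either way).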

    Basic as it is in comparison with the full PCRE, this may well be plenty enough for most uses.
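
    When the default flavor falls short, GNU find can switch to another one with -regextype. The sketch below assumes GNU find (other implementations may not have this option) and uses made-up file names; posix-extended allows unescaped grouping, alternation and interval counts, though still nothing PCRE-specific.

    ```shell
    # Throwaway directory with made-up file names.
    demo=$(mktemp -d)
    touch "$demo/a1.txt" "$demo/b22.log" "$demo/c333.txt" "$demo/notes.md"

    # -regextype has to appear before the -regex it applies to.
    # One letter, one or two digits, then .txt or .log; the pattern must
    # still match the whole path, hence the leading .*/
    hits=$(cd "$demo" && find . -regextype posix-extended -type f \
               -regex '.*/[a-z][0-9]{1,2}\.(txt|log)' | sort)

    printf '%s\n' "$hits"
    rm -rf "$demo"
    ```

    This prints ./a1.txt and ./b22.log; c333.txt has three digits and notes.md has the wrong shape, so neither matches.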