Search for files matching given criteria in PHP


As the title suggests, I would like to create a file search system with criteria. I would like to filter files by whether their names contain a specific sequence of numbers or letters.

Example:

Criteria I would like to filter the search by:

value 1 = FRGHSD02D5102T, value 2 = 005878

Files:

00256_FRGHSD02D5102T0013005878.TXT (I want to find it)

FRGHSD02D5102T00256_0013005878.TXT (I want to find it)

_FRGHSD02D5102T001300587800256.TXT (I want to find it)

00058_GHT52DSF56S03U0014002545.TXT (I do not want to find it)

I tried to do this using the glob() function:

    $files = glob(".../...../*.txt");

This variant finds nothing:

    $files = glob(".../...../*.txt/*{002}*{001}*.txt");

thanks a lot


Solution

  • The following regular expression can be used:

    ^.*(FRGHSD02D5102T|005878).+$

    This matches any line that contains FRGHSD02D5102T or 005878, followed by at least one more character:

    00256_FRGHSD02D5102T0013005878.TXT ✅
    FRGHSD02D5102T00256_0013005878.TXT ✅
    _FRGHSD02D5102T001300587800256.TXT ✅
    00256_005878.TXT ✅
    FRGHSD02D5102T.TXT ✅
    00058_GHT52DSF56S03U0014002545.TXT ❌
    00256_005873.TXT ❌
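
The list above can be checked with a few lines of PHP. This is a minimal sketch that runs `preg_match()` over the same file names; the `$names` and `$results` variables are only illustrative and not part of the original answer.

```php
<?php
// Pattern from the answer: either value, followed by at least one more character.
$pattern = '/^.*(FRGHSD02D5102T|005878).+$/';

// File names taken from the list above.
$names = [
    '00256_FRGHSD02D5102T0013005878.TXT',
    'FRGHSD02D5102T00256_0013005878.TXT',
    '_FRGHSD02D5102T001300587800256.TXT',
    '00256_005878.TXT',
    'FRGHSD02D5102T.TXT',
    '00058_GHT52DSF56S03U0014002545.TXT',
    '00256_005873.TXT',
];

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$results = [];
foreach ($names as $name) {
    $results[$name] = preg_match($pattern, $name) === 1;
    echo ($results[$name] ? 'match    ' : 'no match ') . $name . PHP_EOL;
}
```

Because the alternation means "either value", a name containing only one of the two values also matches.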
    

    This can be combined with PHP's recursive directory iterators (rather than glob) to search through all folders and subfolders for the specific pattern:

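    $folder = __DIR__ . '/data';
    // The pattern is tested against the full path of each entry.
    $pattern = '/^.*(FRGHSD02D5102T|005878).+$/';

    // Walk $folder and all of its subfolders.
    $dir = new RecursiveDirectoryIterator($folder);
    $ite = new RecursiveIteratorIterator($dir);
    // GET_MATCH yields the preg_match() result array for each matching entry.
    $files = new RegexIterator($ite, $pattern, RegexIterator::GET_MATCH);

    foreach ($files as $file) {
        // $file[0] is the whole match, which is the full path here.
        echo 'found matching file: ' . $file[0] . PHP_EOL;
    }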
    

    The folder structure:

    data
    |-- 00256_FRGHSD02D5102T0013005878.TXT
    |-- example.TXT
    `-- test
        `-- YES256_FRGHSD02D5102T0013005878.TXT
    

    The result:

    found matching file: /Users/stackoverflow/dev/data/00256_FRGHSD02D5102T0013005878.TXT
    found matching file: /Users/stackoverflow/dev/data/test/YES256_FRGHSD02D5102T0013005878.TXT
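
The pattern above accepts a file when either value occurs anywhere in its path. If the intent is that both values must appear in the file name, as the examples in the question suggest, plain substring checks on the base name may be enough. The following is only a sketch under that assumption: `findFilesContainingAll()` is a hypothetical helper, and the demo builds a throwaway folder in the system temp directory so that it runs on its own.

```php
<?php
/**
 * Hypothetical helper: return the paths of all files under $folder whose
 * base name contains every string in $needles.
 */
function findFilesContainingAll(string $folder, array $needles): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        // $file is an SplFileInfo; test the base name only, not the full path.
        $name = $file->getFilename();
        foreach ($needles as $needle) {
            if (strpos($name, $needle) === false) {
                continue 2; // one value is missing: skip this file
            }
        }
        $matches[] = $file->getPathname();
    }

    sort($matches);
    return $matches;
}

// Demo in a throwaway directory (assumed layout, loosely based on the question).
$folder = sys_get_temp_dir() . '/search_demo_' . uniqid();
mkdir($folder . '/test', 0777, true);
touch($folder . '/00256_FRGHSD02D5102T0013005878.TXT');
touch($folder . '/00058_GHT52DSF56S03U0014002545.TXT');
touch($folder . '/test/_FRGHSD02D5102T001300587800256.TXT');

$found = findFilesContainingAll($folder, ['FRGHSD02D5102T', '005878']);
foreach ($found as $path) {
    echo 'found matching file: ' . $path . PHP_EOL;
}
```

Checking `getFilename()` instead of the full path avoids accidental hits when a parent folder name happens to contain one of the values.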
    

    When searching for a specific extension the following snippets can be used:

    .pdf

    $pattern = '/^.*(FRGHSD02D5102T|005878|001|002).*\.pdf$/';
    

    .TXT (upper case only, because this pattern is case-sensitive)

    $pattern = '/^.*(FRGHSD02D5102T|005878|001|002).*\.TXT$/';
    

    .pdf in any casing (.pdf, .PDF, .PdF), containing one of the two values, or both 001 and 002 in either order

    $pattern = '/^.*(FRGHSD02D5102T|005878|001.*002|002.*001).*\.pdf/i';
    

    Matches:

    data
    |-- 00256_FRGHSD02D5102T0013005878.TXT ❌
    |-- example.TXT ❌
    |-- hell001hello.pdf ❌
    |-- hell001hello002.pdf ✅
    |-- hell002hello001.pdf ✅
    `-- test
        `-- YES256_FRGHSD02D5102T0013005878.TXT ❌
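
The listing above can be reproduced without touching the file system. This sketch applies the last pattern to the same file names with `preg_grep()`; the `$names` array is just the listing written out by hand.

```php
<?php
// Pattern copied from the last example above.
$pattern = '/^.*(FRGHSD02D5102T|005878|001.*002|002.*001).*\.pdf/i';

// File names from the listing above (folder names left out).
$names = [
    '00256_FRGHSD02D5102T0013005878.TXT',
    'example.TXT',
    'hell001hello.pdf',
    'hell001hello002.pdf',
    'hell002hello001.pdf',
    'YES256_FRGHSD02D5102T0013005878.TXT',
];

// preg_grep() keeps the entries that match; array_values() reindexes them.
$matching = array_values(preg_grep($pattern, $names));
print_r($matching);
```

Only the two names that contain both 001 and 002 and end in .pdf are kept.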
    
    

    The /i modifier makes the pattern case-insensitive, so it matches any casing of PDF.

    The \. escapes the dot because we need to match a literal . here; an unescaped . matches any single character.
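
The effect of each piece can be seen in isolation. The sketch below uses made-up file names (`report.PdF`, `reportxpdf`, `report.pdf.bak`) and also shows what a trailing `$` anchor changes, since the case-insensitive pattern above does not end with one.

```php
<?php
// Each entry is the return value of preg_match(): 1 for a match, 0 for none.
$checks = [
    // /i lets ".pdf" match any casing of the extension.
    'with /i'         => preg_match('/\.pdf$/i', 'report.PdF'),
    'without /i'      => preg_match('/\.pdf$/', 'report.PdF'),
    // An escaped dot needs a literal "."; an unescaped dot accepts any character.
    'escaped dot'     => preg_match('/\.pdf$/', 'reportxpdf'),
    'unescaped dot'   => preg_match('/.pdf$/', 'reportxpdf'),
    // Without a trailing $ the extension may appear anywhere in the name.
    'no end anchor'   => preg_match('/\.pdf/', 'report.pdf.bak'),
    'with end anchor' => preg_match('/\.pdf$/', 'report.pdf.bak'),
];

foreach ($checks as $label => $result) {
    echo str_pad($label, 16) . ($result === 1 ? 'match' : 'no match') . PHP_EOL;
}
```

If only names that really end in .pdf should be found, appending `$` before the closing delimiter is one way to tighten the pattern.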