
Regex - Filter for atypical filetypes


I have a folder filled with plain text files with filenames formatted as follows:

00001.7c53336b37003a9286aba55d2945844c
00002.9c4069e25e1ef370c078db7ee85ff9ac
00003.860e3c3cee1b42ead714c5c874fe25f7
00002.d94f1b97e48ed3b553b3508d116e6a09
00001.7848dde101aa985090474a91ec93fcf0

After I acquire the filenames as strings, how can I filter them so that all relevant files are accepted and everything else is rejected?

  • I could rename all files in a controlled environment to strip the string up to the ., then add another . and a constant filetype.

  • I could try to set a fixed acceptable value for the length of the string after the . character.

  • I could exclude some specific filetypes and hope nothing else slips through.

All these methods require me to either rename the files or personally verify that there is nothing else in the folder.


Solution

  • The files all have a very long extension. You could use the following to select files whose extension is exactly 32 characters long.

    \.[^.]{32}$
    

    Or something like

    \.[^.]{8,}$
    

    This matches files whose extension is at least 8 characters long.

    A close look reveals that (at least) in your example the only alphabetic characters are a, b, ..., f (the extension looks like lowercase hexadecimal), so you could restrict the match further with:

    \.[0-9a-f]{8,}$
    

    Also, in all the examples the file name has exactly 5 digits and starts with (at least) two 0s, which we could incorporate with:

    ^0{2}\d{3}\.[0-9a-f]{8,}$
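
To show how the last pattern might be applied once the filenames are available as strings, here is a minimal Python sketch. The question does not name a language, so Python is an assumption. The function name filter_filenames and the rejected sample names (notes.txt, the .bak file, the name starting with 12345) are made up for illustration.

```python
import re

# The strictest pattern from the answer above, used verbatim.
# re.ASCII keeps \d limited to 0-9 instead of any Unicode digit.
PATTERN = re.compile(r"^0{2}\d{3}\.[0-9a-f]{8,}$", re.ASCII)


def filter_filenames(names):
    """Return only the names that match PATTERN in full.

    fullmatch is used instead of search because in Python a bare $
    also matches just before a trailing newline.
    """
    return [name for name in names if PATTERN.fullmatch(name)]


# Two names from the question plus three hypothetical intruders.
candidates = [
    "00001.7c53336b37003a9286aba55d2945844c",
    "00002.9c4069e25e1ef370c078db7ee85ff9ac",
    "notes.txt",
    "00003.860e3c3cee1b42ead714c5c874fe25f7.bak",
    "12345.d94f1b97e48ed3b553b3508d116e6a09",
]

accepted = filter_filenames(candidates)
print(accepted)
```

Only the first two names should be accepted here: notes.txt has a short, non-hexadecimal extension, the .bak file ends in a second extension, and the last name does not start with 00. If the extensions are always 32 characters, replacing {8,} with {32} would make the filter stricter still.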