I am running a process that produces an arbitrary number of files in an arbitrary number of sub-folders. I am interested in the number of distinct sub-folders, and I am currently trying to solve this with bash and find (I do not want to use a scripting language).
So far I have:
find models/quarter/ -name settings.json | wc -l
However, this obviously does not take the structure of find's output into account; it just counts every file returned.
Sample of the find return:
models/quarter/1234/1607701623/settings.json
models/quarter/1234/1607701523/settings.json
models/quarter/3456/1607701623/settings.json
models/quarter/3456/1607702623/settings.json
models/quarter/7890/1607703223/settings.json
I am interested in the number of distinct folders in the top folder models/quarter, so the appropriate result for the sample above would be 3 (1234, 3456, 7890). It is a requirement that each folder to be counted contains a sub-folder (which is a Unix timestamp, as you might have recognized), and that this sub-folder contains the file settings.json.
My gut tells me it should be possible, e.g. with awk, but I am certainly no bash pro. Any help is greatly appreciated, thanks.
find models/quarter/ -name settings.json | awk -F\/ '{ if (strftime("%s",$4) == $4) { fil[$3]="" } } END { print length(fil) }'
Using awk. Pass the output of find to awk and set / as the field separator. Check that the 4th field is a valid Unix timestamp: strftime("%s",$4) formats the timestamp back as epoch seconds, so it round-trips to $4 only if $4 really is one. If it is, add the third field (the folder name) as a key of the array fil; duplicate keys are stored only once. At the end, print the length of the array fil, i.e. the number of distinct folders. Note that both strftime and length on an array are GNU awk (gawk) extensions, not POSIX awk.
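If gawk is not available, a portable sketch of the same idea replaces the awk script with cut, sort -u and wc: constrain find to paths exactly three levels deep so the structure requirement is enforced by find itself, extract the third path component, deduplicate, and count. The directory layout built below is just the sample from the question, recreated in a temporary directory for demonstration.

```shell
#!/bin/sh
# Recreate the sample tree from the question in a temp dir (illustration only).
tmp=$(mktemp -d)
mkdir -p "$tmp/models/quarter/1234/1607701623" \
         "$tmp/models/quarter/1234/1607701523" \
         "$tmp/models/quarter/3456/1607701623" \
         "$tmp/models/quarter/3456/1607702623" \
         "$tmp/models/quarter/7890/1607703223"
for d in "$tmp"/models/quarter/*/*/; do
  touch "$d/settings.json"
done
cd "$tmp" || exit 1

# -mindepth 3 -maxdepth 3 matches only models/quarter/<id>/<timestamp>/settings.json,
# i.e. files exactly one sub-folder below each <id> folder.
# cut -d/ -f3 keeps the <id> component; sort -u deduplicates; wc -l counts.
find models/quarter -mindepth 3 -maxdepth 3 -name settings.json \
  | cut -d/ -f3 | sort -u | wc -l
```

For the sample tree this prints 3. Unlike the awk version, this does not verify that the intermediate folder name is numeric; if that matters, insert e.g. `grep -E '^models/quarter/[^/]+/[0-9]+/settings\.json$'` into the pipeline before cut.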