Regex - Match all absolute paths with more than one subfolder


I have a series of strings representing disks on a file system. I would like to match all the strings with more than one subfolder. For example:

/dev/sda1 (no match)
/         (no match)
/dev      (no match)

/dev/mapper/usr (match)
/dev/test/local (match)

I've tried something like:

^\/[^\/]+\/[^\/]+\/[^\/]+$

But I would like something more generic that matches every string in which the pattern /.* occurs more than twice. Is there a more elegant way to approach this?
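As a quick check of that limitation, here is a minimal sketch assuming grep -E is available. The function name exact_three and the extra path /a/b/c/d are illustrative and not part of the original data. The fixed-depth attempt accepts exactly three components, so a deeper path is rejected:

```shell
#!/bin/sh
# Fixed-depth attempt: exactly three non-empty components, no more.
# (exact_three is a hypothetical helper name.)
exact_three() {
    grep -E '^/[^/]+/[^/]+/[^/]+$'
}

# /a/b/c/d is an illustrative four-component path, not from the question.
printf '%s\n' /dev /dev/sda1 /dev/mapper/usr /a/b/c/d | exact_three
```

Only /dev/mapper/usr is printed. /a/b/c/d is dropped because [^/]+ cannot span a slash.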

Unfortunately, the solution proposed by @Wiktor Stribiżew, while it works on regex101, does not seem to solve the problem.

To expand on the topic, I have this output from df:

...
cgroup                  0        0         0    - /sys/fs/cgroup/hugetlb
cgroup                  0        0         0    - /sys/fs/cgroup/devices
mqueue                  0        0         0    - /dev/mqueue
/dev/vda9        28056816 12475936  14381828  47% /hostroot
/dev/mapper/usr   1007760   880440     75304  93% /hostroot/usr
/dev/vda6          110576       96    101308   1% /hostroot/usr/share/oem
sysfs                   0        0         0    - /hostroot/sys
/dev/vda1          130798   106402     24396  82% /hostroot/boot
...

and this pattern for awk:

PATTERN='!/loop/ && /^\// && !/^(\/[^\/]+){3,}$/ {printf \
"\"'${NAME}'\":\"%s\","\
"\"mount\":\"%s\","\
"\"total\":%d,"\
"\"used\":%d,"\
"\"free\":%d,"\
"\"percentage\":%.2f\n", $1, $6, $2, $2-$4, $4, ($2-$4)/($2+1)*100}'

When I run the command: df -a | awk "$PATTERN"

I would expect to get this output:

"device":"/dev/vda6","mount":"/hostroot/usr/share/oem","total":110576,"used":9268,"free":101308,"percentage":8.38
"device":"/dev/vda1","mount":"/hostroot/boot","total":130798,"used":106402,"free":24396,"percentage":81.35
"device":"/dev/vda9","mount":"/docker-volumes.d/state","total":28056816,"used":13675036,"free":14381780,"percentage":48.74

But instead, /dev/mapper/usr does not get filtered out:

"device":"/dev/mapper/usr","mount":"/hostroot/usr","total":1007760,"used":932456,"free":75304,"percentage":92.53
"device":"/dev/vda6","mount":"/hostroot/usr/share/oem","total":110576,"used":9268,"free":101308,"percentage":8.38
"device":"/dev/vda1","mount":"/hostroot/boot","total":130798,"used":106402,"free":24396,"percentage":81.35
"device":"/dev/vda9","mount":"/docker-volumes.d/state","total":28056816,"used":13675036,"free":14381780,"percentage":48.74

Any guess as to why?


Solution

  • You can use

    ^(\/[^\/]+){3,}$
    

    Details:

    • ^ - start of string
    • (\/[^\/]+){3,} - three or more sequences of a / char followed by one or more (but as many as possible) chars other than /
    • $ - string end.

    See the regex demo.
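The pattern can be tried from a shell against the sample paths in the question. This is a minimal sketch assuming grep -E (POSIX ERE). The function name deep_paths and the extra path /a/b/c/d are illustrative. Because grep has no regex delimiter, the slashes are written unescaped:

```shell
#!/bin/sh
# Keep only absolute paths with three or more non-empty components.
# (deep_paths is a hypothetical helper name.)
deep_paths() {
    grep -E '^(/[^/]+){3,}$'
}

# The first five paths come from the question; /a/b/c/d is illustrative.
printf '%s\n' /dev/sda1 / /dev /dev/mapper/usr /dev/test/local /a/b/c/d | deep_paths
```

This prints /dev/mapper/usr, /dev/test/local and /a/b/c/d, one per line. The three shallower paths are dropped.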

    NOTE: The pattern above is POSIX ERE and PCRE compliant. In non-POSIX regex flavors, it is best to use a non-capturing group when the grouping construct is only there to quantify a pattern sequence. So, if you were to use it in JavaScript, I'd recommend /^(?:\/[^\/]+){3,}$/. Also, if you ever need to use it in POSIX BRE, you'd need ^\(\/[^\/]\{1,\}\)\{3,\}$.
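On the awk follow-up in the question, two things may be worth checking. This is a hedged sketch, not a confirmed diagnosis. First, a bare /regex/ in awk is tested against the whole record ($0), not against the device column, so the anchored pattern is applied to the full df line rather than to $1. Second, some awk builds (older gawk without --re-interval, some mawk versions) do not treat {3,} as an interval expression. The sketch below tests $1 explicitly and uses the interval-free equivalent ^/[^/]+/[^/]+(/[^/]+)+$. The helper names sample_df and filter_devices are hypothetical, the sample lines are copied from the df output above, and the JSON formatting is left out to keep the filter visible:

```shell
#!/bin/sh
# Sample df-style lines copied from the question (sample_df is a hypothetical helper).
sample_df() {
cat <<'EOF'
mqueue                  0        0         0    - /dev/mqueue
/dev/vda9        28056816 12475936  14381828  47% /hostroot
/dev/mapper/usr   1007760   880440     75304  93% /hostroot/usr
/dev/vda6          110576       96    101308   1% /hostroot/usr/share/oem
sysfs                   0        0         0    - /hostroot/sys
/dev/vda1          130798   106402     24396  82% /hostroot/boot
EOF
}

# Keep rows whose device ($1) is an absolute path, is not a loop device,
# and has fewer than three path components. The regex is passed as a string,
# so no slash escaping is needed and no interval expression is used.
filter_devices() {
    awk -v re='^/[^/]+/[^/]+(/[^/]+)+$' '
        $1 ~ /^\// && $1 !~ /loop/ && $1 !~ re { print $1, $6 }
    '
}

sample_df | filter_devices
```

With the sample lines above, this keeps /dev/vda9, /dev/vda6 and /dev/vda1 and drops /dev/mapper/usr. If the filter is meant to apply to the mount point instead, the same test can be written against $6.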