I'm working on a system that stores something very similar to a file system's inode tree. It already has an equivalent of the ls
command, but it does not yet support a recursive option. I'm investigating implementation choices for adding the recursive option. I'd like to maximize familiarity for users that know POSIX ls
and maximize portability for any scripts written to consume the output of POSIX ls -R
.
It seems ls -R
could be implemented with either depth-first traversal or breadth-first traversal. However, I haven't been able to find a definitive answer on whether a particular traversal order is dictated by specification or if it's left as an implementation choice.
In the POSIX documentation for ls
, I cannot find any specific answer. This is the only statement I could find related to the implementation of the recursion:
Implementations are expected to traverse arbitrary depths when processing the -R option. The only limitation on depth should be based on running out of physical storage for keeping track of untraversed directories.
I also tried reviewing the documentation for nftw
. Again, I found no specific statement of the traversal order there.
For some empirical measurement, I ran a test on a CentOS box, and the behavior there is clearly depth-first traversal.
> uname -a
Linux centos 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> yum info coreutils
Loaded plugins: fastestmirror
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Loading mirror speeds from cached hostfile
* base: mirror.pac-12.org
* elrepo: ftp.osuosl.org
* epel: linux.mirrors.es.net
* extras: repos.lax.quadranet.com
* updates: ftp.osuosl.org
Installed Packages
Name : coreutils
Arch : x86_64
Version : 8.22
Release : 11.el7
Size : 14 M
Repo : installed
From repo : anaconda
Summary : A set of basic GNU tools commonly used in shell scripts
URL : http://www.gnu.org/software/coreutils/
License : GPLv3+
Description : These are the GNU core utilities. This package is the combination of
: the old GNU fileutils, sh-utils, and textutils packages.
> tree testTraversal/
testTraversal/
├── dir1
│ └── dir8
│ └── dir9
│ └── dir10
├── dir2
│ ├── dir4
│ │ └── dir5
│ ├── dir6
│ ├── file1
│ └── file2
├── dir3
└── dir4
└── dir5
└── dir6
└── dir7
├── file3
├── file4
└── file5
> ls -R testTraversal/
testTraversal/:
dir1/ dir2/ dir3/ dir4/
testTraversal/dir1:
dir8/
testTraversal/dir1/dir8:
dir9/
testTraversal/dir1/dir8/dir9:
dir10/
testTraversal/dir1/dir8/dir9/dir10:
testTraversal/dir2:
dir4/ dir6/ file1 file2
testTraversal/dir2/dir4:
dir5/
testTraversal/dir2/dir4/dir5:
testTraversal/dir2/dir6:
testTraversal/dir3:
testTraversal/dir4:
dir5/
testTraversal/dir4/dir5:
dir6/
testTraversal/dir4/dir5/dir6:
dir7/
testTraversal/dir4/dir5/dir6/dir7:
file3 file4 file5
I don't know if this is behavior dictated by spec or just an implementation detail of GNU coreutils.
My own observation is that directory structures in a file system tend to be wider than they are deep. That would indicate that depth-first is generally the more memory-efficient implementation choice, although it is possible to come up with counter-examples where breadth-first would be more memory-efficient.
Is the traversal order dictated anywhere in the specs? If not, then is depth-first traversal widely used in implementation, and therefore a safer assumption than breadth-first?
coreutils is indeed depth first. busybox is depth first. BSD/OS X is depth first (experimentally; the source is unreadable). I would expect most implementations to be depth first, because it's easy to do recursively and because the POSIX limit on path lengths bounds the memory/stack usage of depth first traversal pretty tightly.