linux bash shell symlink ln

Get unique root directories from a directory of symlinks


I have a largish directory filled with symlinks (created using ln -s), about 1 million of them. They look like this:

--img_dir
  -- img.jpg --> /path/to/some/img.jpg
  -- imgc.jpg --> /path/to/some/imgc.jpg
  -- imgd.jpg --> /path/to/some/imgd.jpg
  -- img2.jpg --> /path2/to2/some2/img2.jpg
  -- img3.jpg --> /path3/to3/some3/img3.jpg
  -- img21.jpg --> /path21/to21/some21/img2.jpg
  -- img31.jpg --> /path31/to31/some31/img3.jpg
<snip>

For record-keeping purposes, I would like a list of the unique base_dirs (the root directories) from which the symlinks were created.

So, I would like the following output:

/path/to/some
/path2/to2/some2
/path3/to3/some3
/path21/to21/some21
/path31/to31/some31

I tried googling to see how to achieve this in bash, but I could not find anything useful.

Any help or pointers would be much appreciated.


Solution

    • find -type l lists symlinks
    • realpath resolves each symlink to the absolute path of its target
    • dirname strips the final component from a path
    • sort -u sorts lines and drops duplicates

    find img_dir -type l | xargs realpath | xargs dirname | sort -u
    

    Or, logging errors:

    find img_dir -type l 2>find-errs     |
    xargs realpath       2>realpath-errs |
    xargs dirname        2>dirname-errs  |
    sort -u               >basedir-list
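
To try the pipeline on something small before running it on the real directory, here is a minimal sketch. The function name `list_base_dirs` and the scratch tree are hypothetical, made up for illustration; the pipeline inside the function is the one from this answer, and a `realpath` command is assumed to be installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the pipeline from the answer.
list_base_dirs() {
    find "$1" -type l | xargs realpath | xargs dirname | sort -u
}

# Build a small scratch tree (illustrative names only):
# two link targets in one directory, one target in another.
scratch=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$scratch/path/to/some" "$scratch/path2/to2/some2" "$scratch/img_dir"
touch "$scratch/path/to/some/img.jpg" \
      "$scratch/path/to/some/imgc.jpg" \
      "$scratch/path2/to2/some2/img2.jpg"
ln -s "$scratch/path/to/some/img.jpg"     "$scratch/img_dir/img.jpg"
ln -s "$scratch/path/to/some/imgc.jpg"    "$scratch/img_dir/imgc.jpg"
ln -s "$scratch/path2/to2/some2/img2.jpg" "$scratch/img_dir/img2.jpg"

# Prints the two unique target directories, one per line.
list_base_dirs "$scratch/img_dir"
```

Three links go in and two directories come out, because img.jpg and imgc.jpg share a target directory.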
    

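    Some implementations of realpath and dirname may accept only a single argument. In that case, pass one path per invocation (this starts a new process for every link, so it will be slow for a million links):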

    ... | xargs -I@ realpath @ | xargs -I@ dirname @ | ...
    

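    The code above assumes no really weird paths. By default xargs splits its input on blanks and newlines and treats quotes and backslashes specially, so paths containing any of those characters will be mangled.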
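
If such paths are a concern, one possible workaround is to pass NUL-delimited names through the whole pipeline. This is a sketch that assumes the GNU versions of find, xargs, realpath, dirname and sort (the `-print0`, `-0`, `-r` and `-z` options are not available everywhere); the function name `list_base_dirs_safe` and the scratch tree are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: NUL-delimited variant of the pipeline.
# Assumes GNU find, xargs, realpath, dirname and sort.
list_base_dirs_safe() {
    find "$1" -type l -print0 |
        xargs -0 -r realpath -z -- |   # resolve each link, NUL-terminated output
        xargs -0 -r dirname -z -- |    # strip the final path component
        sort -zu |                     # sort and dedupe NUL-terminated records
        tr '\0' '\n'                   # convert to one directory per line for display
}

# Scratch tree with spaces in both directory and file names (illustrative only).
scratch=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$scratch/my photos/2020" "$scratch/img_dir"
touch "$scratch/my photos/2020/a b.jpg" "$scratch/my photos/2020/c.jpg"
ln -s "$scratch/my photos/2020/a b.jpg" "$scratch/img_dir/a b.jpg"
ln -s "$scratch/my photos/2020/c.jpg"   "$scratch/img_dir/c.jpg"

# Prints the single target directory, spaces intact.
list_base_dirs_safe "$scratch/img_dir"
```

The final `tr` is only there for readable output. If directory names may themselves contain newlines, drop it and keep the list NUL-delimited.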