
How to rev & cut (using the same pattern) a list of strings in a single line?


I intend to write a script that gathers files based on their filename prefix and tars them together when they share the same prefix. I have no list of the prefixes, so I need to build it from the filenames themselves.

Files have names like:

top-1.parquet
top-2.parquet
side-1.parquet
side-2.parquet
bot-tom-1.parquet
bot-tom-2.parquet
right-left-1.parquet
right-left-2.parquet

To do so, I started with this script.

RMT_PATH_DATA='/home/me/Documents/code/data'

while IFS= read -r -d $'\n' root_name
do
    # Work out tar here
    echo "Working file $root_name"
    ls "$root_name"*.parquet
done < <(find "$RMT_PATH_DATA" -maxdepth 1 -name "*.parquet" -print0 | rev | cut -f 2- -d '-' | rev | sort -zu)

(This script is more or less copied from the accepted answer here on SO.)

The logic of the last line is to take the list of filenames retrieved with find and trim the trailing digit and extension from each one, keeping only the prefix. The trimming is done by reversing the filename, using cut to keep everything from the 2nd field onward of the reversed name, and reversing the result again (- is the field delimiter, and it can appear a variable number of times in the prefix itself).
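To illustrate the trimming step on its own, here is a minimal sketch applied to two of the names above. The helper name trim_suffix is only for illustration and does not appear in the original script.

```shell
# Hypothetical helper: drop everything after the last '-' of each input line.
trim_suffix() {
    # Reverse each line, drop the first '-'-separated field, reverse back.
    rev | cut -f 2- -d '-' | rev
}

# One name per line, newline-delimited.
printf '%s\n' 'bot-tom-1.parquet' 'right-left-2.parquet' | trim_suffix
# prints:
# bot-tom
# right-left
```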

My trouble is with the rev and cut commands. The find command outputs the list of parquet files in the data directory, but rev and cut appear to process only the 1st item of the list, discarding the others.

Please, how can I make them process the full list?

Thanks for your help! Bests

PS: I have not built the tar part yet; for now I only run echo and ls to check what is being processed in the loop. Only one iteration currently runs because of the problem described above.


Solution

  • The problem is the -print0 option that you pass to find. With it, the delimiter between the found items is NUL and not newline, whereas rev and cut are line-oriented by default (and your read -d $'\n' also expects newlines). In "How to concatenate files that have the same beginning of a name?" they used cut with the -z option, which is the counterpart of -print0. The rev command does not have an option to use the NUL delimiter, as far as I can see.
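One possible fix, sketched below, is to stay newline-delimited from end to end: -print instead of -print0 and sort -u instead of sort -zu. This assumes that no filename contains a newline. The function name list_prefixes and the temporary demo directory are made up for the example; only the pipeline itself comes from the question.

```shell
# Hypothetical helper: print the unique prefixes of the *.parquet files in "$1".
# Assumption: filenames contain no newline, so newline-delimited output is safe.
list_prefixes() {
    find "$1" -maxdepth 1 -name '*.parquet' -print \
        | rev | cut -f 2- -d '-' | rev | sort -u
}

# Demo on throwaway files that mirror the names in the question.
demo_dir=$(mktemp -d)
for name in top side bot-tom right-left; do
    touch "$demo_dir/$name-1.parquet" "$demo_dir/$name-2.parquet"
done

# One iteration per prefix; the tar step would replace the ls.
list_prefixes "$demo_dir" | while IFS= read -r root_name
do
    echo "Working file $root_name"
    ls "$root_name"*.parquet
done
```

In bash, the done < <(list_prefixes "$demo_dir") form from the question works just as well; the plain pipe is used here only to keep the sketch portable. If you need to keep -print0, an alternative that avoids rev altogether is to trim each name in the shell with "${f%-*}", which removes everything from the last - onward.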