
Merge CSV files in multiple sub directories using Cygwin


I'm new to Cygwin and could really use some help. I have a root directory with multiple sub directories, all on the same level (that is, there are no nested sub directories). Each sub directory contains several CSV files (same format, no headers). I'd like to merge the CSVs in each sub directory into one large CSV file, so that I end up with one CSV per sub directory containing the contents of all CSVs in that sub directory.

I think I can use the simple command cat *.csv > largefile.csv, but I'm not so sure how to scan through all the sub directories and apply this code to each one. Based on the tutorials I've worked through, I believe this should work:

for dir in `find . -type d`
do cat *.csv > largefile.csv
done

Is this the best approach? It seems way too simple.

Also, is there a way to store these commands in a file that I could execute whenever I need to perform this task?

Thanks in advance for helping this beginner out!


Solution

  • I would do it by creating this file:

    cat_all_csv_dir.sh:

    #!/bin/bash
    for dir in *; do
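        # skip entries that are not directories or that contain no CSV files
        if [ ! -d "$dir" ] || [ -z "$(ls "$dir"/*.csv 2>/dev/null)" ]; then
            continue
        fi
        # write the merged file next to the directory, e.g. sub1/ -> sub1.csv
        cat "$dir"/*.csv > "$dir".csv
        # print the name of each processed directory
        echo "$dir"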
    done
    

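    Save the script in a folder that is listed in the PATH environment variable (you can print its value with echo $PATH) and make it executable with chmod +x cat_all_csv_dir.sh. You can then run cat_all_csv_dir.sh from any directory: it processes the sub directories of the current directory and writes one merged file per sub directory, named after that sub directory (for example, sub1.csv for sub1).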
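
The installation step could be sketched roughly as follows. This is only an illustration under assumptions: install_on_path is a hypothetical helper name, and ~/bin is an assumed default target folder; any folder listed in PATH would work just as well.

```shell
# Hypothetical helper: copy a script into a folder and make it executable.
# Usage: install_on_path SCRIPT [BIN_DIR]   (BIN_DIR defaults to ~/bin, an assumption)
install_on_path() {
    local script="$1"
    local bin_dir="${2:-$HOME/bin}"

    mkdir -p "$bin_dir" || return 1          # create the folder if it is missing
    cp "$script" "$bin_dir/" || return 1     # copy the script into it
    chmod +x "$bin_dir/$(basename "$script")" || return 1

    # Tell the user whether the shell already searches this folder.
    case ":$PATH:" in
        *":$bin_dir:"*) echo "$bin_dir is already in PATH" ;;
        *) echo "add this line to ~/.bashrc: export PATH=\"$bin_dir:\$PATH\"" ;;
    esac
}
```

For example, install_on_path cat_all_csv_dir.sh copies the script to ~/bin. Once that folder is in PATH, running cat_all_csv_dir.sh from the root directory merges the CSV files of each sub directory.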