
Download all csv files from subdirectories that match across different directories


On the first day of each month I need to connect to an SFTP server and download all csv files from certain subdirectories, chosen by the previous month.

Example directories to connect to:

sftp_url/csv/client1.1/10/
sftp_url/csv/client1.2/10/
sftp_url/csv/client1.3/10/
sftp_url/csv/client1.4/10/
sftp_url/csv/client2.1/10/
sftp_url/csv/client2.2/10/
sftp_url/csv/client2.3/10/
sftp_url/csv/client2.4/10/

The "10" in the path is the month, so here it means October. Each /10/ subdirectory contains multiple csv files, and I need to download all of them.

  • I have figured out the code to connect to the SFTP server with lftp.
  • I have code to determine the "10", which is date -d "last month" +"%m".
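
The month lookup in the second bullet can be sketched a little more defensively. This is a minimal sketch, assuming GNU date (for the -d option); the function name previous_month and the variables YEAR and MONTH are made up for illustration. It anchors the calculation to the first day of the month, so it also behaves if the job ever runs late in a month, and it returns the year as well so that a January run resolves to December of the previous year.

```sh
#!/bin/sh
set -eu

# Print the previous calendar month as YYYY-MM.
# $1 = optional reference date (YYYY-MM-DD); defaults to today.
# Assumption: GNU date, whose -d option accepts relative dates.
previous_month() {
    ref=${1:-$(date +%Y-%m-%d)}
    # Anchor to the 1st so that e.g. March 31 minus one month
    # cannot overflow back into March.
    first="${ref%-*}-01"
    TZ=UTC date -d "$first -1 month" +%Y-%m
}

ym=$(previous_month)
YEAR=${ym%-*}    # part before the dash, e.g. 2023
MONTH=${ym#*-}   # part after the dash, e.g. 10
echo "year=$YEAR month=$MONTH"
```

MONTH can then be dropped into a path such as sftp_url/csv/client1.1/$MONTH/.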

However, I haven't been able to work out how to select only the directories whose month folder equals last month's value, without listing every full file path.

Is there an easy command that allows for this kind of thing? Apologies if this is something super straightforward. I'm new to the command line and it's been a steep learning curve. I appreciate any help and feedback you can provide.


Solution

    #!/bin/sh
    
    set -eu
    
    SFTP_SERVER="sftp.location.com"
    SFTP_USER="user"
    SFTP_DIR="/csv"
    
    DEST=$(basename "$SFTP_DIR") ## local directory named after the sftp folder ("csv"), relative to the current directory
    
    mkdir -p "$DEST" ## creates the local mirror directory if it does not exist yet
    
    ## downloads the whole remote tree (every client, every month) into $DEST
    lftp "sftp://$SFTP_USER@$SFTP_SERVER:$SFTP_DIR/" -e "lcd '$DEST'; mirror; bye"
    
    ## note: the loops below split on whitespace, so folder and file names must not contain spaces
    ACCT_BREAKDOWNS=$(find "$DEST" -mindepth 1 -maxdepth 1 -type d) ## the client folders, 1 level in from the sftp folder name
    YEAR=$(TZ=UTC-24 date +%Y -d "-1 month") ## year of last month, so a run in January files December under the previous year
    for ACCT_BREAKDOWN in $ACCT_BREAKDOWNS; do
            ACCT=$(basename "$ACCT_BREAKDOWN" | sed -E -e 's/_(.2|.1|.3)$//') ## strips a trailing suffix (underscore, any one character, then 1, 2 or 3) to leave just the client name
    
            DATES=$(find "$ACCT_BREAKDOWN" -mindepth 1 -maxdepth 1 -type d -printf "%f\n") ## names of the month sub folders (e.g. 09, 10), one per line; -printf needs GNU find
            #DATES="09 10" ## uncomment to process specific months only
            for DATE in $DATES; do
                    mkdir -p "account/$ACCT/$YEAR-$DATE" ## creates the target directory for this client and month
    
                    for FILE in $(find "$ACCT_BREAKDOWN/$DATE" -type f); do ## every file in this client's month sub folder
                            ln -f -t "account/$ACCT/$YEAR-$DATE" "$FILE" ## hard-links the file into account/<client>/<year>-<month>; -t needs GNU ln
                    done
            done
    done
    
    rclone sync account/ remote:folder/folder ## syncs the account/ folder layout to gdrive
    
    TODAY=$(date)
    ## overwrites a tsv that I use as a reference for which of my cronjobs failed; written with tabs, as a tsv needs
    printf '%s\tglobal\tscript\n' "$TODAY" > ~/error-log/log/log.tsv
    
    exit
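
The script above mirrors every month's folder and then sorts the files locally. Once the tree is on disk, a shell wildcard in the client position is enough to pick out a single month without spelling out each full path. The following is a minimal sketch, not part of the original answer; the function name list_month_csvs is hypothetical, and it assumes the local layout mirrors the remote one (csv/<client>/<month>/*.csv).

```sh
#!/bin/sh
set -eu

# Print every csv file under <root>/<any client>/<month>/, one path per line.
# $1 = local mirror root (e.g. csv), $2 = two-digit month (e.g. 10)
list_month_csvs() {
    root=$1
    month=$2
    # The * in the client position matches client1.1, client1.2, ... in one go.
    for f in "$root"/*/"$month"/*.csv; do
        # In POSIX sh an unmatched glob stays as a literal string,
        # so skip anything that is not an existing regular file.
        [ -f "$f" ] || continue
        printf '%s\n' "$f"
    done
}
```

For example, list_month_csvs csv 10 prints the October files of every client, and nothing at all if no client has a 10 folder.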