I like to play chess and would like to download the games of the Grandmasters from the internet as zip files, starting from Mon 25th Jun 2012 until today, and then continuously every week on Monday. The zip files are freely available. Their names are numbered sequentially, e.g. twic920g.zip through twic1493g.zip; the next week the number increases by 1 to twic1494g.zip. For the first run this script works.
Here is my current script, followed by my question:
#!/bin/bash
dir="pgn/zip"
if [[ ! -d $dir ]]; then
    mkdir -p "$dir"
fi
cd "$dir" || exit 1

# Download all PGN files
for i in {920..1493}; do
    wget -nc "https://www.theweekinchess.com/zips/twic${i}g.zip"
    unzip "twic${i}g.zip"
    cat "twic${i}.pgn" >> ../master.pgn
    rm "twic${i}.pgn"
done
How do I increase the counter by 1 every week?
I think once you've downloaded the historic games you don't need to worry about incrementing a counter: you can get the link for the "current" game by parsing content from https://theweekinchess.com/zips/.
A more robust solution would probably require something other than a shell script, but this works:
curl https://theweekinchess.com/zips/ | grep 'twic[0-9]*g.zip' | cut -f2 -d'"'
For example, running that right now produces:
http://www.theweekinchess.com/zips/twic973g.zip
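If you do still want the number itself (for example to keep your counter in sync), it can be peeled out of that URL with plain parameter expansion; here's a sketch against the sample URL above:

```shell
# Sample URL as produced by the curl | grep | cut pipeline above
url="http://www.theweekinchess.com/zips/twic973g.zip"

num="${url##*/twic}"   # strip everything up to and including the last "/twic"
num="${num%g.zip}"     # strip the trailing "g.zip"
echo "$num"            # prints: 973
```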
Just run a script to download the latest archive once a week (e.g., using cron).
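A weekly crontab entry for that might look like this (the script and log paths are hypothetical; adjust them to wherever you keep your script):

```
# minute hour day-of-month month day-of-week  command
# Run every Monday at 06:00
0 6 * * 1 /home/you/bin/fetch_twic.sh >> /home/you/fetch_twic.log 2>&1
```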
Alternately, you could write the number of the last file downloaded successfully to a file, and use that as the starting value next time it runs:
#!/bin/bash
dir="pgn/zip"
if [[ ! -d $dir ]]; then
    mkdir -p "$dir"
fi
cd "$dir" || exit 1

# figure out number of last successfully fetched game
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
if (( last_fetched == 0 )); then
    first=920
else
    first=$(( last_fetched + 1 ))
fi

echo "starting with: $first"

# Download all PGN files
for (( i=first; 1; i++ )); do
    # don't download a file if it already exists
    [[ -f "twic${i}g.zip" ]] && continue
    echo "fetching game $i"
    curl -sSfLO "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
    echo "$i" > last_fetched
    unzip -p "twic${i}g.zip" >> ../master.pgn
done
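The `cat … || echo 0` fallback is worth calling out: on the very first run there is no `last_fetched` file, so the `cat` fails and `echo 0` supplies the default. A quick standalone demonstration (run in a scratch directory):

```shell
cd "$(mktemp -d)"

# First run: no state file yet, so we fall back to 0
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
echo "$last_fetched"            # prints: 0

# A successful fetch records its number; the next run resumes after it
echo 920 > last_fetched
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
echo $(( last_fetched + 1 ))    # prints: 921
```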
When unpacking, the locally saved zip files are also unpacked again, not only the ... downloaded file. With the cat command the old and new files are merged, so master.pgn contains the games twice.
I'm not sure what you're saying here. You're only unpacking the file you've just downloaded, so any existing zip files shouldn't matter.
Instead of appending to master.pgn in every loop iteration, you could leave the unpacked files on disk and completely regenerate master.pgn at the end of the script:
for (( i=first; 1; i++ )); do
    # don't download a file if it already exists
    [[ -f "twic${i}g.zip" ]] && continue
    echo "fetching game $i"
    curl -sSfLO "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
    echo "$i" > last_fetched
    unzip "twic${i}g.zip"
done

cat *.pgn > ../master.pgn
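One caveat with `cat *.pgn`: the glob expands in lexicographic order, so once the numbers cross a digit boundary, twic1000.pgn sorts before twic920.pgn and master.pgn is no longer chronological. If the order matters, feed the names through a version-aware sort first (assumes `sort -V`, available in GNU coreutils):

```shell
# Lexicographic order puts 1000 before 920:
printf '%s\n' twic1000.pgn twic920.pgn twic999.pgn | sort
# twic1000.pgn
# twic920.pgn
# twic999.pgn

# Version sort compares the numeric runs as numbers:
printf '%s\n' twic1000.pgn twic920.pgn twic999.pgn | sort -V
# twic920.pgn
# twic999.pgn
# twic1000.pgn
```

In the script you could then replace the final line with `cat $(ls *.pgn | sort -V) > ../master.pgn` (safe here since TWIC file names contain no whitespace).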