I like to play chess and would like to download the games of the Grandmasters from the internet as zip files, starting from Mon 25th Jun 2012 until today, and then continuously every week on Monday. The zip files are freely available. Their names are numbered sequentially, e.g. twic920g.zip through twic1493g.zip; the next week the number increases by 1 to twic1494g.zip. For the first run this script works.
Here is my current script, followed by my question:
#!/bin/bash
dir="pgn/zip"
if [[ ! -d $dir ]]; then
    mkdir -p "$dir"
fi
cd "$dir" || exit 1

# Download all PGN files
for i in {920..1493}; do
    wget -nc "https://www.theweekinchess.com/zips/twic${i}g.zip"
    unzip "twic${i}g.zip"
    cat "twic${i}.pgn" >> ../master.pgn
    rm "twic${i}.pgn"
done
How do I increase the counter by 1 every week?
I think once you've downloaded the historic games you don't need to worry about incrementing a counter: you can get the link for the "current" game by parsing content from https://theweekinchess.com/zips/.
A more robust solution would probably require something other than a shell script, but this works:
curl https://theweekinchess.com/zips/ | grep 'twic[0-9]*g.zip' | cut -f2 -d'"'
For example, running that right now produces:
http://www.theweekinchess.com/zips/twic973g.zip
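If you do still want the number itself (for example to keep your counter in sync), it can be peeled out of that URL with plain parameter expansion; here's a sketch against the sample URL above:

```shell
# Sample URL as produced by the curl | grep | cut pipeline above
url="http://www.theweekinchess.com/zips/twic973g.zip"

num="${url##*/twic}"   # strip everything up to and including the last "/twic"
num="${num%g.zip}"     # strip the trailing "g.zip"
echo "$num"            # prints: 973
```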
Just run a script to download the latest archive once a week (e.g., using cron).
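A weekly crontab entry for that might look like this (the script and log paths are hypothetical; adjust them to wherever you keep your script):

```
# minute hour day-of-month month day-of-week  command
# Run every Monday at 06:00
0 6 * * 1 /home/you/bin/fetch_twic.sh >> /home/you/fetch_twic.log 2>&1
```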
Alternately, you could write the number of the last file downloaded successfully to a file, and use that as the starting value next time it runs:
#!/bin/bash
dir="pgn/zip"
if [[ ! -d $dir ]]; then
    mkdir -p "$dir"
fi
cd "$dir" || exit 1

# figure out number of last successfully fetched game
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
if (( last_fetched == 0 )); then
    first=920
else
    first=$(( last_fetched + 1 ))
fi

echo "starting with: $first"

# Download all PGN files
for (( i=first; 1; i++ )); do
    # don't download a file if it already exists
    [[ -f "twic${i}g.zip" ]] && continue
    echo "fetching game $i"
    curl -sSfLO "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
    echo "$i" > last_fetched
    unzip -p "twic${i}g.zip" >> ../master.pgn
done
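The `cat … || echo 0` fallback is worth calling out: on the very first run there is no `last_fetched` file, so the `cat` fails and `echo 0` supplies the default. A quick standalone demonstration (run in a scratch directory):

```shell
cd "$(mktemp -d)"

# First run: no state file yet, so we fall back to 0
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
echo "$last_fetched"            # prints: 0

# A successful fetch records its number; the next run resumes after it
echo 920 > last_fetched
last_fetched=$(cat last_fetched 2> /dev/null || echo 0)
echo $(( last_fetched + 1 ))    # prints: 921
```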
When unpacking, the locally saved zip files are also unpacked again, not only the ... downloaded file. With the cat command the old and new files are merged, so master.pgn contains the games twice.
I'm not sure what you're saying here. You're only unpacking the file you've just downloaded, so any existing zip files shouldn't matter.
Instead of appending to master.pgn in every loop iteration, you could leave the unpacked files on disk and completely regenerate master.pgn at the end of the script:
for (( i=first; 1; i++ )); do
    # don't download a file if it already exists
    [[ -f "twic${i}g.zip" ]] && continue
    echo "fetching game $i"
    curl -sSfLO "https://www.theweekinchess.com/zips/twic${i}g.zip" || break
    echo "$i" > last_fetched
    unzip "twic${i}g.zip"
done

cat *.pgn > ../master.pgn
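One caveat with `cat *.pgn`: the glob expands in lexicographic order, so once the numbers cross a digit boundary, twic1000.pgn sorts before twic920.pgn and master.pgn is no longer chronological. If the order matters, feed the names through a version-aware sort first (assumes `sort -V`, available in GNU coreutils):

```shell
# Lexicographic order puts 1000 before 920:
printf '%s\n' twic1000.pgn twic920.pgn twic999.pgn | sort
# twic1000.pgn
# twic920.pgn
# twic999.pgn

# Version sort compares the numeric runs as numbers:
printf '%s\n' twic1000.pgn twic920.pgn twic999.pgn | sort -V
# twic920.pgn
# twic999.pgn
# twic1000.pgn
```

In the script you could then replace the final line with `cat $(ls *.pgn | sort -V) > ../master.pgn` (safe here since TWIC file names contain no whitespace).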