I am migrating a shop over for a client.
I have to pull all the old image files off her 'shop', which has no FTP access.
It let me export a list of filenames/URLs. My plan was to load them up in Firefox and use DownThemAll to simply download all the files (around 2000). However, about a third of them have [ and ] in the name,
e.g.
cdn.crapshop.com/images/image[1].jpg
DownThemAll chokes on this and reads it only as
cdn.crapshop.com/images/image
and won't download it because that isn't a file.
Anyone got any ideas for an alternative way to pull a list like this?
See this answer that explains why the example URL you provided is invalid: Validation. As @good explains in that post, characters that are not allowed by the specification have to be percent-encoded, so the web server will understand them.
This calls for Python; see this post: Percent encoding in python.
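For example, [ and ] encode as %5B and %5D. A quick check in a Python 3 shell, using urllib.parse.quote from the standard library, shows the idea on your sample URL:

>>> import urllib.parse
>>> urllib.parse.quote("cdn.crapshop.com/images/image[1].jpg", safe=':/')
'cdn.crapshop.com/images/image%5B1%5D.jpg'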
Then we can put it all together in a script that reads the list from stdin and writes the encoded URLs to stdout: python script.py < input > output.out
import sys
import urllib.parse

for line in sys.stdin:
    url = line.strip().strip("'")  # drop whitespace and any stray quotes
    if not url:
        continue
    # Encode everything except ':' and '/', so the scheme and path
    # separators survive; '[' and ']' become %5B and %5D.
    print(urllib.parse.quote(url, safe=':/'))
Then, hopefully, DownThemAll will accept the corrected list (the input to the script should be a list of URLs, one per line).
You may be interested in this post as well: Downloading files with python. It shows how to download files (web pages in particular) using Python.
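In fact, you could cut DownThemAll out of the loop and let Python fetch everything itself. Here is a minimal sketch along those lines, assuming the exported list sits in a file called urls.txt (one URL per line) and the downloads should land in a folder called images/; both names are placeholders, and it guesses http:// when the list omits the scheme:

import os
import sys
import urllib.parse
import urllib.request

os.makedirs("images", exist_ok=True)

with open("urls.txt") as f:
    for line in f:
        url = line.strip().strip("'")
        if not url:
            continue
        # The exported list may omit the scheme; assume plain http here.
        if "://" not in url:
            url = "http://" + url
        # Percent-encode [ and ] (and anything else unsafe), as above.
        safe_url = urllib.parse.quote(url, safe=':/')
        # Name the local copy after the last path segment of the URL.
        filename = os.path.basename(urllib.parse.urlparse(url).path)
        try:
            urllib.request.urlretrieve(safe_url, os.path.join("images", filename))
        except OSError as e:
            print("failed: %s (%s)" % (url, e), file=sys.stderr)

urlretrieve fetches the files one at a time, which should be fine for around 2000 small images.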
Good luck!