Search code examples
xmlfileurldownload

How to download multiple files whose URLs are stored in an xml document?


I have an xml document on my computer that basically looks like this:

<item playlist="3" gameid="32" catid="1" title="Cul-De-Sac of Memories" artist="Christopher Lennertz" scr="../mp3/sims3p/build/cul-de-sac_of_memories.flv" />

<item playlist="3" gameid="30" catid="4" title="Brave" artist="Kelis" scr="../mp3/sims3ln/electronica/brave.flv" />

<item playlist="3" gameid="15" catid="1" title="First Volley" artist="General Midi" scr="../mp3/sims2nl/build/general_midi_-_first_volley.flv" />

Except that it has much more items than that (and some comments). I've been desperately trying to find a way to get a program/script to:

  1. Take the url in between src=" and the very next " in the xml tag.
  2. Replace the ../ in the url with http://www.WEBSITE.com/ and maybe store it as a variable, like Song_URL.
  3. Take the song name in between title=" and the very next " from the same tag it got the url from, and maybe store that a variable too, like Song_Name.
  4. Download the song from the Song_URL and name it Song_Name.

For every tag like that in the document. Note that some tags in the document look like this: <item playlist="2" gameid="28" catid="2" title="Load" /> and don't matter to me.

I know a tiny bit of Bash, Applescript, and Python, but don't know enough of any to do this. If anyone could please help me do this, in whatever programming language you please (it could be in the 3 I listed, or in Java, Ruby, C or whatever else you want), however you want, I would very much appreciate it!


Solution

  • I figured it out how to do this with help from a friend. After using a basic text program to replace all instances of ../ with http://www.WEBSITE.com/ I used the following program to download the songs:

    import urllib
    F = open('/PATH/TO/FILE.txt')
    document = F.readlines()
    for string in document:
        index1 = string.find('scr="')+5
        index2 = string.find('"',index1)
        Song_url = string[index1:index2]
        
        index3 = string.find('title="')+7
        index4 = string.find('"',index3)
        Song_name = string[index3:index4]
        
        u = urllib.urlopen(Song_url)
        localFile = open((Song_name + '.flv'),'w')
        localFile.write(u.read())
        localFile.close()
    

    And it worked like a charm.