api url sed flickr

Convert an XML file with 9 unique fields per line to pre-formed URLs using only 4 of the fields per line


I'm working with a Flickr API call that returns results per photo like this:

<photo id="7503362468" owner="59044395@N02" secret="66b94027db" server="8423" farm="9" title="Potluck" ispublic="1" isfriend="0" isfamily="0" />

Now, according to Flickr's URL/API documentation, their URLs are structured like this, with the mstzb's being one-letter indicators of the size of the photo:

http://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg

So, my question is about a mass search and replace that takes each line, prepends the http://farm, and then basically just "fills in the blanks" for the rest. The goal is to use the API to fetch RESTful XML, throw the replacer at it, and get a list of URLs.

I have a brief familiarity with sed (admittedly I'm no wizard at it), but I'm unsure how to do a per-line search and replace that prepends and then substitutes fields in the proper order. The farm-id goes first in the URL but is the fifth field in the XML; the fields sit in the same positions on every line, so the same pattern applies to each one.

I'm just getting started with regex-type stuff, and any help would be appreciated. I also see that this sort of question has been asked before, but those seemed to focus on how to create URL syntax rather than a sed-style replace. My sed knowledge is mostly simple s/unnecessary/necessary; I'm unsure how to pick out certain quoted fields and move them into a preformed line.

Edit: A little more info. I'm using Flickr's API Explorer to generate these XML files, and I typically work in bash for editing. I think what I'm after is a bash script, or possibly a short program in some (hopefully) executable language. I'll hasten to add that although I have a 'little' familiarity with languages like Python, I have little to no experience writing code aside from bash scripts. You can check out the API Explorer here: http://www.flickr.com/services/api/explore/?method=flickr.photos.search

Thanks y'all!


Solution

  • Three solutions using awk:

    Solution 1. Assumes that every xml record looks like the sample given, with all the fields in exactly the sample's sequence:

    The double quote is set as the field delimiter, then the desired content is accessed as positional variables within the input line.

    A file could have many input records and all will be converted in one execution.

    #!/usr/bin/awk -f
    #<photo id="7503362468" owner="59044395@N02" secret="66b94027db" server="8423" farm="9" title="Potluck" ispublic="1" isfriend="0" isfamily="0" />
    #1          2          3       4            5        6          7        8    9      10 11     12      13         14 15        16 17        18 19
    #http://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
    
    #usage ./xml2url.awk <file_of_xml_text
    BEGIN {FS="\""}
    {print "http://farm"$10".staticflickr.com/"$8"/"$2"_"$6"_[mstzb].jpg"}
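
    Solution 1 depends on the attributes appearing in exactly the sample's order. As a hedged variation, the sketch below looks each attribute up by name with awk's match(), so the order no longer matters. The names photo_urls and attr are made up for this example, and it assumes each photo element sits on a single line, as in the sample.

```shell
#!/bin/sh
# Sketch: build Flickr photo URLs by looking up attributes by name,
# so the order of attributes on each <photo ... /> line does not matter.
# photo_urls and attr are hypothetical names chosen for this example.
photo_urls() {
    awk '
    # Return the value of attribute "name" on the current line, or "" if absent.
    function attr(name,    pat, s) {
        pat = " " name "=\"[^\"]*\""        # matches: space, name, =, quoted value
        if (!match($0, pat)) return ""
        s = substr($0, RSTART, RLENGTH)
        sub(/^[^"]*"/, "", s)               # drop everything through the opening quote
        sub(/"$/, "", s)                    # drop the closing quote
        return s
    }
    # Only lines that contain a photo element produce output.
    /<photo / {
        print "http://farm" attr("farm") ".staticflickr.com/" attr("server") "/" attr("id") "_" attr("secret") "_[mstzb].jpg"
    }'
}
```

    Usage would be along the lines of photo_urls <file_of_xml_text, after sourcing or pasting the function into a script. Lines without a photo element are skipped rather than printed as broken URLs.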
    

    Solution 2. This solution assumes you can edit the xml, replacing

    <photo
    

    with

    echo x|./xml2urlv2.awk
    

    and replacing

    />
    

    with nothing.

    Then

    #!/usr/bin/awk -f
    # usage echo x|./xml2urlv2.awk  id="7503362468"  owner="59044395@N02"  secret="66b94027db"  server="8423"  farm="9" title="Potluck" ispublic="1"  isfriend="0"  isfamily="0"
    #<photo id="7503362468" owner="59044395@N02" secret="66b94027db" server="8423" farm="9" title="Potluck" ispublic="1" isfriend="0" isfamily="0" />
    #http://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
    #
    {print "http://farm"farm".staticflickr.com/"server"/"id"_"secret"_[mstzb].jpg"}
    

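    does the trick. Each edited line is now a shell command: awk treats every name="value" argument as a variable assignment, and echo x supplies the single input record that makes the print rule fire.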
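
    The two edits described for Solution 2 could themselves be done with sed. This is only a sketch: to_commands is a made-up name, and it assumes one photo element per line and that xml2urlv2.awk is executable in the current directory when the generated commands are run.

```shell
#!/bin/sh
# Sketch: apply the two edits from Solution 2 with sed.
# to_commands is a hypothetical name chosen for this example.
to_commands() {
    sed -n '/<photo /{
        # replace the opening "<photo " with the command prefix
        s#^[[:space:]]*<photo #echo x|./xml2urlv2.awk #
        # replace the closing "/>" (and surrounding blanks) with nothing
        s#[[:space:]]*/>[[:space:]]*$##
        p
    }'
}
```

    The output is a list of shell commands, one per photo, which could then be piped to sh. Be careful with that step: the attribute values end up inside double quotes on a shell command line, so a title containing $ or backticks would be expanded by the shell.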

    Solution 3. This solution eliminates the need to echo anything into the script, but requires more editing. You have to put -v before each field that you care about. Because -v assignments are made before the BEGIN block runs, the script needs no input at all.

    #!/usr/bin/awk -f
    #<photo id="7503362468" owner="59044395@N02" secret="66b94027db" server="8423" farm="9" title="Potluck" ispublic="1" isfriend="0" isfamily="0" />
    #http://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
    
    #usage: ./xml2urlv.awk -v id="7503362468" -v owner="59044395@N02" -v secret="66b94027db" -v server="8423" -v farm="9" -v title="Potluck" -v ispublic="1" -v isfriend="0" -v isfamily="0"  
    
    BEGIN{print "http://farm"farm".staticflickr.com/"server"/"id"_"secret"_[mstzb].jpg"}
    ### end of script 
    

    If you are new to awk, remember that the entire print statement must go on one line. Also, the { must go on the same line as the word BEGIN.
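
    Since the question asked about a sed-style replace, here is a rough sed equivalent of Solution 1 using capture groups. It is a sketch under the same assumption as Solution 1: the attributes appear in the sample's order (id, owner, secret, server, farm). The name sed_urls is made up for this example.

```shell
#!/bin/sh
# Sketch: one sed substitution with four capture groups.
# Assumes the attribute order of the sample line; sed_urls is a hypothetical name.
#   \1 = id, \2 = secret, \3 = server, \4 = farm
sed_urls() {
    sed -n 's#.*<photo id="\([^"]*\)" owner="[^"]*" secret="\([^"]*\)" server="\([^"]*\)" farm="\([^"]*\)".*#http://farm\4.staticflickr.com/\3/\1_\2_[mstzb].jpg#p'
}
```

    The -n flag together with the trailing p prints only lines where the substitution succeeded, so lines that are not photo records produce no output. The # delimiter avoids having to escape the slashes in the URL.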