
How to transform a URL ending in inline/attachment into a URL with a file extension in XSL?


With the help of an XSL script I extract the URL of a file from XML. The URL ends with: api/v1/objects/uuid/b79de4e5-8d1f-4840-b85f-e052db92a52f/file/id/1001974122/file_version/name/small/disposition/inline

When I enter this URL in a web browser, it is transformed into a URL that ends with a file extension: eas/partitions-inline/48/1001/1001974000/1001974122/9a4191c7ce7414650d36ac9bc1c2b012261013ad/image/png/8223@33a8cae1-a9fa-4655-8c3d-b71241bbc99b_1001974122_small.png

Is there a way to do this transformation with XSL, without a browser?

I need the URL with a file extension in my output XML in order to run the harvester over it.


Solution

  • The question is vague about how the URL gets transformed (and about the XML tooling in use), but let's assume the original URL answers with a 3xx redirect and that the goal is to output the redirect target. For instance:

    $ curl --silent --head http://stackoverflow.com | grep Location
    Location: https://stackoverflow.com/
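
    A case-sensitive grep for Location can miss the header, because header names are case-insensitive and curl prints them in lowercase for HTTP/2 responses. As a minimal sketch, the helper below (extract_location is a hypothetical name, not part of curl) reads raw response headers on stdin and prints the first Location value, whatever its case. It assumes the value contains no spaces, and it does not resolve a relative Location against the request URL.

    ```shell
    # Print the value of the Location header found in raw HTTP headers on stdin.
    # Header names are matched case-insensitively; only the first match is printed.
    # Assumption: the header value (a URL) contains no whitespace.
    extract_location() {
      tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
    }

    # Usage (needs network access; the URL is only an example):
    #   curl --silent --head http://stackoverflow.com | extract_location
    ```

    Stripping the carriage returns first keeps the trailing \r of each HTTP header line out of the printed URL.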
    

    To do the same thing while transforming XML, the XSLT processor needs an HTTP client. EXPath, a collection of XPath extension specifications with implementations, provides an HTTP Client module.

    To install EXPath quickly, use the installer available on its download page. It comes with the Saxon XSLT processor. At the time of writing, the installer is expath-repo-installer-0.13.1.jar. Run it like:

    java -jar expath-repo-installer-0.13.1.jar
    

    Once it is installed, download the HTTP client module for Saxon, expath-http-client-saxon-0.12.0.zip, and extract expath-http-client-saxon-0.12.0.xar from it. Then install it into an EXPath repository:

    mkdir repo
    bin/xrepo --repo repo install /path/to/expath-http-client-saxon-0.12.0.xar
    

    Then you can use bin/saxon.

    data.xml

    <?xml version="1.0" encoding="utf-8"?>
    <data>
      <datum><url>http://python.org</url></datum>
      <datum><url>http://stackoverflow.com</url></datum>
    </data>
    

    test.xslt

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:http="http://expath.org/ns/http-client"
      exclude-result-prefixes="#all"
      version="2.0">
    
      <xsl:import href="http://expath.org/ns/http-client.xsl"/>
      <xsl:output method="xml" encoding="utf-8" indent="yes"/>
    
      <xsl:template match="/">
        <result>
          <xsl:for-each select="data/datum">
            <!-- the request element -->
            <xsl:variable name="request" as="element(http:request)">
              <http:request method="head" follow-redirect="false">
                <xsl:attribute name="href">
                  <xsl:value-of select="url"/>
                </xsl:attribute> 
              </http:request>
            </xsl:variable>
            <!-- sending the request -->
            <xsl:variable name="response" select="http:send-request($request)"/>
            <!-- output -->
            <url>
              <orig><xsl:value-of select="url"/></orig>
              <location>
                <xsl:value-of 
                  select="$response[1]/http:header[@name='location']/@value"/>
              </location>
            </url>
          </xsl:for-each>  
        </result>
      </xsl:template>
    </xsl:stylesheet>
    

    See the module's specification for more details on controlling the HTTP client, such as the follow-redirect attribute used above.

    Then bin/saxon --repo repo data.xml test.xslt produces:

    <?xml version="1.0" encoding="utf-8"?>
    <result>
       <url>
          <orig>http://python.org</orig>
          <location>https://python.org/</location>
       </url>
       <url>
          <orig>http://stackoverflow.com</orig>
          <location>https://stackoverflow.com/</location>
       </url>
    </result>