XSL - How to create GraphML edges from XML to connect nodes with the same author/actors?


I have an XML file that contains a list of movies. Each movie has metadata describing the plot, actors, directors, etc. Here is an example of the structure:

<movies>
    <movie>
    <title>The Shawshank Redemption</title>
    <year>1994</year>
    <rated>R</rated>
    <released>1994 Oct 14</released>
    <runtime>142 min</runtime>
    <genres>
        <genre>Crime</genre>
        <genre>Drama</genre>
    </genres>
    <directors>
        <director>Name Surname</director>
    </directors>
    <writers>
        <writer>Stephen King (short story 'Rita Hayworth and Shawshank Redemption')</writer>
        <writer>Frank Darabont (screenplay)</writer>
    </writers>
    <actors>
        <actor>Tim Robbins</actor>
        <actor>Morgan Freeman</actor>
        <actor>Bob Gunton</actor>
        <actor>William Sadler</actor>
    </actors>
    <plot>Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.</plot>
    <languages>
        <language>English</language>
    </languages>
    <countries>
        <country>USA</country>
    </countries>
    <awards>Nominated for 7 Oscars. Another 16 wins and 16 nominations.</awards>
    <poster>http://ia.media-imdb.com/images/M/MV5BODU4MjU4NjIwNl5BMl5BanBnXkFtZTgwMDU2MjEyMDE@._V1_SX300.jpg</poster>
    <metascore>80</metascore>
    <imdbRating>9.3</imdbRating>
    <imdbVotes>1358212</imdbVotes>
    <imdbID>tt0111161</imdbID>
    <type>movie</type>
    </movie>
    <movie>
    ...
    </movie>
    <movie>
    ...
    </movie>
    ...
</movies>

I have to create an XSL stylesheet that transforms this file into a GraphML file showing how the movies are related through their actors: the nodes are the movies, and an edge exists between two nodes if at least one actor appears in both movies. Here is an example:

<key id="actors" for="edge" attr.name="actors" attr.type="int">
    <default>1</default>
</key>

<graph id="movies" edgedefault="undirected">

<node id="movie title 1"/>
<node id="movie title 2"/>
<node id="movie title 3"/>
...

<edge source="movie title 1" target="movie title 2">
    <data key="actors">2</data> (number of actors who appear in both "movie title 1" and "movie title 2")
</edge>

This is a fragment of XSL to list the nodes:

<xsl:for-each-group select="/movies/movie" group-by=".">
    <xsl:sort select="current-grouping-key()"/>         
    <node><xsl:attribute name="id"><xsl:value-of select="current-grouping-key()"/></xsl:attribute></node>
    <xsl:text>&#xa;</xsl:text>
</xsl:for-each-group>
<xsl:text>&#xa;</xsl:text>

Thanks in advance for the answers.


Solution

  • I don't think your question is very clear. If, as it seems, you want a graph connecting movies that share actors, then you should start with an example that (a) has multiple movies and (b) has some actors appearing in more than one of them:

    XML

    <movies>
       <movie>
          <title>Alpha</title>
          <actors>
             <actor>Adam</actor>
             <actor>Betty</actor>
             <actor>Cecil</actor>
          </actors>
       </movie>
       <movie>
          <title>Bravo</title>
          <actors>
             <actor>Adam</actor>
             <actor>Betty</actor>
             <actor>David</actor>
          </actors>
       </movie>
       <movie>
          <title>Charlie</title>
          <actors>
             <actor>Adam</actor>
             <actor>David</actor>
             <actor>Eve</actor>
          </actors>
       </movie>
       <movie>
          <title>Delta</title>
          <actors>
             <actor>Cecil</actor>
             <actor>Eve</actor>
          </actors>
       </movie>
       <movie>
          <title>Echo</title>
          <actors>
             <actor>Frank</actor>
             <actor>George</actor>
          </actors>
       </movie>
    </movies>
    

    Now, applying the following stylesheet:

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <!-- index every movie by the name of each actor who appears in it -->
    <xsl:key name="movie-by-actor" match="movie" use="actors/actor" />
    
    <xsl:template match="/movies">
        <graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
            <key id="actors" for="edge" attr.name="actors" attr.type="int"/>
            <graph id="movies" edgedefault="undirected">
                <xsl:for-each select="movie">
                    <xsl:variable name="source" select="." />
                    <node id="{title}"/>
                    <!-- every other movie that shares at least one actor with the current one -->
                    <xsl:for-each select="key('movie-by-actor', actors/actor)[not(title=$source/title)]">
                        <edge source="{$source/title}" target="{title}">
                            <data key="actors">
                                <!-- number of the target movie's actors who also appear in the source movie -->
                                <xsl:value-of select="count(actors/actor[.=$source/actors/actor])"/>
                            </data>
                        </edge>
                    </xsl:for-each>
                </xsl:for-each>
            </graph>
        </graphml>
    </xsl:template>
    
    </xsl:stylesheet>
    

    will produce the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
       <key id="actors" for="edge" attr.name="actors" attr.type="int"/>
       <graph id="movies" edgedefault="undirected">
          <node id="Alpha"/>
          <edge source="Alpha" target="Bravo">
             <data key="actors">2</data>
          </edge>
          <edge source="Alpha" target="Charlie">
             <data key="actors">1</data>
          </edge>
          <edge source="Alpha" target="Delta">
             <data key="actors">1</data>
          </edge>
          <node id="Bravo"/>
          <edge source="Bravo" target="Alpha">
             <data key="actors">2</data>
          </edge>
          <edge source="Bravo" target="Charlie">
             <data key="actors">2</data>
          </edge>
          <node id="Charlie"/>
          <edge source="Charlie" target="Alpha">
             <data key="actors">1</data>
          </edge>
          <edge source="Charlie" target="Bravo">
             <data key="actors">2</data>
          </edge>
          <edge source="Charlie" target="Delta">
             <data key="actors">1</data>
          </edge>
          <node id="Delta"/>
          <edge source="Delta" target="Alpha">
             <data key="actors">1</data>
          </edge>
          <edge source="Delta" target="Charlie">
             <data key="actors">1</data>
          </edge>
          <node id="Echo"/>
       </graph>
    </graphml>
    

    which could well be the result you're looking for (I couldn't find an online GraphML viewer, so I cannot be sure).

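    However, in the above graph each edge appears twice, once for each direction, which is redundant in an undirected graph. If that's a problem, you could eliminate the duplicates by pairing each movie only with the movies that follow it in the document: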

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:template match="/movies">
        <graphml xmlns="http://graphml.graphdrawing.org/xmlns"  
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
            <key id="actors" for="edge" attr.name="actors" attr.type="int"/>
            <graph id="movies" edgedefault="undirected">
                <xsl:for-each select="movie">
                    <xsl:variable name="source" select="." />
                    <node id="{title}"/>
                    <!-- only later movies, so each pair of movies is visited once -->
                    <xsl:for-each select="following-sibling::movie[actors/actor=$source/actors/actor]">
                        <edge source="{$source/title}" target="{title}">
                            <data key="actors">
                                <xsl:value-of select="count(actors/actor[.=$source/actors/actor])"/>
                            </data>
                        </edge>
                    </xsl:for-each>
                </xsl:for-each>
            </graph>
        </graphml>
    </xsl:template>
    
    </xsl:stylesheet>
    

    which produces the following result:

    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
       <key id="actors" for="edge" attr.name="actors" attr.type="int"/>
       <graph id="movies" edgedefault="undirected">
          <node id="Alpha"/>
          <edge source="Alpha" target="Bravo">
             <data key="actors">2</data>
          </edge>
          <edge source="Alpha" target="Charlie">
             <data key="actors">1</data>
          </edge>
          <edge source="Alpha" target="Delta">
             <data key="actors">1</data>
          </edge>
          <node id="Bravo"/>
          <edge source="Bravo" target="Charlie">
             <data key="actors">2</data>
          </edge>
          <node id="Charlie"/>
          <edge source="Charlie" target="Delta">
             <data key="actors">1</data>
          </edge>
          <node id="Delta"/>
          <node id="Echo"/>
       </graph>
    </graphml>
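
    The fragment in the question uses xsl:for-each-group, so an XSLT 2.0 processor is presumably available. If so, the same pairing idea could be written roughly as below. This is only a sketch under a few assumptions that are mine, not the question's: movie titles are unique, all nodes are written before the edges (a matter of taste), and an actor listed twice in the same movie should be counted once, which is why distinct-values() is used. The variable name "shared" is made up for illustration.

    XSLT 2.0

    <xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://graphml.graphdrawing.org/xmlns">
    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <xsl:template match="/movies">
        <graphml>
            <key id="actors" for="edge" attr.name="actors" attr.type="int"/>
            <graph id="movies" edgedefault="undirected">
                <!-- one node per movie (assumes titles are unique) -->
                <xsl:for-each select="movie">
                    <node id="{title}"/>
                </xsl:for-each>
                <!-- one edge per pair of movies with at least one actor in common -->
                <xsl:for-each select="movie">
                    <xsl:variable name="source" select="."/>
                    <xsl:for-each select="following-sibling::movie">
                        <!-- distinct names of actors who appear in both movies -->
                        <xsl:variable name="shared"
                            select="distinct-values(actors/actor[. = $source/actors/actor])"/>
                        <xsl:if test="exists($shared)">
                            <edge source="{$source/title}" target="{title}">
                                <data key="actors">
                                    <xsl:value-of select="count($shared)"/>
                                </data>
                            </edge>
                        </xsl:if>
                    </xsl:for-each>
                </xsl:for-each>
            </graph>
        </graphml>
    </xsl:template>

    </xsl:stylesheet>

    Applied to the five-movie example above, this should give the same five edges with the same counts, only with the five nodes listed first and the edges after them.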