python-3.x xml graph networkx graphml

How to specify the <key> id when writing GraphML from NetworkX?


When I create a graph in NetworkX and export it to GraphML, the <key> ids are automatically generated as "d%i" (d0, d1, ...). For example:

import networkx as nx

G = nx.DiGraph()   # or Graph, MultiGraph, MultiDiGraph, etc.

G.add_node(1, labelV="person", age=29, name="marko")
G.add_node(2, labelV="person", age=27, name="vadas")
G.add_node(3, labelV="software", name="lop", lang="java")
G.add_node(4, labelV="person", age=32, name="josh")
G.add_node(5, labelV="software", name="ripple", lang="java")
G.add_node(6, labelV="person", age=35, name="peter")

G.add_edge(1, 2, labelE="knows", weight=1.0)
G.add_edge(1, 4, labelE="knows", weight=1.0)
G.add_edge(1, 3, labelE="created", weight=0.4)
G.add_edge(4, 5, labelE="created", weight=1.0)
G.add_edge(4, 3, labelE="created", weight=0.4)
G.add_edge(6, 3, labelE="created", weight=0.2)

nx.write_graphml(G, "networkx_tinkergraph_modern.xml")

produces something like:

<?xml version='1.0' encoding='utf-8'?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<key attr.name="weight" attr.type="double" for="edge" id="d5"/>
<key attr.name="labelE" attr.type="string" for="edge" id="d4"/>
<key attr.name="lang" attr.type="string" for="node" id="d3"/>
<key attr.name="name" attr.type="string" for="node" id="d2"/>
<key attr.name="age" attr.type="long" for="node" id="d1"/>
<key attr.name="labelV" attr.type="string" for="node" id="d0"/>
<graph edgedefault="directed"><node id="1">
  <data key="d0">person</data>
  <data key="d1">29</data>
  <data key="d2">marko</data>
</node>
<node id="2">
  <data key="d0">person</data>
  <data key="d1">27</data>
  <data key="d2">vadas</data>
</node>
<node id="3">

...

I'd like each key id to match the attr.name specified in its <key> tag:

  <key id="labelV" for="node" attr.name="labelV" attr.type="string"/>
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="lang" for="node" attr.name="lang" attr.type="string"/>
  <key id="age" for="node" attr.name="age" attr.type="int"/>
  <key id="labelE" for="edge" attr.name="labelE" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="1">
      <data key="labelV">person</data>
      <data key="name">marko</data>
      <data key="age">29</data>
    </node>
    <node id="2">
      <data key="labelV">person</data>
      <data key="name">vadas</data>
      <data key="age">27</data>
    </node>
    <node id="3">
      <data key="labelV">software</data>
      <data key="name">lop</data>
      <data key="lang">java</data>
    </node>

...

Someone asked the same question on the Google forums, but the answer there was to change how a Gremlin server reads GraphML, not how NetworkX writes it.

Is there a way to make nx.write_graphml() or nx.generate_graphml() create files using labelV, age, ... instead of d0, d1, ... for the id= in <key> tags?


Solution

  • As of NetworkX 2.5, this is possible with the named_key_ids parameter:

    nx.write_graphml(G, "networkx_tinkergraph_modern.xml", named_key_ids=True)
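
For NetworkX versions older than 2.5, where named_key_ids is not available, one possible workaround is to post-process the GraphML text with the standard library. The sketch below is not part of NetworkX: rename_key_ids is a hypothetical helper name, and it assumes that every attr.name is unique across node and edge keys (it raises an error otherwise, since <key> ids must be unique within the file).

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def rename_key_ids(graphml_text):
    """Return GraphML text in which each <key> id equals its attr.name.

    Hypothetical helper (not a NetworkX function). Assumes attr.name
    values are unique across all <key> elements.
    """
    # Register the namespaces so the output keeps xmlns="..." and xsi:
    # instead of auto-generated prefixes such as ns0:.
    ET.register_namespace("", GRAPHML_NS)
    ET.register_namespace("xsi", XSI_NS)
    root = ET.fromstring(graphml_text)

    # Map each generated id (d0, d1, ...) to the attribute name it stands for.
    id_to_name = {}
    seen_names = set()
    for key in root.iter("{%s}key" % GRAPHML_NS):
        name = key.get("attr.name")
        if name is None:
            continue  # leave keys without attr.name untouched
        if name in seen_names:
            raise ValueError("duplicate attr.name %r; ids would collide" % name)
        seen_names.add(name)
        id_to_name[key.get("id")] = name
        key.set("id", name)

    # Point every <data key="dN"> at the renamed id.
    for data in root.iter("{%s}data" % GRAPHML_NS):
        old = data.get("key")
        data.set("key", id_to_name.get(old, old))

    return ET.tostring(root, encoding="unicode")
```

The function takes the GraphML document as a string, for example the contents of the file written by nx.write_graphml() above, and returns the rewritten document. The returned string has no XML declaration, so add one when writing it back to disk if the consumer requires it.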