Search code examples
pythonnetworkxgraph-theorychemistrycheminformatics

NetworkX for chemistry: how to check if a smaller molecular graph A is a valid subgraph of a larger molecular graph B?


I am attempting to use graph theory, via Python, in order to verify whether molecular fragments are valid substructures of larger molecules. Let us see an example for the neurotransmitter serotonin:

from pysmiles import read_smiles
import networkx as nx
import matplotlib.pyplot as plt

# Descriptor of serotonin, via SMILES notation
serotonin = "C1=CC2=C(C=C1O)C(=CN2)CCN"
# Uses pysmiles to convert the SMILES into a Graph object
serotonin_graph = read_smiles(serotonin)

# Function for plotting Graph objects of molecules
def plot_graph(graph):
    # Small dictionary of how elements should be coloured
    colors_dict = {"C": "grey", "N": "green", "O": "red"}
    # Finds elements corresponding to nodes in the graph
    elements = nx.get_node_attributes(graph, name = "element")
    # Defines a list of colours for the nodes, accordingly
    element_colors = [colors_dict[elements[e]] for e in elements]
    # Displays the molecular graph
    nx.draw(graph, node_color=element_colors, with_labels=True, labels=nx.get_node_attributes(graph, name = "element"))
    plt.show()

# Runs the plotting function for serotonin
plot_graph(serotonin_graph)

# First three SMILES are valid fragments of serotonin, whereas the last is not
fragments = ['Oc:1:c:c:[c]:[c]:c1', 'NC[CH2]', 'CNCC', 'NC[CH2]:O:O:C']

# Converts fragments into Graph objects and displays them
for frag in fragments:
    fragment_graph = read_smiles(frag)
    plot_graph(fragment_graph)

In this case, we are dealing with serotonin, which the code above may plot into a molecular graph as follows (note that the notation we are using automatically ignores hydrogen atoms):

Generated molecular graph for serotonin

The 4 molecular fragments considered in the code may also be plotted into graphs, as follows:

Graphs of molecular fragments

As may be easily deduced by eye, the first three are valid subgraphs of serotonin, whereas the last (bottom right) is not - for example, the last fragment contains two oxygen atoms, while the serotonin graph clearly does not.

I hence wish to obtain a function that may be able to do the following:

>>> subgraph_checker(serotonin_graph, valid_substructure)
True
>>> subgraph_checker(serotonin_graph, invalid_substructure)
False

Hence this function would return True for all of the fragment graphs above, except for the last bottom right one.

There have been similar questions asked here about using NetworkX for subgraph searches, but so far none have sufficed - e.g. some solutions are specific to directed graphs, or recommend non-working solutions.

Here is one solution which I have tried, but which didn't work, using NetworkX's function for searching for isomorphic subgraphs:

GM = nx.algorithms.isomorphism.GraphMatcher(serotonin_graph, fragment_graph)
print(GM.subgraph_is_isomorphic())

When applying the above to all of the fragments, all return True, which of course should not be the case for the last fragment.

I have also tried alternative solutions, such as using RDKit functions in order to search directly for substructures within serotonin, but these are not working so well and it appears to be a bug.

Any help or insight into solving this problem via NetworkX, using graph theory, is very much appreciated!


Solution

  • You're checking isomorphism by only looking at the structure of graphs. Instead, in your case you need also to check the content of each of your nodes. In other words, you should define when two nodes (without edges) are equal. I think in your case it's sufficient for those two nodes to share the same "element".

    for frag in fragments:
        fragment_graph = read_smiles(frag)
        gm = nx.isomorphism.GraphMatcher(
            serotonin_graph,
            fragment_graph,
            node_match=lambda n1, n2: n1.get('element') == n2.get('element'),
        )
        print(gm.subgraph_is_isomorphic())
    

    Here's the output:

    True
    True
    True
    False
    

    node_match is a function that returns True iff node n1 in the first graph and n2 in the second graph should be considered equal during the isomorphism test. The function will be called like: node_match(G1.nodes[n1], G2.nodes[n2]).

    That is, the function will receive the node attribute dictionaries of the nodes under consideration. If None, then no attributes are considered when testing for an isomorphism.

    More info can be found here. Notice that you can also define an edge_match function.