python, pandas, filtering, isin

Filtering a dataframe with another dataframe


I have two pandas dataframes. One holds the nodes, the other holds the edges. The basic rule is that every edge should connect to existing nodes.

edges
11                 ["INET_N_752", "INET_N_1730"]
253     ["SEQ_5753__L_LMGN", "SEQ_5369__S_LMGN"]
254         ["N_211_L_LMGN", "SEQ_5753__L_LMGN"]
277            ["SEQ_5753__L_LMGN", "SEQ_867_p"]
278                   ["SEQ_867_p", "SEQ_871_p"]
279            ["SEQ_871_p", "SEQ_5789__L_LMGN"]

Above is the edges df. The values are lists containing two strings.

Below is the nodes df. The values are also lists, but each one contains only a single string.

nodes
15            ["INET_N_752"]
16           ["INET_N_1730"]
196     ["SEQ_5753__L_LMGN"]
197     ["SEQ_5369__S_LMGN"]
198         ["N_211_L_LMGN"]
222            ["SEQ_867_p"]

I would like to filter edges with nodes.

So if both elements of an edge's list appear among the elements of nodes, then that index should be selected.

Example: edges[11] = ['INET_N_752', 'INET_N_1730'], so there should be ['INET_N_752'] and ['INET_N_1730'] in nodes df.

How can I do this?

This works:

edges[(edges.apply(lambda x: x[0]).isin(nodes.apply(lambda x: x[0])) &
       edges.apply(lambda x: x[1]).isin(nodes.apply(lambda x: x[0])))]
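
To make this reproducible, here is a minimal self-contained sketch of the same approach on the sample data. It assumes edges and nodes are pandas Series whose values are Python lists (the snippets above suggest this, but it is an assumption), and the names node_names, mask, and selected are illustrative only. It uses the `.str[0]` accessor in place of `apply(lambda x: x[0])`.

```python
import pandas as pd

# Sample data copied from the question, assumed to be Series of lists.
edges = pd.Series(
    [
        ["INET_N_752", "INET_N_1730"],
        ["SEQ_5753__L_LMGN", "SEQ_5369__S_LMGN"],
        ["N_211_L_LMGN", "SEQ_5753__L_LMGN"],
        ["SEQ_5753__L_LMGN", "SEQ_867_p"],
        ["SEQ_867_p", "SEQ_871_p"],
        ["SEQ_871_p", "SEQ_5789__L_LMGN"],
    ],
    index=[11, 253, 254, 277, 278, 279],
)
nodes = pd.Series(
    [
        ["INET_N_752"],
        ["INET_N_1730"],
        ["SEQ_5753__L_LMGN"],
        ["SEQ_5369__S_LMGN"],
        ["N_211_L_LMGN"],
        ["SEQ_867_p"],
    ],
    index=[15, 16, 196, 197, 198, 222],
)

# Unwrap the one-element node lists once, instead of once per condition.
node_names = nodes.str[0]

# Keep an edge only if both of its endpoints are known node names.
mask = edges.str[0].isin(node_names) & edges.str[1].isin(node_names)
selected = edges[mask]

print(selected.index.tolist())  # [11, 253, 254, 277]
```

Edges 278 and 279 are dropped because "SEQ_871_p" and "SEQ_5789__L_LMGN" do not appear in nodes.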

Solution

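  • Try the following (this assumes edges and nodes are pandas Series of lists, as the snippets in the question imply):

    import pandas as pd
    # split each two-element list into two columns, keeping the original index
    edges = pd.DataFrame(edges.to_list(), columns=['node1', 'node2'], index=edges.index)
    # a Series has no applymap, so unwrap the one-element lists with apply
    nodes = nodes.apply(lambda n: n[0])
    edges[edges.node1.isin(nodes) & edges.node2.isin(nodes)]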
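
    If the edge lists might not always have exactly two elements, a variant based on `Series.explode` may be worth considering. This is an alternative sketch, not the approach above: it assumes edges and nodes are pandas Series of lists, and the names known_nodes, endpoint_known, and selected are illustrative only.

```python
import pandas as pd

# Sample data from the question, assumed to be Series of lists.
edges = pd.Series(
    [
        ["INET_N_752", "INET_N_1730"],
        ["SEQ_5753__L_LMGN", "SEQ_5369__S_LMGN"],
        ["N_211_L_LMGN", "SEQ_5753__L_LMGN"],
        ["SEQ_5753__L_LMGN", "SEQ_867_p"],
        ["SEQ_867_p", "SEQ_871_p"],
        ["SEQ_871_p", "SEQ_5789__L_LMGN"],
    ],
    index=[11, 253, 254, 277, 278, 279],
)
nodes = pd.Series(
    [
        ["INET_N_752"],
        ["INET_N_1730"],
        ["SEQ_5753__L_LMGN"],
        ["SEQ_5369__S_LMGN"],
        ["N_211_L_LMGN"],
        ["SEQ_867_p"],
    ],
    index=[15, 16, 196, 197, 198, 222],
)

# Flatten the node lists into a set of known node names.
known_nodes = set(nodes.explode())

# One row per endpoint; the original edge index is repeated for each one.
endpoint_known = edges.explode().isin(known_nodes)

# An edge is kept only if every one of its endpoints is a known node.
mask = endpoint_known.groupby(level=0).all()
selected = edges[mask]

print(selected.index.tolist())  # [11, 253, 254, 277]
```

    Because the check is grouped by the original index, the result keeps the index labels from edges and does not depend on how many endpoints each list holds.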