Tags: python, pandas, numpy, graphviz

How can I place nodes that are spouses next to each other in a family tree generator program?


I have written code using pandas and graphviz to generate a family tree from a CSV file:

ID,S,First name,Last name,DoB,DoD,FatherID,MotherID,SpouseID,Place of birth,Job
JoS1,M,John,S,1111,2222,,,MaS1,India,Job-1
MaS1,F,Mary,S,1112,,,,JoS1,India,Job-2
JaS,M,Jacob,S,1113,,JoS1,MaS1,KeS,India,Job-3
JoS2,M,Joe,S,1114,2225,JoS1,MaS1,AnS,India,Job-4
MaS2,F,Macy,D,1115,,JoS1,MaS1,AnD,India,Job-5
KeS,F,Keysha,S,1116,,,,JaS,India,Job-6
AnD,M,Andy,D,1117,,,,MaS2,India,Job-7
AnS,F,Anna,S,1118,,,,JoS2,India,Job-8
MiS,M,Mike,S,1119,,JaS,KeS,,India,
SaS,M,Sam,S,1120,,JaS,KeS,,India,
MaS3,F,Matt,S,2345,,JoS2,AnS,,India,

The code:

from graphviz import Digraph
import pandas as pd
import numpy as np

rawdf = pd.read_csv('/content/drive/MyDrive/ftdata.csv', keep_default_na=False)  ## Change file path

# Edge list: one row per (child, parent) pair, once for the mother and once for the father
el1 = rawdf[['ID', 'MotherID']]
el2 = rawdf[['ID', 'FatherID']]
el1.columns = ['Child', 'ParentID']
el2.columns = el1.columns
el = pd.concat([el1, el2])
el = el.replace('', np.nan)

# Give each person with a missing parent a unique placeholder parent node
t = pd.Series(['no_entry' + str(i) for i in range(el.shape[0])])
el['ParentID'] = el['ParentID'].fillna(t)

# el and rawdf share the same row index, so this attaches each child's details to its edges
df = el.merge(rawdf, left_index=True, right_index=True, how='left')
df['name'] = df['First name'] + ' ' + df['Last name']
df = df[['ID', 'name', 'S', 'DoB', 'DoD', 'Place of birth', 'Job', 'ParentID', 'SpouseID']]
#df

# The first argument of Digraph is the graph's name, not the layout engine (dot is the default)
f = Digraph('family_tree', format='jpg', encoding='utf8', filename='testfile',
            node_attr={'style': 'filled'}, graph_attr={'concentrate': 'true', 'splines': 'ortho'})
f.attr('node', shape='box')
for index, row in df.iterrows():
    f.node(row['ID'],
           label=str(row['name']) + '\n' + str(row['Job']) + '\n' + str(row['DoB'])
                 + '\n' + str(row['Place of birth']) + '\n†' + str(row['DoD']),
           _attributes={'color': 'lightpink' if row['S'] == 'F' else 'lightblue' if row['S'] == 'M' else 'lightgray'})
for index, row in df.iterrows():
    f.edge(str(row['ParentID']), str(row['ID']), label='')
f.view()

The result:

[Image: the generated family tree, in which couples linked by a spouse relation are not grouped together]

Now I want to place each couple's nodes next to each other, for example using clusters or groups, but I can't find a way to do it. How can I fix this?


Solution

  • I found a solution and was able to cluster the couples together using the code below:

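    from graphviz import Digraph

    # f and df are the Digraph and DataFrame from the question's code;
    # df must keep the SpouseID column. This loop replaces the node-creating loop.
    spouse_clusters = {}  # SpouseID -> cluster subgraph shared by both spouses

    for index, row in df.iterrows():
        node_id = str(row['ID'])
        node_label = (str(row['name']) + '\n' + str(row['Job']) + '\n' + str(row['DoB'])
                      + '\n' + str(row['Place of birth']) + '\n†' + str(row['DoD']))
        node_color = 'lightpink' if row['S'] == 'F' else 'lightblue' if row['S'] == 'M' else 'lightgray'

        # Check if the node has a spouse
        if str(row['SpouseID']) != '':
            spouse_id = str(row['SpouseID'])

            # Check if the spouse cluster exists, if not create one
            if spouse_id not in spouse_clusters:
                # dot only draws subgraphs whose name starts with 'cluster' as boxes
                spouse_clusters[spouse_id] = Digraph('cluster_' + spouse_id)
                spouse_clusters[spouse_id].attr(label='Couple', color='lightgreen', style='filled')

            # Add the node to the spouse cluster
            spouse_clusters[spouse_id].node(node_id, label=node_label, color=node_color)
        else:
            # Add nodes without spouses directly to the main Digraph
            f.node(node_id, label=node_label, color=node_color)

    # Add the clusters to the main Digraph
    for cluster_id, cluster in spouse_clusters.items():
        f.subgraph(cluster)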
    

    This code goes inside the for loop that creates the nodes, replacing the original f.node(...) call once node_id, node_label and node_color have been computed, and the dictionary spouse_clusters is created before that loop. Because both spouses have the same SpouseID, they are added to the same cluster subgraph stored in the dictionary; after the loop, each cluster is added to the main Digraph as a subgraph.
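To make the structure visible, here is a minimal, dependency-free sketch that writes the same kind of graph as raw DOT text instead of going through the graphviz package: one `subgraph cluster_<key>` block per couple, with people who have no spouse as top-level nodes. The function `to_dot` and its tuple-based input are hypothetical and for illustration only. The `cluster` prefix matters because dot only draws subgraphs whose name begins with `cluster` as a boxed group.

```python
def to_dot(people, edges):
    """Emit DOT text with one cluster per couple.

    people: list of (node_id, label, couple_key) tuples; couple_key is ''
            for people without a spouse (hypothetical input format).
    edges:  list of (parent_id, child_id) tuples.
    """
    clusters = {}  # couple_key -> node lines, in insertion order
    top_level = []
    for node_id, label, key in people:
        line = f'"{node_id}" [label="{label}"]'
        if key:
            clusters.setdefault(key, []).append(line)
        else:
            top_level.append(line)

    out = ['digraph family_tree {', '\tnode [shape=box style=filled]']
    out += ['\t' + line for line in top_level]
    for key, lines in clusters.items():
        # Only subgraphs named cluster* are drawn as boxes by dot
        out.append(f'\tsubgraph "cluster_{key}" {{')
        out.append('\t\tlabel=Couple color=lightgreen style=filled')
        out += ['\t\t' + line for line in lines]
        out.append('\t}')
    out += [f'\t"{p}" -> "{c}"' for p, c in edges]
    out.append('}')
    return '\n'.join(out)


# Both spouses carry the same key (here JoMa), so they land in one cluster
dot = to_dot(
    [('JoS1', 'John S', 'JoMa'), ('MaS1', 'Mary S', 'JoMa'), ('JaS', 'Jacob S', '')],
    [('JoS1', 'JaS'), ('MaS1', 'JaS')],
)
print(dot)
```

Pasting the printed text into any DOT viewer should show John and Mary inside one "Couple" box with edges down to Jacob.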

    Note: This requires changing the SpouseID values in the data so that both spouses have the same SpouseID. I did this by combining the first two letters of each spouse's first name, e.g. John S - Mary S --> JoMa.
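A possible alternative that avoids editing the CSV, assuming the SpouseID column keeps each partner's ID as in the original data: derive a key that is identical for both partners by sorting the two IDs and joining them. This is a sketch, not the approach tested above; the helper name `couple_key` is hypothetical. In the loop, `spouse_id = couple_key(row['ID'], row['SpouseID'])` would take the place of `spouse_id = str(row['SpouseID'])`. It only groups a couple if the SpouseID references are mutual (John points to Mary and Mary points to John), which holds for the sample data.

```python
def couple_key(person_id, spouse_id):
    """Return the same key for both partners, or '' if there is no spouse.

    Assumes empty cells are '' (as with read_csv(..., keep_default_na=False));
    a NaN would need to be converted to '' first.
    """
    person_id, spouse_id = str(person_id).strip(), str(spouse_id).strip()
    if not spouse_id:
        return ''
    # Sorting makes the key order-independent: (JoS1, MaS1) and (MaS1, JoS1) match
    return '_'.join(sorted([person_id, spouse_id]))


rows = [
    ('JoS1', 'MaS1'), ('MaS1', 'JoS1'),
    ('JaS', 'KeS'), ('KeS', 'JaS'),
    ('MiS', ''),
]
for pid, sid in rows:
    # JoS1 and MaS1 both map to JoS1_MaS1; MiS has no spouse
    print(pid, '->', couple_key(pid, sid) or '(no spouse)')
```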