python pandas dataframe similarity scanpy

How can I fill an empty pandas DataFrame with Jaccard similarity scores using a for loop?

I am a beginner studying bioinformatics with scanpy. I am trying to improve, so any help is very welcome. Thanks a lot!

import pandas as pd

# These lists contain gene names.
Angio=['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4'] 
Hypoxia=['ADAM17','AXIN1','DLL1','FZD8','FZD1'] 
Infla=['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1'] 
Glycolysis=['MYC','NKD1','PPARD','JAG2','JAG1'] 
Oxophos=['SKP2','TCF7','NUMB']
P53=['NUMB','FZD8','CCND2','AXIN2','KAT2A'] 


df = pd.DataFrame(columns=['Angio', 'Hypoxia', 'Infla', 
                           'Glycolysis', 'Oxophos', 'P53'],
                  index=['Angio', 'Hypoxia', 'Infla', 
                           'Glycolysis', 'Oxophos', 'P53'])


print(df)
            Angio  Hypoxia  Infla  Glycolysis  Oxophos  P53
Angio         NaN      NaN    NaN         NaN      NaN  NaN
Hypoxia       NaN      NaN    NaN         NaN      NaN  NaN
Infla         NaN      NaN    NaN         NaN      NaN  NaN
Glycolysis    NaN      NaN    NaN         NaN      NaN  NaN
Oxophos       NaN      NaN    NaN         NaN      NaN  NaN
P53           NaN      NaN    NaN         NaN      NaN  NaN


# The function below computes the Jaccard similarity score.
# Its inputs are two of the six lists above.
def jaccard(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    union = (len(list1) + len(list2)) - intersection
    return float(intersection) / union
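For reference, the Jaccard index of two gene sets A and B is |A ∩ B| / |A ∪ B|. Computing the union with Python set operations (instead of adding the two list lengths and subtracting the overlap) keeps the score correct even if a list accidentally contains the same gene twice. A minimal sketch with a worked example for Angio vs Hypoxia, which share ADAM17 and AXIN1 (2 shared genes out of 9 distinct ones); returning 0.0 when both lists are empty is my own assumption, since the score is undefined in that case:

```python
def jaccard(list1, list2):
    """Jaccard similarity: size of the intersection over size of the union."""
    set1, set2 = set(list1), set(list2)  # duplicates are dropped here
    union = set1 | set2
    if not union:
        # Assumption: treat two empty lists as 0.0 to avoid dividing by zero
        return 0.0
    return len(set1 & set2) / len(union)


Angio = ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4']
Hypoxia = ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1']

score = jaccard(Angio, Hypoxia)
print(score)  # 2 / 9, about 0.2222
```

With unique gene names both formulas agree; they only differ when a list contains repeats, e.g. `jaccard(['A', 'A'], ['A'])` is 1.0 here but 0.5 with the length-based union.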

The six lists contain gene names, and each list is named after one of the row and column labels of 'df'.

For each cell of 'df', I want to pass the two lists named by that cell's row and column labels to the jaccard function. (This should be possible because the six list names are exactly the row and column names.)

I want to use a for loop to replace every NaN in 'df' with the Jaccard score obtained this way.
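The nested loop described above can be sketched as follows. The key step is a lookup from each row/column label (a string such as 'Angio') to the list with the same name; here that lookup is a dict called `gene_sets`, a hypothetical helper that is not in the original code. `df.loc[row, col] = ...` then overwrites one NaN at a time:

```python
import pandas as pd

# The six gene lists from the question
Angio = ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4']
Hypoxia = ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1']
Infla = ['DLL1', 'FZD8', 'CCND2', 'DKK1', 'ADAM17', 'JAG2', 'JAG1']
Glycolysis = ['MYC', 'NKD1', 'PPARD', 'JAG2', 'JAG1']
Oxophos = ['SKP2', 'TCF7', 'NUMB']
P53 = ['NUMB', 'FZD8', 'CCND2', 'AXIN2', 'KAT2A']


def jaccard(list1, list2):
    """Jaccard similarity computed on sets of gene names."""
    set1, set2 = set(list1), set(list2)
    return len(set1 & set2) / len(set1 | set2)


# Hypothetical lookup table: turns a label string like 'Angio' into the list Angio
gene_sets = {
    'Angio': Angio, 'Hypoxia': Hypoxia, 'Infla': Infla,
    'Glycolysis': Glycolysis, 'Oxophos': Oxophos, 'P53': P53,
}

names = list(gene_sets)
df = pd.DataFrame(columns=names, index=names)  # the empty, all-NaN matrix

# Visit every cell and overwrite its NaN with the Jaccard score of that row/column pair
for row in df.index:
    for col in df.columns:
        df.loc[row, col] = jaccard(gene_sets[row], gene_sets[col])

# A DataFrame created without data has object dtype; convert it to floats
df = df.astype(float)
print(df.round(3))
```

Because the Jaccard index is symmetric, the result has 1.0 on the diagonal and `df.loc['Angio', 'Hypoxia']` equals `df.loc['Hypoxia', 'Angio']`.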

I keep trying to solve this problem, but it doesn't work, and I'm kind of lost. Please help me. Thank you.

Solution

If you can convert your lists into a dictionary, I suggest the following solution:

import pandas as pd


# This dict maps each gene set name to its list of gene names.
genes_dict = {
    'Angio':['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4'],
    'Hypoxia':['ADAM17','AXIN1','DLL1','FZD8','FZD1'],
    'Infla':['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1'],
    'Glycolysis':['MYC','NKD1','PPARD','JAG2','JAG1'],
    'Oxophos':['SKP2','TCF7','NUMB'],
    'P53':['NUMB','FZD8','CCND2','AXIN2','KAT2A'],
}


# The function below computes the Jaccard similarity score.
# Its inputs are two of the gene lists in genes_dict.
def jaccard(list1, list2):
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)
    union = len(set1 | set2)  # each gene counted once, even if a list has duplicates
    return intersection / union


names_list = list(genes_dict.keys())


# Nested dict: res[name_a][name_b] holds the score for that pair of gene sets.
# pd.DataFrame turns the outer keys into columns and the inner keys into the index.
res = {}
for name_a in names_list:
    res[name_a] = {}
    for name_b in names_list:
        res[name_a][name_b] = jaccard(genes_dict[name_a], genes_dict[name_b])


df = pd.DataFrame(res)
print(df)
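Because the Jaccard index only needs intersection sizes and set sizes, the whole matrix can also be computed without explicit loops. The sketch below is a swapped-in alternative, not part of the answer above: it builds a 0/1 gene-membership matrix with `pd.crosstab`, gets every pairwise intersection size from one matrix product, and uses |A ∪ B| = |A| + |B| - |A ∩ B|. The names `pairs`, `membership` and `jaccard_df` are illustrative only:

```python
import pandas as pd

gene_sets = {
    'Angio': ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4'],
    'Hypoxia': ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1'],
    'Infla': ['DLL1', 'FZD8', 'CCND2', 'DKK1', 'ADAM17', 'JAG2', 'JAG1'],
    'Glycolysis': ['MYC', 'NKD1', 'PPARD', 'JAG2', 'JAG1'],
    'Oxophos': ['SKP2', 'TCF7', 'NUMB'],
    'P53': ['NUMB', 'FZD8', 'CCND2', 'AXIN2', 'KAT2A'],
}

# Long format: one (gene set, gene) pair per row; set() drops duplicate genes
pairs = pd.DataFrame(
    [(name, gene) for name, genes in gene_sets.items() for gene in set(genes)],
    columns=['gene_set', 'gene'],
)

# 0/1 membership matrix: rows are gene sets, columns are genes.
# crosstab sorts labels alphabetically, so reindex to keep the dict's order.
membership = pd.crosstab(pairs['gene_set'], pairs['gene']).reindex(list(gene_sets))

m = membership.to_numpy()
shared = m @ m.T                                   # shared[i, j] = |A_i ∩ A_j|
sizes = m.sum(axis=1)                              # sizes[i] = |A_i|
union = sizes[:, None] + sizes[None, :] - shared   # |A ∪ B| = |A| + |B| - |A ∩ B|

jaccard_df = pd.DataFrame(shared / union, index=membership.index, columns=membership.index)
print(jaccard_df.round(3))
```

For six small gene sets the loop is perfectly fine; the matrix version mainly pays off when comparing many gene sets at once.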