I am a beginner who is studying bioinformatics with scanpy these days. I am trying to improve, so any help is very welcome. Thanks a lot!
import pandas as pd
# These lists contain gene names.
Angio=['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4']
Hypoxia=['ADAM17','AXIN1','DLL1','FZD8','FZD1']
Infla=['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1']
Glycolysis=['MYC','NKD1','PPARD','JAG2','JAG1']
Oxophos=['SKP2','TCF7','NUMB']
P53=['NUMB','FZD8','CCND2','AXIN2','KAT2A']
df = pd.DataFrame(columns=['Angio', 'Hypoxia', 'Infla',
                           'Glycolysis', 'Oxophos', 'P53'],
                  index=['Angio', 'Hypoxia', 'Infla',
                         'Glycolysis', 'Oxophos', 'P53'])
print(df)
           Angio Hypoxia Infla Glycolysis Oxophos  P53
Angio        NaN     NaN   NaN        NaN     NaN  NaN
Hypoxia      NaN     NaN   NaN        NaN     NaN  NaN
Infla        NaN     NaN   NaN        NaN     NaN  NaN
Glycolysis   NaN     NaN   NaN        NaN     NaN  NaN
Oxophos      NaN     NaN   NaN        NaN     NaN  NaN
P53          NaN     NaN   NaN        NaN     NaN  NaN
# The function below computes the Jaccard similarity score.
# Each input is one of the six lists above.
def jaccard(list1, list2):
    intersection = len(list(set(list1).intersection(list2)))
    union = (len(list1) + len(list2)) - intersection
    return float(intersection) / union
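As a quick check of what the function computes, here is a small sketch (not part of the original question) that works through Angio vs Hypoxia with Python's set operators. The shared genes are ADAM17 and AXIN1, and the union has 6 + 5 - 2 = 9 genes, so the score should be 2/9.

```python
# Two of the gene lists from the question
Angio = ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4']
Hypoxia = ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1']

shared = set(Angio) & set(Hypoxia)      # genes present in both lists
combined = set(Angio) | set(Hypoxia)    # genes present in either list

print(sorted(shared))                   # ['ADAM17', 'AXIN1']
print(len(combined))                    # 9
score = len(shared) / len(combined)     # Jaccard = |A & B| / |A | B|
print(round(score, 4))                  # 0.2222
```

For lists without duplicate genes, this matches the `len(list1) + len(list2) - intersection` formula in the function above.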
The six lists contain gene names, and each list is named after one of the rows and columns of 'df'.
For each cell, I want to pass the lists named by that cell's row and column to the jaccard function. (This works because the six list names are exactly the row and column names.)
Using a for loop, I want to replace each NaN in 'df' with the value returned by jaccard.
I keep trying to solve this problem, but it doesn't work out, and I just don't know what to do. I am kind of lost here... Please help me. Thank you.
If you can store your lists in a dictionary, I suggest the following solution:
import pandas as pd
# This dict maps each list name to its list of gene names.
genes_dict = {
    'Angio': ['ADAM17','AXIN1','AXIN2','CCND2','DKK1','DKK4'],
    'Hypoxia': ['ADAM17','AXIN1','DLL1','FZD8','FZD1'],
    'Infla': ['DLL1','FZD8','CCND2','DKK1','ADAM17','JAG2','JAG1'],
    'Glycolysis': ['MYC','NKD1','PPARD','JAG2','JAG1'],
    'Oxophos': ['SKP2','TCF7','NUMB'],
    'P53': ['NUMB','FZD8','CCND2','AXIN2','KAT2A'],
}
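If the six lists already exist as separate variables (as in the question), you do not have to retype them. A minimal sketch is below; the helper names `names` and `lists` are illustrative. The keys still have to be written once as strings, because Python cannot reliably recover a variable's name from the list object itself.

```python
# The separate lists from the question
Angio = ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4']
Hypoxia = ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1']
Infla = ['DLL1', 'FZD8', 'CCND2', 'DKK1', 'ADAM17', 'JAG2', 'JAG1']
Glycolysis = ['MYC', 'NKD1', 'PPARD', 'JAG2', 'JAG1']
Oxophos = ['SKP2', 'TCF7', 'NUMB']
P53 = ['NUMB', 'FZD8', 'CCND2', 'AXIN2', 'KAT2A']

# Same order as the rows/columns of 'df' in the question
names = ['Angio', 'Hypoxia', 'Infla', 'Glycolysis', 'Oxophos', 'P53']
lists = [Angio, Hypoxia, Infla, Glycolysis, Oxophos, P53]

# zip pairs each name with the list at the same position
genes_dict = dict(zip(names, lists))
print(genes_dict['Oxophos'])  # ['SKP2', 'TCF7', 'NUMB']
```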
# The function below computes the Jaccard similarity score.
# Each input is one of the six lists above.
def jaccard(list1, list2):
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)
    # Taking the union of the sets stays correct even if a list contains duplicates
    union = len(set1 | set2)
    return float(intersection) / union
names_list = list(genes_dict.keys())
res = {}
for row in names_list:
    res[row] = {}
    for col in names_list:
        res[row][col] = jaccard(genes_dict[row], genes_dict[col])
# pd.DataFrame uses the outer keys as columns and the inner keys as the index;
# the Jaccard score is symmetric, so the matrix is the same either way.
df = pd.DataFrame(res)
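Since the question asked specifically to replace the NaN values of the existing 'df' in a for loop, here is an alternative sketch that does that with `df.loc[row, col]`, using the row and column labels as keys into `genes_dict`. An empty `pd.DataFrame(columns=..., index=...)` has object-dtype columns, so `astype(float)` is used at the end to get a numeric matrix.

```python
import pandas as pd

genes_dict = {
    'Angio': ['ADAM17', 'AXIN1', 'AXIN2', 'CCND2', 'DKK1', 'DKK4'],
    'Hypoxia': ['ADAM17', 'AXIN1', 'DLL1', 'FZD8', 'FZD1'],
    'Infla': ['DLL1', 'FZD8', 'CCND2', 'DKK1', 'ADAM17', 'JAG2', 'JAG1'],
    'Glycolysis': ['MYC', 'NKD1', 'PPARD', 'JAG2', 'JAG1'],
    'Oxophos': ['SKP2', 'TCF7', 'NUMB'],
    'P53': ['NUMB', 'FZD8', 'CCND2', 'AXIN2', 'KAT2A'],
}

def jaccard(list1, list2):
    """Jaccard similarity of two gene lists, computed on sets."""
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1 | s2)

names = list(genes_dict)
# Empty frame full of NaN, like the one in the question
df = pd.DataFrame(columns=names, index=names)

# Overwrite each NaN in place, looking up both lists by their labels
for row in df.index:
    for col in df.columns:
        df.loc[row, col] = jaccard(genes_dict[row], genes_dict[col])

# Convert the object-dtype columns to floats
df = df.astype(float)
print(df.round(3))
```

The diagonal is 1.0 (every list is identical to itself), and the matrix is symmetric.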