I am running an enrichment analysis with gseapy enrichr on a list of genes. I am using the following code:
enr_res = gseapy.enrichr(gene_list = glist[:5000],
                         organism = 'Mouse',
                         gene_sets = ['GO_Biological_Process_2021'],
                         description = 'pathway',
                         #cutoff = 0.5
                         )
The result looks like this:
enr_res.results.head(10)
My question is: how do I get the full set of genes (the rightmost column in the picture) for each individual pathway?
If I try the following code, it only returns the genes that are already displayed. I added some cleanup steps to get a list that I can then use for further analysis.
x = 'fatty acid beta-oxidation (GO:0006635)'
g_list = enr_res.results[enr_res.results.Term == x]['Genes'].to_string()
deliminator = ';'
g_list = [section + deliminator for section in g_list.split(deliminator) if section]
g_list = [s.replace(';', '') for s in g_list]
g_list = [s.replace(' ', '') for s in g_list]
g_list = [s.replace('.', '') for s in g_list]
first_gene = g_list[0:1]
first_gene = [sub[1 : ] for sub in first_gene]
g_list[0:1] = first_gene
for i in range(len(g_list)):
    g_list[i] = g_list[i].lower()
for i in range(len(g_list)):
    g_list[i] = g_list[i].capitalize()
g_list
I think my approach might be wrong, because I only get the displayed genes rather than all of them. Does anybody have an idea how to get all the genes?
pd.set_option('display.max_colwidth', 3000)
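This increases the maximum number of characters pandas displays per column. It solved the problem for me, presumably because to_string() cuts long values off at that width and marks the cut with '...'. :)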
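An alternative that does not depend on display settings is to read the cell value directly instead of going through to_string(). The following is only a minimal sketch: the small DataFrame is a hypothetical stand-in for enr_res.results, the gene names are made up, and it assumes the Genes column holds one semicolon-separated string per term, as in the code above.

```python
import pandas as pd

# Hypothetical stand-in for enr_res.results; the gene names are invented for illustration.
genes = ";".join(f"GENE{i}" for i in range(1, 41))
results = pd.DataFrame({
    "Term": ["fatty acid beta-oxidation (GO:0006635)"],
    "Genes": [genes],
})

x = "fatty acid beta-oxidation (GO:0006635)"

# Take the cell itself rather than its printed representation,
# so the display width limit never comes into play.
gene_string = results.loc[results["Term"] == x, "Genes"].iloc[0]

# Split on the separator; capitalize() turns e.g. "GENE1" into "Gene1".
g_list = [g.strip().capitalize() for g in gene_string.split(";") if g.strip()]

print(len(g_list))
```

Because the string is never formatted for display, there is no '...' to strip out, and the replace calls on ' ' and '.' are no longer needed.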