Tags: python, r, pubmed

How can I retrieve gene names from abstracts?


abstract.txt = "The insulin-like growth factor 1 (IGF1) signaling pathway has emerged as a major regulator of the aging process, from rodents to humans. However, given the pleiotropic actions of IGF1, its role in the aging brain remains complex and controversial. While IGF1 is clearly essential for normal development of the central nervous system, conflicting evidence has emerged from preclinical and human studies regarding its relationship to cognitive function, as well as cerebrovascular and neurodegenerative disorders. This review delves into the current state of the evidence examining the role of IGF1 in the aging brain, encompassing preclinical and clinical studies. A broad examination of the data indicates that IGF1 may indeed play opposing roles in the aging brain, depending on the underlying pathology and context. Some evidence suggests that in the setting of neurodegenerative diseases that manifest with abnormal protein deposition in the brain, such as Alzheimer's disease, reducing IGF1 signaling may serve a protective role by slowing disease progression and augmenting clearance of pathologic proteins to maintain cellular homeostasis. In contrast, inducing IGF1 deficiency has also been implicated in dysregulated function of cognition and the neurovascular system, suggesting that some IGF1 signaling may be necessary for normal brain function. Furthermore, states of acute neuronal injury, which necessitate growth, repair and survival signals to persevere, typically demonstrate salutary effects of IGF1 in that context. Appreciating the dual, at times opposing 'Dr Jekyll' and 'Mr Hyde' characteristics of IGF1 in the aging brain, will bring us closer to understanding its impact and devising more targeted IGF1-related interventions."

This is the text of a PubMed abstract. The abstract contains some gene names. How can I retrieve the gene names from it?


Solution

  • This should work, although it will be case sensitive.

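    First, rename the abstract variable. In Python a variable name cannot contain a dot: abstract.txt = "..." is read as assigning to an attribute txt of an object named abstract, which does not exist here, so it raises a NameError: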

    abstract_text =  "The insulin-like growth factor 1 (IGF1) signaling pathway has emerged..."
    

    Then you NEED a list for all the genes you want to search for:

    genes = ["IGF1"]
    

    Then you can search for all the genes.

    import pandas
    
    genes_dictionary = {}
    
    for gene in genes:
        # str.count is a case-sensitive substring count,
        # so "IGF1" would also be counted inside "IGF1R".
        n = abstract_text.count(gene)
        if n > 0:
            genes_dictionary[gene] = n
    

    Convert the dictionary to a table and print it.

    table = pandas.Series(genes_dictionary, name='Count')
    table.index.name = 'Gene'
    # reset_index returns a new DataFrame with columns Gene and Count
    table = table.reset_index()
    print(table)
    

    Output:

       Gene  Count
    0  IGF1      1
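
If the case sensitivity mentioned above is a problem, or if a symbol such as IGF1 should not be matched inside a longer one such as IGF1R, a regular expression is one possible alternative to a plain substring check. The following is only a minimal sketch, not part of the original answer: the function name count_gene_mentions, the ignore_case parameter, the extra list entry "INS", and the second sentence of the sample text are all made up for illustration.

```python
import re


def count_gene_mentions(text, genes, ignore_case=False):
    """Count whole-symbol occurrences of each gene in text.

    Hypothetical helper: returns a dict {gene: count} that only
    contains genes found at least once.
    """
    flags = re.IGNORECASE if ignore_case else 0
    counts = {}
    for gene in genes:
        # The lookarounds reject matches that are glued to another
        # letter, digit, or underscore, so "IGF1" is not found in "IGF1R".
        pattern = r"(?<!\w)" + re.escape(gene) + r"(?!\w)"
        n = len(re.findall(pattern, text, flags))
        if n:
            counts[gene] = n
    return counts


# First sentence taken from the abstract; the second sentence is
# invented here purely to exercise the matching rules.
abstract_text = (
    "The insulin-like growth factor 1 (IGF1) signaling pathway has emerged "
    "as a major regulator of the aging process. The receptor IGF1R and a "
    "lowercase igf1 are added here only as test cases."
)
genes = ["IGF1", "INS"]  # assumed search list

print(count_gene_mentions(abstract_text, genes))                    # {'IGF1': 1}
print(count_gene_mentions(abstract_text, genes, ignore_case=True))  # {'IGF1': 2}
```

The resulting dictionary can be passed to pandas.Series in the same way as genes_dictionary above. Note that this still relies on a gene list you supply; it does not discover gene names on its own.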