python bioinformatics rna-seq

Save query results as a DataFrame in Python


I am really new to Python. I am running a query and I want the results to be saved as a DataFrame instead of just being printed in the terminal. Here is my code:

from intermine.webservice import Service

service = Service("https://www.mousemine.org/mousemine/service")
query = service.new_query("Gene")
query.add_view(
    "primaryIdentifier", "symbol", "organism.name",
    "homologues.homologue.primaryIdentifier", "homologues.homologue.symbol",
    "homologues.homologue.organism.name", "homologues.type",
    "homologues.dataSets.name"
)
query.add_constraint("homologues.type", "NONE OF", ["horizontal gene transfer", "least diverged horizontal gene transfer"], code = "B")
query.add_constraint("Gene", "LOOKUP", "ENSMUSG00000026981,ENSMUSG00000068039,ENSMUSG00000035007,ENSMUSG00000022972,", "M. musculus", code = "A")
query.add_constraint("homologues.homologue.organism.name", "=", "Homo sapiens", code = "C")
query.add_constraint("homologues.dataSets.name", "=", "Mouse/Human Orthologies from MGI", code = "D")

for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["organism.name"],
          row["homologues.homologue.primaryIdentifier"],
          row["homologues.homologue.symbol"],
          row["homologues.homologue.organism.name"], row["homologues.type"],
          row["homologues.dataSets.name"])

And this is the result I get:

MGI:1915251 Cfap298 Mus musculus 56683 CFAP298 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:2144506 Rundc1 Mus musculus 146923 RUNDC1 Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:96547 Il1rn Mus musculus 3557 IL1RN Homo sapiens orthologue Mouse/Human Orthologies from MGI
MGI:98535 Tcp1 Mus musculus 6950 TCP1 Homo sapiens orthologue Mouse/Human Orthologies from MGI

The output is correct, but I need it in a DataFrame. It would also be great if I could run the query from a table containing all the IDs instead of writing them one by one (there are 14000).


Solution

  • Your example is not reproducible because we can't construct the query, but I suppose this should work:

    import pandas as pd
    
    df = pd.DataFrame(list(query.rows()))
    
    # OR
    
    df = pd.DataFrame(query.rows(), columns=query.views)
    

    Output:

    >>> df
      Gene.briefDescription                                   Gene.description   Gene.id  ...  Gene.homologues.homologue.organism.name Gene.homologues.type     Gene.homologues.dataSets.name
    0                  None  FUNCTION: <B>Automated description from the Al...  23666503  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
    1                  None  FUNCTION: <B>Automated description from the Al...  23341647  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
    2                  None  FUNCTION: <B>Automated description from the Al...  23862751  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
    3                  None  FUNCTION: <B>Automated description from the Al...  23677242  ...                             Homo sapiens           orthologue  Mouse/Human Orthologies from MGI
    
    [4 rows x 21 columns]
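
If only the eight columns from the question are wanted, and the IDs come from a table rather than being typed by hand, something along these lines might help. This is a minimal sketch, not a tested recipe against MouseMine: `rows_to_dataframe` and `batched_lookups` are hypothetical helper names, the batch size of 500 is an arbitrary assumption, and the only thing assumed about the rows is what the question's code already relies on, namely that each row can be indexed by column name. Plain dicts stand in for the query rows so the example runs offline.

```python
import pandas as pd

# Column names copied from the add_view() call in the question.
VIEWS = [
    "primaryIdentifier", "symbol", "organism.name",
    "homologues.homologue.primaryIdentifier", "homologues.homologue.symbol",
    "homologues.homologue.organism.name", "homologues.type",
    "homologues.dataSets.name",
]


def rows_to_dataframe(rows, columns=VIEWS):
    """Build a DataFrame by indexing every row with each column name.

    Hypothetical helper: it only assumes row[name] works, as in the
    print() loop from the question.
    """
    records = [[row[col] for col in columns] for row in rows]
    return pd.DataFrame(records, columns=columns)


def batched_lookups(ids, batch_size=500):
    """Yield comma-separated ID strings, at most batch_size IDs each.

    Hypothetical helper: the batch size is an assumption, not a
    documented server limit. Blank entries are skipped.
    """
    cleaned = [str(i).strip() for i in ids if str(i).strip()]
    for start in range(0, len(cleaned), batch_size):
        yield ",".join(cleaned[start:start + batch_size])


# Offline demo: one stand-in row taken from the output in the question.
demo_rows = [{
    "primaryIdentifier": "MGI:1915251",
    "symbol": "Cfap298",
    "organism.name": "Mus musculus",
    "homologues.homologue.primaryIdentifier": "56683",
    "homologues.homologue.symbol": "CFAP298",
    "homologues.homologue.organism.name": "Homo sapiens",
    "homologues.type": "orthologue",
    "homologues.dataSets.name": "Mouse/Human Orthologies from MGI",
}]
demo_df = rows_to_dataframe(demo_rows)

# The four IDs from the question, split into batches of two.
demo_ids = ["ENSMUSG00000026981", "ENSMUSG00000068039",
            "ENSMUSG00000035007", "ENSMUSG00000022972"]
demo_batches = list(batched_lookups(demo_ids, batch_size=2))
```

With the real service, the idea would be to read the IDs from the table (for example with `pd.read_csv`, assuming one ID per line), rebuild the query from the question once per string yielded by `batched_lookups`, passing that string as the `LOOKUP` value, call `rows_to_dataframe(query.rows())` for each batch, and combine the pieces with `pd.concat`.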