python pandas indexing shap

How to find the row number from a character index in Python?


I have a genetic dataset in which the index of each row is the name of the gene. I also want to find the row number of any given gene, so that I can look at genes individually after they have gone through a machine learning model's prediction and interpret each gene's prediction with shap. My current code for the shap plot needs a row number to pull out a specific gene.

My data looks like this:

Index   Feature1  Feature2   ... FeatureN
Gene1     1           0.2          10
Gene2     1           0.1          7
Gene3     0           0.3          10

For example, if I want to pull out and view the model prediction for Gene3, I do this:

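import shap
import xgboost

shap.initjs()

# The model must be fitted on the training data before TreeExplainer can use it
# (the fitting step is not shown here).
xgbr = xgboost.XGBRegressor()

def shap_plot(j):
    # j is a 0-based row position in X_train, not a gene name
    explainerModel = shap.TreeExplainer(xgbr)
    shap_values_Model = explainerModel.shap_values(X_train)
    p = shap.force_plot(explainerModel.expected_value, shap_values_Model[j], X_train.iloc[[j]], feature_names=df.columns)
    return p

shap_plot(3)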

Calling shap_plot(3) is a problem for me because I do not know whether the gene I want is actually in row 3 of the shuffled training or testing data.

Is there a way to pull out the row number from a known Gene index? Or potentially re-code my shap plot so it does accept my string indices? I have a biology background so any guidance would be appreciated.


Solution

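  • Try the following, where df is your dataframe. The result is the 1-based row number of the given gene (the first row gives 1, and so on):

    list(df.index).index('Gene3')+1
    
    #result
    
    3

    Note that .iloc and shap_values_Model[j] in your shap_plot function count rows from 0, so drop the +1 before passing the result to shap_plot. Also look the gene up in the frame you actually explain (X_train in your code) rather than in the unshuffled df, because the positions differ after shuffling.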
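
The same lookup can also be done with pandas itself. Here is a minimal sketch, assuming the gene names are unique within the index. The toy frame copies the values from the question's table; the shuffled order and the helper name `row_position` are made up for illustration. `Index.get_loc` returns the 0-based position that `.iloc` expects.

```python
import pandas as pd

# Toy stand-in for the question's data (values taken from the example table).
df = pd.DataFrame(
    {"Feature1": [1, 1, 0], "Feature2": [0.2, 0.1, 0.3], "FeatureN": [10, 7, 10]},
    index=["Gene1", "Gene2", "Gene3"],
)

# Hypothetical shuffled training split; a real split would come from your own code.
X_train = df.loc[["Gene3", "Gene1", "Gene2"]]


def row_position(frame, gene):
    """Return the 0-based row position of `gene` in `frame`'s index.

    Raises KeyError if the gene is not in this frame (for example, it ended
    up in the test split), and ValueError if the index has duplicate labels.
    """
    if not frame.index.is_unique:
        raise ValueError("index contains duplicate labels")
    return int(frame.index.get_loc(gene))


j = row_position(X_train, "Gene3")
print(j)                         # 0: Gene3 is the first row of the shuffled split
print(row_position(df, "Gene3")) # 2: its position in the unshuffled frame
print(X_train.iloc[[j]])         # the single-row frame that shap_plot would use
```

With the function from the question, the call would then be `shap_plot(row_position(X_train, 'Gene3'))`. The position has to come from `X_train` because the SHAP values in `shap_plot` are computed on `X_train`.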