
Sample size for Seaborn regplot y_test and predictions


I'm relatively new to Python. I'm trying to plot y_test against the predictions of my regression model with Seaborn's regplot, but the result is overplotted. I tried to sample from my DataFrame (credit), but the sampling has no effect. Here is my code:

# modeling
algo = XGBRegressor(n_estimators=50, max_depth=5)
model = algo.fit(X_train, y_train)

# predictions
preds = model.predict(X_test)

# sampling
data_sample = credit.sample(100)

# plotting results
sns.set_style('ticks')
sns.regplot(y_test, preds, data=data_sample, fit_reg=True, scatter_kws={'color': 'darkred', 'alpha': 0.3, 's': 100})

[Image: overplotted regplot]

Any ideas on how to take a sample of y_test and preds? Thanks.


Solution

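  • Because y_test is not quoted inside sns.regplot, seaborn receives the full y_test array itself (and likewise the full preds array), so data=data_sample has no effect on what is plotted. Sampling credit cannot help either: credit.sample(100) draws rows from the full dataset, and those rows are not linked to the test-set values in y_test and preds. Instead, put both variables in one data frame and sample its rows, for example: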

    import xgboost as xgb
    import pandas as pd
    # load_boston was removed in scikit-learn 1.2, so the California housing data is used here instead
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    
    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)
    
    # modeling
    algo = xgb.XGBRegressor(n_estimators=50, max_depth=5)
    model = algo.fit(X_train, y_train)
    

    First, create a DataFrame that holds the test targets and the predictions side by side, then sample its rows:

    preds = model.predict(X_test)
    plotDa = pd.DataFrame({'y_test':y_test,'preds':preds})
    
    sns.set_style('ticks')
    sns.regplot(x='y_test',y='preds', data=plotDa.sample(10), fit_reg=True, scatter_kws={'color': 'darkred', 'alpha': 0.3, 's': 100})
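The reason the DataFrame approach works is that DataFrame.sample draws whole rows, so each sampled target stays next to its own prediction. The sketch below checks that property without training a model; it assumes synthetic stand-ins for y_test and preds, and the names plot_df and sample_df are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for y_test and preds: 1,000 targets and noisy predictions.
rng = np.random.default_rng(0)
y_test = rng.normal(size=1000)
preds = y_test + rng.normal(scale=0.1, size=1000)

# One row per test observation keeps each target next to its own prediction.
plot_df = pd.DataFrame({"y_test": y_test, "preds": preds})

# Sample rows (not each column independently), so the pairs stay aligned.
sample_df = plot_df.sample(n=100, random_state=42)

# The index labels of the sample are positions in the original arrays,
# so each sampled row can be compared with the source values.
idx = sample_df.index.to_numpy()
print(np.allclose(sample_df["y_test"].to_numpy(), y_test[idx]))  # True
print(np.allclose(sample_df["preds"].to_numpy(), preds[idx]))    # True
```

Passing sample_df as sns.regplot(x='y_test', y='preds', data=sample_df) then plots only those 100 pairs; fixing random_state makes the same subset reappear on every run.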
    


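    Alternatively, draw a random index and use it to subset both arrays before plotting:

    # replace=False avoids drawing the same point twice (np.random.choice samples with replacement by default)
    subsample = np.random.choice(len(preds), 10, replace=False)
    # pass x and y as keywords; newer seaborn versions reject them as positional arguments
    sns.regplot(x=y_test[subsample], y=preds[subsample], fit_reg=True)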
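The index-based approach can be wrapped in a small function so that both arrays are always subset with the same positions. This is a minimal sketch using NumPy's Generator.choice; the helper name paired_subsample and the toy arrays are hypothetical, chosen only to make the alignment easy to check.

```python
import numpy as np


def paired_subsample(y_true, y_pred, n, seed=None):
    """Draw up to n distinct positions and return the matching (y_true, y_pred) pairs.

    Hypothetical helper: it only illustrates sampling both arrays with one index.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape[0] != y_pred.shape[0]:
        raise ValueError("y_true and y_pred must have the same length")
    rng = np.random.default_rng(seed)
    # replace=False guarantees distinct points; n is capped at the array length.
    idx = rng.choice(y_true.shape[0], size=min(n, y_true.shape[0]), replace=False)
    # The same idx is applied to both arrays, so pairs stay aligned.
    return y_true[idx], y_pred[idx]


# Toy data: every prediction is exactly twice its target.
y_test = np.arange(20, dtype=float)
preds = y_test * 2.0
ys, ps = paired_subsample(y_test, preds, n=5, seed=1)
print(np.array_equal(ps, ys * 2.0))  # True: pairs stayed aligned
print(len(np.unique(ys)))            # 5: no repeated points
```

The sampled arrays can then be plotted with sns.regplot(x=ys, y=ps). If y_test is a pandas Series rather than a NumPy array, np.asarray in the helper turns it into positional data, which avoids label-based indexing surprises.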