Trying to predict probability score using test data

I'm testing how individual features affect the probability score of a regression model we've built. Specifically, I want to see the impact of age on the probability score to decide whether we need to retrain the model. I'm using the parameters from our model as PARAM_COLLECTION, and test data for age, sex, and cc_list. I thought the code would work, but for the life of me I can't figure out why y comes back empty. Given the final if statement, I'd expect it to still show me a score even when the probability is not above the threshold.

import numpy as np

# Building test data where member has PULL
x_test = {
    "cc_list": ["PULL"],
    "age": 38,
    "sex": "M"
}

# Defining parameters from training data from previous model
PARAM_COLLECTION = {
    "PULL": {
        "auc": 0.8202432743081695,
        "coef": [-0.01853237366699478, 0.14359336438414397, 3.0070029131017155, 1.4999028794882714, 0.2499927123452168, 0.00869006612608888, -0.17741710091314503],
        "features_sltd": ["CARM", "GIL", "PULL", "PULM", "SKCVL", "age", "sex"],
        "intercept": -3.066213895858403,
        "model_name": "l1-reg",
        "regularization_param": 100000.0,
        "threshold": 0.5277152026373001
    }
}

# Trying to predict the probability score here
y = {}
coll_name = "PULL"
param_coll = PARAM_COLLECTION[coll_name]

for cc in x_test["cc_list"]:
    if cc not in param_coll:
        continue
    param = param_coll[cc]
    if param["model_name"] == "none":
        continue
    features_sltd = param["features_sltd"]
    features_efft = []
    x_vec = np.zeros(len(features_sltd))
    for i, f in enumerate(features_sltd):
        if f in x_test["cc_list"]:
            x_vec[i] = 1.0
            features_efft.append((f, param["coef"][i]))
    features_efft = sorted(features_efft, key=lambda x: -x[1])
    features_efft = [f[0] for f in features_efft if f[1] > 0.1]
    if len(features_efft) == 0:
        continue
    x_vec[features_sltd.index("age")] = x_test["age"]
    x_vec[features_sltd.index("sex")] = int(x_test["sex"] == "M")
    beta = np.dot(np.array(param["coef"]), x_vec) + param["intercept"]
    proba = 1.0 / (1.0 + np.exp(-beta))
    if proba > param["threshold"]:
        y[cc] = {"score": np.clip(proba, 0.0, 1.0), "features": features_efft}
    else:
        y[cc] = {"score": 0.0, "features": []}

# Print the output
print(y)

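  • The only cc value you have is "PULL", which is never a key in your param_coll. param_coll is already PARAM_COLLECTION["PULL"], so its keys are "auc", "coef", "features_sltd" and so on. The first if statement therefore skips every iteration, the loop never gets past it, and y stays empty.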
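
One possible way to apply this fix is sketched below. It assumes PARAM_COLLECTION is keyed directly by condition code, so the lookup uses cc against the outer dictionary instead of the inner parameter dictionary. The helper name score_member and its return shape are hypothetical, not part of the original code. It always returns the raw probability together with a flag for the threshold check, and leaves out the feature-effect ranking for brevity.

```python
import numpy as np

# Parameters copied from the question; the outer dict is keyed by condition code.
PARAM_COLLECTION = {
    "PULL": {
        "coef": [-0.01853237366699478, 0.14359336438414397, 3.0070029131017155,
                 1.4999028794882714, 0.2499927123452168, 0.00869006612608888,
                 -0.17741710091314503],
        "features_sltd": ["CARM", "GIL", "PULL", "PULM", "SKCVL", "age", "sex"],
        "intercept": -3.066213895858403,
        "threshold": 0.5277152026373001,
    }
}


def score_member(x, param_collection):
    """Hypothetical helper: raw probability per condition code in x["cc_list"]."""
    scores = {}
    for cc in x["cc_list"]:
        # Look cc up in the OUTER collection, not in one model's parameter dict.
        if cc not in param_collection:
            continue
        param = param_collection[cc]
        features = param["features_sltd"]
        # 1.0 for each selected feature that is a condition code the member has.
        x_vec = np.array([1.0 if f in x["cc_list"] else 0.0 for f in features])
        x_vec[features.index("age")] = x["age"]
        x_vec[features.index("sex")] = int(x["sex"] == "M")
        # Logistic regression: sigmoid of the linear predictor.
        beta = np.dot(param["coef"], x_vec) + param["intercept"]
        proba = 1.0 / (1.0 + np.exp(-beta))
        scores[cc] = {
            "proba": float(proba),
            "above_threshold": bool(proba > param["threshold"]),
        }
    return scores


# Vary age while holding the other inputs fixed to see its effect on the score.
for age in (38, 60):
    member = {"cc_list": ["PULL"], "age": age, "sex": "M"}
    print(age, score_member(member, PARAM_COLLECTION))
```

Returning the raw probability separately from the threshold flag means a score below the threshold is still visible, which is what the question was expecting to see. The original else branch would report such a score as 0.0.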