Search code examples
stan

STAN - Real parameter given a categorical vector


I am a beginner using stan (and stackoverflow by the way, I have absolutely no idea how you pretty print a dataframe on it, could not find how, sorry).

Let's say I want to make the following model :

y ~ normal(I*S + P*J,sigma)
P ~ normal(dP,1)

(to simplify this example I fix the deviation around P at 1)

where on one hand, I is a n,p matrix predictor and S are the corresponding regression coefficients (size p)

and on the other hand dP and J can only have 3 different values but my dataframe is constructed such as they looks like this (R-project):

dp <- c(0,0,0,0,0,0,0,2,2,2,2,2,2,2,1,1,1,1,1,1)
J <- c(5.2,5.2,....,2.3,2.3,....,7.5,7.5,...)

parameters are S, P and sigma.

I do no want stan to change every components of P, dp represents 3 types of data and I only want three different values of P corresponding to the 3 differents values of dp. However each row of my dataframe contains different values of I.

edit: said in another way: for each row k, I want :

y[k] ~ I[k,1]*S[1]+I[k,2]*S[2]...+ real_value_P * J[k]

How can I achieve that ?

Here is my code:

data {
 int < lower = 1 > NR; // Number of rows
 int < lower = 1 > NC; // Number of columns
 
 matrix [NR,NC] I ;// Predictor I
 vector [NR] dP;
 vector [NR] J ;

 vector [NR] y; // Outcome
}

parameters {
 real < lower = 0 > sigma; // Error SD
 vector [NC] S ;
 vector [NC] P ;     
 }

model {
P ~ normal(dP,1)
y ~ normal(I*S+P*J,sigma) ;
}

I am not sure I have been really clear, stats are still a tough subject to me either and my model is a bit more complicated than presented.

Thanks


Solution

  • The "trick" seems to indicate in a vector ("indices" here) which value of P to take (1,1,1,1,1,...2,2,2,2,2,...,3,3,3,3,3) and then to loop over it in the parameters to assign the correct value :

    transformed parameters {
     vector [NR] JP ; //J*P
     for (k in 1:NR){ 
            JP[k]=True_P[indices[k]]*J[k] ;
     }
    

    Hence the complete code :

    data {
     int < lower = 1 > NR; // Number of rows
     int < lower = 1 > NC; // Number of columns
     
     matrix [NR,NC] I ;// Predictor I
     
     int indices[NR]; // indices
     vector [3] P ; 
     vector [NR] dP ;
     vector [NR] J ; 
    
     vector [NR] y; // Outcome
    }
    
    parameters {
     real < lower = 0 > sigma; // Error SD
     vector < lower = 0 > [NC] S ;     // regression coefficients for predictors I
     vector [2] True_P ;
    }
    
    transformed parameters {
     vector [NR] JP ; //J*P
     for (k in 1:NR){ 
            JP[k]=True_P[indices[k]]*J[k] ;
     }
    
    }
    
    model {
    
    for (k in 1:3){
    P[k] ~ normal(True_P[k],1) ; 
    }
    
    y ~ normal(I*S+JP,sigma) ;
    }
    
    generated quantities {
    } // The posterior predictive distributiondistribution